Syntax for typing multi-dimensional arrays #516

shoyer · 2017-12-10T01:07:36Z

As part of the larger project for multi-dimensional arrays (#513), one of the first questions I would like to settle is what syntax for typing data-types and shapes should look like.

Both dtype and shape should be optional, and it should be possible to define multi-dimensional arrays for which either or both of these are generic:

dtype: indicates the data type for array elements, e.g., np.float64
shape: indicates the shape of the multi-dimensional array, a tuple of zero or more integers. We would like to support integer and variable sized dimensions, and variable numbers of dimensions. These are most naturally represented with indexing by a variadic number of integer, variable, colon : and/or ellipsis ... arguments, e.g., NDArray[1, N, :, ...] for an array with dimensions of size 1, size N, and arbitrary size, followed by 0 or more arbitrary sized dimensions.
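
To make the proposed subscript syntax concrete, here is a minimal sketch of what Python hands to a class indexed this way. `NDArray` below is a hypothetical stand-in (not NumPy's class) and `N` is only a placeholder string. The point is just that integers, `:` and `...` all arrive as ordinary objects inside one tuple, so the syntax is at least expressible at runtime.

```python
class NDArray:
    """Hypothetical stand-in used only to inspect subscript arguments."""

    def __class_getitem__(cls, key):
        # Return the raw key so we can see what the interpreter passes in.
        return key


N = "N"  # placeholder for a dimension variable (assumption for illustration)

key = NDArray[1, N, :, ...]
print(key)  # (1, 'N', slice(None, None, None), Ellipsis)
```

Whether a static type checker could give meaning to such a key is a separate question; this only shows that the expression parses and evaluates.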

For NumPy, ideally we would like to add basic typing support for dtype (using Generic) even before typing for shape is possible. But we'd like to know what the ultimate syntax should look like, so we don't paint ourselves into a corner.

One key question: can we safely rely on using a single generic argument for dtypes (e.g., np.ndarray[np.float64]) as indicating an array without any shape constraints?

My doc (same as in the master issue) considers a number of options under the "Possible syntax" section.

So far, I think the best option is some variation of "two generic arguments", for dtype and shape. But this could quickly get annoyingly verbose when sprinkled all over a code-base, e.g., np.ndarray[np.float32, Shaped[..., N, M]]:

It would be nice to support syntax like np.ndarray[np.float32] (the multi-dimensional equivalent of List[float]) as an alias for np.ndarray[np.float32, Any], but we don't yet have optional arguments for generics (variadic arguments are a somewhat awkward fit for a single argument).
It would also be nice to allow omitting Shaped[], e.g., by writing dimensions as variadic generics to the array type like np.ndarray[np.float32, ..., N, M]. One possible ambiguity is how to specify scalar arrays: np.ndarray[np.float32,] looks very similar to np.ndarray[np.float32]. But scalar arrays are rare enough that these could potentially be resolved by disallowing np.ndarray[np.float32,] in favor of requiring np.ndarray[np.float32, Shape[()]].
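
The scalar-array ambiguity can be seen directly at runtime. The sketch below uses a hypothetical `Probe` class that simply returns its subscript key: a trailing comma turns the key into a one-element tuple, while the bare form passes the object itself.

```python
class Probe:
    """Hypothetical helper that echoes back its subscript key."""

    def __class_getitem__(cls, key):
        return key


single = Probe[float]     # key is the bare object `float`
trailing = Probe[float,]  # key is the one-element tuple `(float,)`
empty = Probe[()]         # key is the empty tuple `()`

print(single is float, trailing == (float,), empty == ())
```

So the two spellings are technically distinguishable, but only by a single comma. As far as I know, classes built on `typing.Generic` wrap a non-tuple key into a tuple, so they would not see the difference at all, which supports spelling the scalar case explicitly.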


ilevkivskyi · 2017-12-11T14:48:13Z

This is a hard question. I would prefer to have np.ndarray generic in two type variables: the first one for dtype, the second one for shape. Something like this (in stub file):

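from typing import Generic, TypeVar

T = TypeVar('T')               # dtype
S = TypeVar('S', bound=Shape)  # shape; Shape is the proposed type, it does not exist yet
class ndarray(Generic[T, S]): ...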

where Shape would be a special variadic type very similar to Tuple but it will accept integers (both literals and constants) and "integer variables". So that it will look like:

a: ndarray  # just an array, shape and type are arbitrary
b: ndarray[float32, Any]  # array of floats with an unknown (dynamic) shape
c: ndarray[Any, Shape[100, 100]]  # array of dynamic types with fixed shape (100, 100)
d: ndarray[float32, Shape[100, 100]]  # array of floats with fixed shape (100, 100)
N = IntVar('N')
M = IntVar('M')
e: ndarray[float32, Shape[N, M]]  # array of floats with a symbolic shape (N, M)

I understand that typing second Any in situations where one doesn't care about shape (or dtype) might be annoying, but I don't want to introduce additional exceptions for omitted type parameters. Second, I want to factor out the Shape type, so that it can be easily used by other libraries that use alternative array types (and maybe even built-in array).
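
A runtime-only sketch of the two-parameter layout described above. Everything here is an assumption for illustration: `Shape` and `IntVar` are hypothetical classes written for this example (neither exists in `typing`), `float` stands in for `float32`, and no type checker understands them. It only shows that the annotations can be written and introspected with today's `Generic`.

```python
from typing import Any, Generic, TypeVar, get_args


class IntVar:
    """Hypothetical named dimension variable (not part of typing)."""

    def __init__(self, name: str) -> None:
        self.name = name


class Shape:
    """Hypothetical shape marker; Shape[...] returns a subclass carrying dims."""

    dims: tuple = ()

    def __class_getitem__(cls, dims):
        # A single dimension arrives as a bare object, several as a tuple.
        if not isinstance(dims, tuple):
            dims = (dims,)
        return type("Shape", (cls,), {"dims": dims})


T = TypeVar("T")
S = TypeVar("S", bound=Shape)


class ndarray(Generic[T, S]):
    """Stand-in for the stub class, generic in dtype and shape."""


N = IntVar("N")
M = IntVar("M")

fixed = ndarray[float, Shape[100, 100]]  # fixed shape (100, 100)
dynamic = ndarray[float, Any]            # unknown (dynamic) shape
symbolic = ndarray[float, Shape[N, M]]   # shape given by dimension variables

dtype, shape = get_args(fixed)
print(dtype, shape.dims)
```

Because `Shape` is a separate class, another array library could reuse it as the second parameter of its own generic array type, which is the factoring-out argument made above.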

shoyer · 2017-12-11T18:26:08Z

> I would prefer to have np.ndarray generic in two type variables: the first one for dtype, the second one for shape.

Yes, this seems like the right way to do things.

> I understand that typing second Any in situations where one doesn't care about shape (or dtype) might be annoying, but I don't want to introduce additional exceptions for omitted type parameters.

I will raise the issue of optional/default type variables separately. I agree that we shouldn't have a special case just for arrays.

On a related note: is there a good way to write "partially defined" generic type aliases? This would potentially alleviate the usability issue. For example, in user code:

FloatArray[...] as an alias for ndarray[float64, Shape[...]]
Matrix[...] as an alias for ndarray[..., Shape[N, M]]

I know I can specialize type variables in subclasses (e.g., class Matrix(ndarray[T, Shape[N, M]])), but that implies the argument is actually a member of the Matrix subclass. Likewise, I can write an alias Matrix = ndarray[T, Shape[N, M]], but that implies using the particular type variable T rather than producing a generic type with only one type variable.

> Second, I want to factor out the Shape type, so that it can be easily used by other libraries that use alternative array types (and maybe even built-in array).

Yes, definitely!

ilevkivskyi · 2017-12-11T21:43:21Z

> Likewise, I can write an alias Matrix = ndarray[T, Shape[N, M]], but that implies using the particular type variable T rather than producing a generic type with only one type variable.

If I understand you correctly, then I should say that the situation with generic aliases is exactly the opposite. For example:

T = TypeVar('T')
SDict = Dict[str, Tuple[T, T]]

d: SDict[int]  # same as Dict[str, Tuple[int, int]]

U = TypeVar('U')
def func(x: U, y: U) -> SDict[U]: ...  # same as Dict[str, Tuple[U, U]]

careful: SDict = {}  # same as Dict[str, Tuple[Any, Any]]

(The last example is a typical pitfall, so we have a special flag in mypy to catch it.) There are some more examples in the mypy docs (note they are still incomplete).
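
The alias behaviour described above can also be checked at runtime with nothing but the standard `typing` module. This is only a small sketch of the substitution rule; the variable name `specialised` is mine.

```python
from typing import Dict, Tuple, TypeVar

T = TypeVar("T")
SDict = Dict[str, Tuple[T, T]]

# The alias is itself generic in the one free type variable it contains.
print(SDict.__parameters__)

# Subscripting the alias substitutes T everywhere it appears.
specialised = SDict[int]
print(specialised == Dict[str, Tuple[int, int]])
```

The alias does not "use the particular type variable T"; it stays a generic type with one parameter, and each use site supplies its own argument.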

Taking this into account, I would expect at least the following alias to be defined in numpy (verbatim, though the name is arbitrary), describing an array with a given type but dynamic dimensions:

dynarray = ndarray[T, Any]

x: dynarray[float32]  # same as ndarray[float32, Any]
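
A minimal sketch of such partially specialised aliases, assuming a two-parameter `ndarray` stand-in defined here for illustration. `float` is used in place of `float32`, and `Shape2D` is a hypothetical placeholder class. It covers both directions asked about earlier: pinning the shape and pinning the dtype.

```python
from typing import Any, Generic, TypeVar

T = TypeVar("T")
S = TypeVar("S")


class ndarray(Generic[T, S]):
    """Hypothetical stand-in for the two-parameter stub class."""


class Shape2D:
    """Hypothetical placeholder for some concrete shape type."""


# Shape fixed to Any, dtype left open: generic in T only.
dynarray = ndarray[T, Any]
# Dtype fixed to float, shape left open: generic in S only.
FloatArray = ndarray[float, S]

a = dynarray[float]
b = FloatArray[Shape2D]
print(a == ndarray[float, Any], b == ndarray[float, Shape2D])
```

Each alias takes a single argument, which removes most of the verbosity of repeating the second parameter at every use site.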

> I will raise the issue of optional/default type variables separately. I agree that we shouldn't have a special case just for arrays.

FWIW this proposal (defaults for type variables) came up some time ago, but it didn't get enough support.

ilevkivskyi · 2017-12-11T21:47:46Z

(Also for the numpy stubs I know there was some prior proof-of-concept attempt, see https://github.com/machinalis/mypy-data)

shoyer · 2017-12-11T23:34:42Z

@ilevkivskyi Thanks for correcting my misconception about generics! This does make a significant difference for usability (probably good enough for me).

I suppose that whatever syntax is chosen for variadic type variables in #193 should also allow for aliases, so we can write something like:

S = TypeVar('S', variadic=True)
FloatArray = ndarray[float64, S]
x: FloatArray[N, M]

> (Also for the numpy stubs I know there was some prior proof-of-concept attempt, see https://github.com/machinalis/mypy-data)

Yes, we know about this one. We definitely plan to recycle work if possible.

mitar · 2017-12-12T05:54:00Z

If keyword arguments to indexing were allowed, we could have things like ndarray[int, shape=(1,2,3)]. :-)

junjihashimoto · 2018-01-16T01:43:56Z

Hi, @ilevkivskyi san and @shoyer san.
Thank you for the great idea of a generic ndarray.

To support generic tensor and matrix operations such as multiplication, flatten and reshape,
I think it is necessary to do arithmetic on IntVar.

I am trying out IntVar and Shape with Generic in this code.
Current Python accepts the following syntax with a modified typing.py.

def matmul(a: ndarray[T,Shape[N0,M0]],b: ndarray[T,Shape[N1,M1]]) -> ndarray[T,Shape[N0,M1]]:
    pass

def flatten(a: ndarray[T,Shape[N0,M0]]) -> ndarray[T,Shape[N0*M0]]:
    pass

def reshape(a: ndarray[T,Shape[N0*M0]]) -> ndarray[T,Shape[N0,M0]]:
    pass
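
One way to picture what "calculating" with `IntVar` could mean is a small symbolic-expression sketch. This is not the modified `typing.py` mentioned above; `IntExpr`, `IntVar`, `Product` and `evaluate` are all hypothetical names invented for this example. It only shows that `N0 * M0` can build an expression object usable inside a shape, which can later be resolved once concrete sizes are known.

```python
from dataclasses import dataclass
from typing import Dict, Union


class IntExpr:
    """Hypothetical base class for symbolic integer expressions."""

    def __mul__(self, other: "IntExpr") -> "Product":
        # Multiplying two expressions builds a new symbolic node.
        return Product(self, other)


@dataclass(frozen=True)
class IntVar(IntExpr):
    name: str


@dataclass(frozen=True)
class Product(IntExpr):
    left: IntExpr
    right: IntExpr


def evaluate(expr: Union[int, IntExpr], env: Dict[str, int]) -> int:
    """Resolve an expression given concrete sizes for each variable."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, IntVar):
        return env[expr.name]
    if isinstance(expr, Product):
        return evaluate(expr.left, env) * evaluate(expr.right, env)
    raise TypeError(f"unsupported expression: {expr!r}")


N0, M0 = IntVar("N0"), IntVar("M0")

# Shape of flatten's result for an input of shape (N0, M0).
flat_shape = (N0 * M0,)

sizes = {"N0": 2, "M0": 3}
print(tuple(evaluate(d, sizes) for d in flat_shape))  # (6,)
```

A real checker would additionally have to unify expressions such as `N0*M0` in the `reshape` signature against known sizes, which this sketch does not attempt.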

junjihashimoto · 2018-01-16T06:34:33Z

I made a mistake.
There was no problem with multiplication.

This was referenced Dec 10, 2017

Typing for multi-dimensional arrays #513

Open

Type hinting / annotation (PEP 484) for ndarray, dtype, and ufunc numpy/numpy#7370

Closed

shoyer mentioned this issue Dec 12, 2017

Allow variadic generics #193

Closed

anntzer mentioned this issue Jan 11, 2018

Additional docstring recommendations matplotlib/matplotlib#10225

Closed

datnamer mentioned this issue Feb 23, 2018

Questions: Function composition and type hints xnd-project/libgumath#1

Closed

gdementen mentioned this issue Jan 8, 2020

(issue 832): implement CheckedSession, CheckedParameters and CheckedArray larray-project/larray#840

Merged

shoyer mentioned this issue Mar 27, 2020

RFC: TensorFlow Canonical Type System tensorflow/community#208

Merged

shoyer mentioned this issue Jun 9, 2020

Order of generic types for ndarray numpy/numpy#16547

Closed

anntzer mentioned this issue Feb 3, 2021

"sequence of float" vs "1D array-like" matplotlib/matplotlib#16161

Closed

srittau added the "topic: feature" label (Discussions about new features for Python's type annotations) Nov 4, 2021

bschnurr mentioned this issue Apr 8, 2022

[cv2] Completed type stubs for the following functions and their shared parameters: microsoft/python-type-stubs#112

Merged

junrushao mentioned this issue Jun 16, 2022

[TVMScript] TODO Items in Design junrushao/tvm#43

Closed
