Syntax for typing multi-dimensional arrays #516
Comments
This is a hard question. I would prefer to have
T = TypeVar('T')
S = TypeVar('S', bound=Shape)
class ndarray(Generic[T, S]): ...
where
a: ndarray # just an array, shape and type are arbitrary
b: ndarray[float32, Any] # array of floats with an unknown (dynamic) shape
c: ndarray[Any, Shape[100, 100]] # array of dynamic types with fixed shape (100, 100)
c: ndarray[float32, Shape[100, 100]]
N = IntVar('N')
M = IntVar('M')
d: ndarray[float32, Shape[N, M]]
I understand that typing second |
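For anyone who wants to poke at the two-parameter layout proposed above, here is a minimal runnable sketch. It rests on several assumptions: `Shape` and `IntVar` do not exist in `typing`, so the shape is emulated with a `Tuple` of `Literal` ints, the `bound=Shape` is dropped, `float` stands in for `float32`, and this `ndarray` class is a hypothetical stand-in rather than `numpy.ndarray`.

```python
from typing import Any, Generic, Literal, Tuple, TypeVar, get_args, get_origin

T = TypeVar("T")  # element type (dtype)
S = TypeVar("S")  # shape; the thread's bound=Shape is omitted (Shape is hypothetical)


class ndarray(Generic[T, S]):
    """Hypothetical stand-in for an array type; this is not numpy.ndarray."""


# Assumption: emulate Shape[100, 100] with a tuple of Literal ints.
Shape100x100 = Tuple[Literal[100], Literal[100]]

a: ndarray                       # dtype and shape both arbitrary
b: ndarray[float, Any]           # floats, unknown (dynamic) shape
c: ndarray[Any, Shape100x100]    # dynamic dtype, fixed shape (100, 100)
d: ndarray[float, Shape100x100]  # floats, fixed shape (100, 100)

# The two parameters can be recovered separately at runtime.
fixed = ndarray[float, Shape100x100]
dtype_arg, shape_arg = get_args(fixed)
```

This only shows that dtype and shape fit into two independent generic slots; it says nothing about how a type checker would relate dimension variables such as N and M.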
Yes, this seems like the right way to do things.
I will raise the issue of optional/default type variables separately. I agree that we shouldn't have a special case just for arrays. On a related note: is there a good way to write "partially defined" generic type aliases? This would potentially alleviate the usability issue. For example, in user code:
I know I can specialize type variables in subclasses (e.g.,
Yes, definitely!
If I understand you correctly, then I should say that the situation with generic aliases is exactly the opposite. For example:
T = TypeVar('T')
SDict = Dict[str, Tuple[T, T]]
d: SDict[int] # same as Dict[str, Tuple[int, int]]
U = TypeVar('U')
def func(x: U, y: U) -> SDict[U]: ... # same as Dict[str, Tuple[U, U]]
careful: SDict = {} # same as Dict[str, Tuple[Any, Any]]
(The last example is a typical pitfall, so we have a special flag in mypy to catch it.) There are some more examples in the mypy docs (note they are still incomplete). Taking this into account, I would expect at least the following alias to be defined:
dynarray = ndarray[T, Any]
x: dynarray[float32] # same as ndarray[float32, Any]
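The alias behaviour described here can be checked at runtime with the standard `typing` module. A small sketch, assuming a hypothetical two-parameter `ndarray` stand-in (not `numpy.ndarray`) and using `float` in place of `float32`:

```python
from typing import Any, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")


class ndarray(Generic[T, S]):
    """Hypothetical stand-in for an array type; this is not numpy.ndarray."""


# A generic alias keeps its free type variable and can be subscripted later.
SDict = Dict[str, Tuple[T, T]]
sdict_of_int = SDict[int]    # same as Dict[str, Tuple[int, int]]

# The same mechanism gives a "dtype only" alias with a dynamic shape.
dynarray = ndarray[T, Any]
dyn_float = dynarray[float]  # same as ndarray[float, Any]
```

So a single-argument spelling is available today through an alias, without needing defaults for type variables.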
FWIW this proposal (defaults for type variables) has appeared some time ago, but it didn't get enough support.
(Also for the |
@ilevkivskyi Thanks for correcting my misconception about generics! This does make a significant difference for usability (probably good enough for me). I suppose that whatever syntax is chosen for variadic type variables in #193 should also allow for aliases, so we can write something like:
S = TypeVar('S', variadic=True)
FloatArray = ndarray[float64, S]
x: FloatArray[N, M]
Yes, we know about this one. We definitely plan to recycle work if possible.
If keyword arguments to indexing would be allowed, we could have things like |
Hi, @ilevkivskyi san and @shoyer san. To support generic tensor and matrix operations such as multiplication, flatten, and reshape, I am trying out IntVar and Shape with Generic in this code.
I made a mistake.
As part of the larger project for multi-dimensional arrays (#513), one of the first questions I would like to settle is what syntax for typing data-types and shapes should look like.
Both dtype and shape should be optional, and it should be possible to define multi-dimensional arrays for which either or both of these are generic:
dtype: indicates the data type for array elements, e.g., np.float64
shape: indicates the shape of the multi-dimensional array, a tuple of zero or more integers. We would like to support integer and variable sized dimensions, and variable numbers of dimensions. These are most naturally represented with indexing by a variadic number of integer, variable, colon (:) and/or ellipsis (...) arguments, e.g., NDArray[1, N, :, ...] for an array with dimensions of size 1, size N, and arbitrary size, followed by 0 or more arbitrary sized dimensions.
For NumPy, ideally we would like to add basic typing support for dtype (using Generic) even before typing for shape is possible. But we'd like to know what the ultimate syntax should look like, so we don't paint ourselves into a corner.
One key question: can we safely rely on using a single generic argument for dtypes (e.g., np.ndarray[np.float64]) as indicating an array without any shape constraints?
My doc (same as in the master issue) considers a number of options under the "Possible syntax" section.
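As a quick sanity check on the proposed shape indexing syntax, the sketch below shows that an index mixing integers, names, colons, and an ellipsis is already valid Python at runtime. The `NDArray` class here is a hypothetical probe that just returns its index, and `N` is a placeholder string rather than a real dimension variable.

```python
class NDArray:
    """Hypothetical probe class: returns the index it was subscripted with."""

    def __class_getitem__(cls, item):
        # A single index arrives bare; several indices arrive as a tuple.
        return item if isinstance(item, tuple) else (item,)


N = "N"  # placeholder for a dimension variable (assumption)

# Size 1, size N, one arbitrary-size dimension, then zero or more dimensions.
dims = NDArray[1, N, :, ...]
```

The colon arrives as `slice(None, None, None)` and the ellipsis as `Ellipsis`, so a runtime implementation could tell all four kinds of argument apart. Whether static type checkers should accept this form is the open question.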
So far, I think the best option is some variation of "two generic arguments", for dtype and shape. But this could quickly get annoyingly verbose when sprinkled all over a code-base, e.g., np.ndarray[np.float32, Shaped[..., N, M]]:
np.ndarray[np.float32] (the multi-dimensional equivalent of List[float]) as an alias for np.ndarray[np.float32, Any], but we don't yet have optional arguments for generics (variadic arguments are a somewhat awkward fit for a single argument).
Shaped[], e.g., by writing dimensions as variadic generics to the array type like np.ndarray[np.float32, ..., N, M]. One possible ambiguity is how to specify scalar arrays: np.ndarray[np.float32,] looks very similar to np.ndarray[np.float32]. But scalar arrays are rare enough that these could potentially be resolved by disallowing np.ndarray[np.float32,] in favor of requiring np.ndarray[np.float32, Shape[()]].
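To make the scalar-array ambiguity concrete, here is a small sketch of what Python passes to `__class_getitem__` for each spelling. `Probe` is a hypothetical class used only for illustration, and `float` stands in for `np.float32`.

```python
class Probe:
    """Hypothetical class that returns its raw subscript unchanged."""

    def __class_getitem__(cls, item):
        return item


single = Probe[float]     # the bare object: float
trailing = Probe[float,]  # a one-element tuple: (float,)
empty = Probe[()]         # an empty tuple, as in the Shape[()] spelling
```

The two spellings are distinguishable at this level, but `typing.Generic` wraps a non-tuple subscript into a one-element tuple, so a `Generic`-based class would likely treat them the same. That supports requiring an explicit empty shape for scalar arrays.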