Typing for multi-dimensional arrays #513

shoyer opened this issue Dec 7, 2017 · 20 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@shoyer

shoyer commented Dec 7, 2017

I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (numpy/numpy#7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).

To help guide discussion, I wrote a document outlining ideas for array shape typing.

To summarize:

  • We would like to be able to type-check both data types (e.g., float64) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.
  • There are many use cases where support for checks using dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape (N, M) to shape (N,) for arbitrary integers N and M. These dimension variables look very similar to TypeVar, if TypeVar supported integers as types.
  • A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., (..., N) for an array with a last dimension of length N and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced for matching multiple arguments with variable numbers of dimensions.
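To make the last two points concrete, here is a minimal runtime sketch in Python, not a proposal for the actual typing syntax. The helper names `match_shape` and `broadcast_shapes` are hypothetical: the first matches a concrete shape against a pattern containing dimension variables and a leading Ellipsis, the second applies NumPy-style broadcasting to plain tuples. A static checker would have to perform the same reasoning at the type level.

```python
from typing import Any, Dict, Optional, Sequence, Tuple


def match_shape(shape: Tuple[int, ...], pattern: Sequence[Any],
                bindings: Optional[Dict[str, int]] = None) -> Optional[Dict[str, int]]:
    """Match a concrete shape against a pattern such as (..., "N").

    Integers in the pattern are exact lengths, strings are dimension
    variables, and a leading Ellipsis stands for zero or more extra
    dimensions. Returns the variable bindings, or None on mismatch.
    """
    bound = dict(bindings or {})
    pattern = list(pattern)
    if pattern and pattern[0] is Ellipsis:
        pattern = pattern[1:]
        if len(shape) < len(pattern):
            return None
        shape = shape[len(shape) - len(pattern):]  # keep only the trailing dims
    elif len(shape) != len(pattern):
        return None
    for size, dim in zip(shape, pattern):
        if isinstance(dim, int):
            if size != dim:
                return None
        elif bound.setdefault(dim, size) != size:
            return None  # same variable already bound to a different length
    return bound


def broadcast_shapes(*shapes: Tuple[int, ...]) -> Tuple[int, ...]:
    """Apply NumPy-style broadcasting rules to concrete shapes."""
    ndim = max((len(s) for s in shapes), default=0)
    result = []
    for axis in range(1, ndim + 1):  # walk from the last axis backwards
        sizes = {s[-axis] for s in shapes if len(s) >= axis} - {1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} are not broadcastable")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))
```

For example, `match_shape((2, 5, 7), (..., "N"))` binds `N` to 7, while `match_shape((3, 4), ("N", "N"))` fails because `N` cannot be both 3 and 4.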

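This will likely require some new typing features (as well as type-checker support). Notably:

  • Support for literal values (#478), so we can type check operations like array.sum(axis=0).
  • Variadic generics (#193), so we can write types like NDArray[N] and NDArray[N, M].
  • Some sort of support for dimension identity in shapes (e.g., integer types, or DimensionVar as described in my doc).
  • Standard syntax for writing array dtype/shape annotations: what should these look like?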

@ilevkivskyi
Member

ilevkivskyi commented Dec 7, 2017

It looks like the proposal of integer generics is also relevant here python/mypy#3345 (it looks almost identical to what you call DimensionVar).

In general, I am very supportive of this project (I have heard many times that static typing would be very helpful for data science, numerics and related fields, but current support in mypy and PEP 484 is very limited). The main obstacle however is the size of this project (it may require its own PEP). I will read your document (thanks for writing it), but already now it seems to me that it may make sense to start from features that will be useful in general (i.e. also outside of numeric stack) such as literal types and variadic generics.

Also tagging @JukkaL here just in case.

@shoyer
Author

shoyer commented Dec 9, 2017

The main obstacle however is the size of this project (it may require its own PEP).

Yes, I expect a PEP will be necessary, especially if we want to standardize base types for typing multi-dimensional arrays in the typing module.

it seems to me that it may make sense to start from features that will be useful in general (i.e. also outside of numeric stack) such as literal types and variadic generics.

Indeed, this is probably the best place where the broader typing community can help.

@shoyer
Author

shoyer commented Dec 10, 2017

I've opened a sub-issue for discussing syntax for array typing: #516

@ilevkivskyi
Member

Some update on the issue:

Our (mypy core team) previous schedule for working on this was Q4 2018. However, we decided that some of the type system features needed to efficiently support NumPy (such as literal types and variadic generics) will also be useful in general, so we decided to implement general support for those features first. Literal types are almost there already, and variadic generics are going to be added in the coming months. After that we will start working on dedicated NumPy support (around Q2); sorry for the delay.
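As an illustration of why literal types matter for array APIs, here is a minimal sketch using `typing.Literal` and `typing.overload`. The `Matrix` class is hypothetical and is not a NumPy type; it only shows how the literal value passed as `axis` can select the return type that a checker reports.

```python
from typing import List, Literal, overload


class Matrix:
    """Hypothetical 2-D container used only for this sketch."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    # With no axis the result is a scalar; with a literal axis it is a list.
    @overload
    def sum(self, axis: None = None) -> float: ...
    @overload
    def sum(self, axis: Literal[0, 1]) -> List[float]: ...

    def sum(self, axis=None):
        if axis is None:
            return float(sum(sum(r) for r in self.rows))
        if axis == 0:
            return [float(sum(col)) for col in zip(*self.rows)]  # column sums
        if axis == 1:
            return [float(sum(r)) for r in self.rows]  # row sums
        raise ValueError("axis must be None, 0, or 1")
```

A checker that understands literal types can then infer `float` for `m.sum()` and `List[float]` for `m.sum(axis=0)`, and reject `m.sum(axis=2)` before the code runs.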

sndrtj mentioned this issue Jan 28, 2019
@ilevkivskyi
Member

Sorry, I forgot to post notes from the latest Python typing meetup on numeric stack typing here. Here they are

@vsiles

vsiles commented May 7, 2019

Are you specifically looking at numpy, or at the machine learning ecosystem with numpy/pytorch/... ?
I found today that they are quite heterogeneous:

>>> x = torch.zeros([4], dtype=torch.int8)
>>> y = torch.zeros([4], dtype=torch.float32)
>>> torch.add(x, y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected type torch.FloatTensor but got torch.CharTensor
>>> xx = numpy.array([4], dtype=numpy.int8)
>>> yy = numpy.array([4], dtype=numpy.float32)
>>> xx + yy
array([8.], dtype=float32)

PyTorch doesn't seem to cast automatically when the dtypes differ, whereas NumPy upcasts (see https://stackoverflow.com/questions/56022497/numpy-pytorch-dtype-conversion-compatibility/56022918?noredirect=1#comment98689941_56022918)
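For a type checker, the difference amounts to two policies for computing the result dtype of a binary operation. The following is a toy sketch of that idea only: dtype names are plain strings, `result_dtype` and `PROMOTIONS` are hypothetical names, and the table records just the one promotion shown in the session above rather than any library's full rules.

```python
# Toy model: only the promotion observed above (int8 with float32 -> float32).
PROMOTIONS = {frozenset({"int8", "float32"}): "float32"}


def result_dtype(a: str, b: str, promote: bool) -> str:
    """Return the result dtype of a binary op under one of two policies.

    promote=False models a strict library that rejects mismatched dtypes;
    promote=True models a library that upcasts according to a table.
    """
    if a == b:
        return a
    if not promote:
        raise TypeError(f"expected dtype {a} but got {b}")
    try:
        return PROMOTIONS[frozenset({a, b})]
    except KeyError:
        raise TypeError(f"no promotion rule recorded for {a} and {b}") from None
```

Stubs for each library would need to encode whichever policy that library actually follows.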

@ilevkivskyi
Member

Are you specifically looking at numpy, or at the machine learning ecosystem with numpy/pytorch/... ?

At all of them. Dimensionality/shape will be an additional abstraction orthogonal to container type and element type.
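One way to picture this orthogonality is a container type that is generic in both an element type and a shape, so the same dtype and shape parameters can be reused across libraries. This is a rough sketch with today's `typing.Generic`; every class name here is a hypothetical stand-in, not an existing NumPy or PyTorch type.

```python
from typing import Generic, TypeVar

DType = TypeVar("DType")  # element type, e.g. Float32
Shape = TypeVar("Shape")  # shape description, e.g. Shape2D


# Marker classes standing in for element types and shapes.
class Float32: ...
class Int8: ...
class Shape1D: ...
class Shape2D: ...


# Two different container types sharing the same two parameters.
class NDArray(Generic[DType, Shape]): ...
class Tensor(Generic[DType, Shape]): ...


def to_tensor(x: NDArray[DType, Shape]) -> Tensor[DType, Shape]:
    """Change the container while preserving element type and shape."""
    return Tensor()
```

Under this scheme a checker would report `Tensor[Float32, Shape2D]` for `to_tensor(x)` when `x` is an `NDArray[Float32, Shape2D]`.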

@vsiles

vsiles commented May 7, 2019

Sorry I wasn't clear, I wanted to ask about the numerical stack part specifically. Do we have a current target among numpy / pytorch / tensorflow that would focus most of the effort, or are people looking to their favorite flavor (which seem incompatible with each other)?

@ilevkivskyi
Member

Do we have a current target among numpy / pytorch / tensorflow that would focus most of the effort, or are people looking to their favorite flavor (which seem incompatible with each other)?

There are two separate big things required to support numerical libraries:

  • New type system features
  • Adding stubs for popular libraries

For the first one we ideally want to be as broad as possible; I think there are no particular "preferences". For the second, I think we should probably start with numpy, since it is the common denominator for many other libraries.

@dmontagu

@ilevkivskyi do you have any suggestions for how to track progress on (or, even better, contribute to) the development of these "numeric stack typing" features? Full support for the features described in your linked notes on numeric stack typing would be incredibly useful!

@ilevkivskyi
Member

@dmontagu The best way is to just follow this issue; you can also subscribe to the [email protected] mailing list. There are no updates here because we haven't made much progress yet. Whether you can help depends on your background and how much time you are ready to spend on this. This is not a simple feature and it is hard to split into small "things".

@theodoretliu

Hey! I'm a student working on a thesis and I am very interested in contributing to this project as part of my research! Mainly, I want to statically check dimensionality alignment in numpy operations. Let me know how I can help out.

@ilevkivskyi
Member

@theodoretliu Hi! It is great to hear you are interested. Just to get a bit more info, how much time will you be able to spend on this?

The best course of action is probably to implement support for relevant type system features in one of the mainstream Python type checkers. I would of course propose mypy :-) as one of its maintainers, see https://github.com/python/mypy

If this sounds right to you, I can give you a more detailed plan and some guidance.

@theodoretliu

I'd be willing to dedicate pretty significant time in the coming months. And yes, that sounds like a great course of action!

@vsiles

vsiles commented Nov 13, 2019 via email

@vsiles

vsiles commented Nov 13, 2019 via email

@mrahtz

mrahtz commented Jun 12, 2020

A group of us at DeepMind are interested in working on this too. We've set up a mailing list at https://groups.google.com/g/python-shape-checkers to try to bring all the conversations about this together in one place. I've posted a summary there of what seems to be the current state of things, but stay tuned for updates!

@fylux

fylux commented Jun 12, 2020

Hi @mrahtz,

Thanks for the initiative! Indeed, there are currently many ongoing efforts in this direction. At Facebook we are working directly on this and already support several use cases in Pyre, including a variadic syntax that has been polished since the initial proposal at the Python Typing Summit. However, it would be very beneficial to get first-hand information on the state of each team working on this, since so far I have read about people working on it at Dropbox, Facebook, Google and now DeepMind.

Also, please don't miss the Python Typing mailing list.

@redradist

I'd like to open a discussion about typing for multi-dimensional arrays in general, and more specifically for NumPy. We have already been discussing this over in the NumPy issue tracker (numpy/numpy#7370) and recently opened a new repository to start writing type stubs (https://github.com/numpy/numpy_stubs).

To help guide discussion, I wrote a document outlining ideas for array shape typing.

To summarize:

* We would like to be able to type-check both data types (e.g., `float64`) and shapes (e.g., a 3x4 array) for multi-dimensional arrays.

* There are many use cases where support for checks using dimension identity would be valuable, e.g., to indicate that a function transforms an array with shape `(N, M)` to shape `(N,)` for arbitrary integers `N` and `M`. These dimension variables look very similar to `TypeVar`, if `TypeVar` supported integers as types.

* A notion of "zero or more additional dimensions" would also be quite valuable, and is a core part of the type for many NumPy operations (generalized ufuncs). This might be naturally written with Ellipsis, e.g., `(..., N)` for an array with a last dimension of length `N` and any number of preceding dimensions. There are particular rules (broadcasting) that should be enforced for matching multiple arguments with variable numbers of dimensions.

This will likely require some new typing features (as well as type-checker support). Notably:

* Support for literal values (#478), so we can type check operations like `array.sum(axis=0)`.

* Variadic generics (#193), so we can write types like `NDArray[N]` and `NDArray[N, M]`.

* Some sort of support for dimension identity in shapes (e.g., integer types, or `DimensionVar` as described in my doc).

* Standard syntax for writing array dtype/shape annotations: what should these look like?

You wanted this annotation:

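class _Float64:  # Custom annotation class
    def __init__(self, ndim=None):
        # Number of dimensions requested via subscripting; None means unspecified.
        self.ndim = ndim

    def __getitem__(self, item):
        # float64[:] passes a single slice, float64[:, :] passes a tuple of slices.
        # Record the count so the two annotations can be told apart.
        ndim = len(item) if isinstance(item, tuple) else 1
        return _Float64(ndim)


# Single instance used in annotations; subscripting returns a new instance.
float64 = _Float64()


def for_loop(n: float64[:, :]):
    pass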

Take it ;)

srittau added the topic: feature Discussions about new features for Python's type annotations label Nov 4, 2021
@James4Ever0

James4Ever0 commented Jul 7, 2023

To solve this issue, "Annotated[]" can already be used to declare the type. However, to get proper "static" type checking of "Annotated[]" metadata we need support in mypy, pyanalyze, etc. To annotate and infer types that involve arithmetic, such as the result of a call to "np.reshape", we need code that defines custom rules (not just PEP 484) to compute the proper types. I suspect there is currently little support for custom "Annotated[]" types: it is not easy for users to define their own "Annotated[]" types and have them statically checked. Yet this is probably the solution for all kinds of dynamic types in Python, and it would enable symbolic execution of arbitrary Python code.
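As a rough illustration of the gap described here, the sketch below attaches shape metadata with `typing.Annotated` and enforces it at call time with a decorator. The names `Shape`, `check_shapes`, `FakeArray` and `trace` are all hypothetical. Standard static checkers treat the metadata as opaque, so this check only happens at runtime; static enforcement would need the custom checker rules the comment asks for.

```python
import functools
import inspect
from typing import Annotated, Any, get_args, get_origin, get_type_hints


class Shape:
    """Hypothetical metadata object carried inside Annotated[...]."""

    def __init__(self, *dims: int) -> None:
        self.dims = dims


def check_shapes(func):
    """Compare each argument's .shape with its Shape metadata at call time."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            hint = hints.get(name)
            if get_origin(hint) is not Annotated:
                continue
            for meta in get_args(hint)[1:]:  # first arg is the underlying type
                if isinstance(meta, Shape) and tuple(value.shape) != meta.dims:
                    raise TypeError(
                        f"{name}: expected shape {meta.dims}, got {tuple(value.shape)}"
                    )
        return func(*args, **kwargs)

    return wrapper


class FakeArray:
    """Stand-in for an array type; only the .shape attribute matters here."""

    def __init__(self, shape):
        self.shape = shape


@check_shapes
def trace(m: Annotated[Any, Shape(3, 3)]) -> None:
    return None
```

Calling `trace(FakeArray((3, 3)))` passes, while `trace(FakeArray((2, 3)))` raises `TypeError`.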
