
Proposal: Programmatically create types #1371

Open
NeilGirdhar opened this issue Mar 17, 2023 · 18 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@NeilGirdhar

NeilGirdhar commented Mar 17, 2023

This proposal is aimed at solving two related problems. First, when defining a multi-ary operator on Numpy arrays (e.g., a leaky integrator), you ideally want to bake in Numpy's type promotion rules. However, even Numpy doesn't use its own type promotion rules in its type annotations.

So, my suggestion is the following:

from typing import SyntheticType

def ResultType(*args: Any) -> SyntheticType[DType]:
  return result_type(*args)  # Something like this, but would have to deal with e.g., numpy.ndarray[typing.Any, numpy.dtype[numpy.floating[typing.Any]]]

# Then...
    @overload
    def __add__(self: T, other: U) -> NDArray[ResultType[T, U]]: ...  # type: ignore[misc]

The type checker could be called with a special argument, like --create_synthetic_stubs. This would

  • Run type checking and collect a list of unique argument tuples passed to synthetic type functions like ResultType.
  • Start a Python interpreter and call the synthetic type functions with those arguments (which are Python objects representing types, e.g., T=float and U=numpy.ndarray[typing.Any, numpy.dtype[numpy.floating[typing.Any]]]).
  • Write the results to some pyi file in some canonical table-like format, like:
    ResultType: SyntheticTypeMapping = {(float, float): float, ...}

The table could simply be stored in the cache. Users of the library would have to generate this file, which means the type checker would run code from the library.
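To make the idea of the generated table concrete, here is a rough runtime sketch of what such a mapping plus lookup could look like. The name SyntheticTypeMapping and the table contents are hypothetical; the actual emitted format is an open design question in the proposal.

```python
# Hypothetical sketch: the canonical table a tool might emit after
# collecting unique argument tuples, plus a lookup helper.
SyntheticTypeMapping = dict[tuple[type, ...], type]

# Illustrative entries only; a real table would use numpy types.
RESULT_TYPE_TABLE: SyntheticTypeMapping = {
    (bool, bool): bool,
    (bool, int): int,
    (int, float): float,
    (float, complex): complex,
}

def lookup_result_type(*args: type) -> type:
    """Resolve a synthetic result type from the precomputed table."""
    try:
        return RESULT_TYPE_TABLE[args]
    except KeyError as exc:
        raise TypeError(f"no synthetic result recorded for {args}") from exc
```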

The other problem this solves: I would like to annotate dataclasses whose fields can be None or int in addition to whatever type they've been declared as. See here for a description of why. It would be pretty easy to code a Python transformation from a dataclass type to a new dataclass type with the transformed field types.

I realize this is pretty extreme, but the payoff would be commensurate.

@NeilGirdhar NeilGirdhar added the topic: feature Discussions about new features for Python's type annotations label Mar 17, 2023
@gvanrossum
Member

I'm getting lost in all the higher order stuff :-(, but is the core of the idea here that you want to write Python code in the ResultType function that is evaluated by static type checkers? Could you give a more concrete example that shows a specific example that would use this? I have no idea what your code snippet above means. What is result_type? What would you actually do in it?

@NeilGirdhar
Author

NeilGirdhar commented Mar 20, 2023

the core of the idea here that you want to write Python code in the ResultType function that is evaluated by static type checkers?

Yes, exactly.

Could you give a more concrete example that shows a specific example that would use this? I have no idea what your code snippet above means. What is result_type? What would you actually do in it?

The result-type example would make the annotations for numpy arrays both simpler and more precise. The current numpy type annotations for __add__ require enumeration:

    @overload
    def __add__(self: NDArray[bool_], other: _ArrayLikeBool_co) -> NDArray[bool_]: ...  # type: ignore[misc]
    @overload
    def __add__(self: _ArrayUInt_co, other: _ArrayLikeUInt_co) -> NDArray[unsignedinteger[Any]]: ...  # type: ignore[misc]
    @overload
    def __add__(self: _ArrayInt_co, other: _ArrayLikeInt_co) -> NDArray[signedinteger[Any]]: ...  # type: ignore[misc]
    @overload
    def __add__(self: _ArrayFloat_co, other: _ArrayLikeFloat_co) -> NDArray[floating[Any]]: ...  # type: ignore[misc]
    @overload
    def __add__(self: _ArrayComplex_co, other: _ArrayLikeComplex_co) -> NDArray[complexfloating[Any, Any]]: ...

This is very labor-intensive. As a result, libraries like JAX have opted not to make their array type generic.

It's also not as precise as it could be since it only produces things like NDArray[unsignedinteger[Any]]. We would ideally generate something more specific like NDArray[uint64]. (But that would have meant adding a lot more new type aliases.)

So that's why I'd like there to be a ResultType generic type that figures out the result type programmatically. Then, you could simply do

    def __add__(self: T, other: U) -> NDArray[ResultType[T, U]]: ...  # type: ignore[misc]

and it would cover all 5 of the above cases, and could even get the bit widths of the types right. And it would do all that in one line of type annotations.

And that's just the beginning of what would be possible.
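To make the example concrete, here is a minimal stand-in for what a ResultType computation might do at runtime, restricted to builtin scalar types. Real NumPy promotion (numpy.result_type) also accounts for bit widths, signed/unsigned kinds, and array/scalar mixing; this ladder is only an illustration.

```python
# Minimal stand-in for result-type computation over builtin scalars,
# using the promotion ladder bool < int < float < complex.
_PROMOTION_ORDER = [bool, int, float, complex]

def result_type(*types: type) -> type:
    """Return the widest of the given types per the ladder above."""
    widest = max(_PROMOTION_ORDER.index(t) for t in types)
    return _PROMOTION_ORDER[widest]
```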

@hmc-cs-mdrissi

hmc-cs-mdrissi commented Mar 20, 2023

Hmm, depending on what code you allow, this feels like a more advanced version of this proposal. A type lookup map is in spirit a way to write a generic type like:

def result_type(t1, t2):
  if t1 == int and t2 == str:
    return Foo
  elif t1 == str:
    return Bar
  else:
    return Baz

Just that in the type-lookup-map case, the rule is that you are only allowed to define the type in a way that a dictionary mapping a fixed number of types to one type can express. This allows writing your numpy example in the type-lookup-table way as:

# A dict from tuples of two types to one type makes ResultType generic in its type arguments.
ResultType = TypeLookup({
  (NDArray[bool_], _ArrayLikeBool_co): NDArray[bool_], 
  (_ArrayUInt_co, _ArrayLikeUInt_co): NDArray[unsignedinteger[Any]], 
  (_ArrayInt_co, _ArrayLikeInt_co): NDArray[signedinteger[Any]], 
  ...
})

# Actual usage would stay same in the end.
def __add__(self: T, other: U) -> NDArray[ResultType[T, U]]: ...

I'm mostly fond of the type-lookup way, as a type checker could implement it by converting ResultType into a family of overloads. A type lookup is, I think, exactly equivalent to a concise way of writing a family of overloads, as this example shows.
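A rough runtime emulation of the proposed TypeLookup may help show the shape of the idea. In the actual proposal the lookup would be resolved statically by the type checker, as if each table entry were one overload; the class below only mimics the subscription syntax, and its entries use builtins instead of the numpy aliases.

```python
# Hypothetical runtime emulation of the proposed TypeLookup.
class TypeLookup:
    def __init__(self, table):
        self._table = dict(table)

    def __getitem__(self, key):
        # Support both TypeLookup[A, B] (a tuple key) and TypeLookup[A].
        if not isinstance(key, tuple):
            key = (key,)
        return self._table[key]

# Illustrative entries only.
ResultType = TypeLookup({
    (bool, bool): bool,
    (int, float): float,
    (float, complex): complex,
})
```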

@gvanrossum
Member

Implementing such a proposal would also be very labor-intensive.

@NeilGirdhar
Author

Implementing such a proposal would also be very labor-intensive.

You're right. And there are many other, far more important things that should be done first. We can close this if you like.

@frodo821

from typing import SyntheticType

def ResultType(*args: Any) -> SyntheticType[DType]:
  return result_type(*args)  # Something like this, but would have to deal with e.g., numpy.ndarray[typing.Any, numpy.dtype[numpy.floating[typing.Any]]]

# Then...
    @overload
    def __add__(self: T, other: U) -> NDArray[ResultType[T, U]]: ...  # type: ignore[misc]

My impression is that the original proposal has some issues.

First, the proposal needs an extension to Python's built-in type (or a grammar extension) because functions are not subscriptable. Second, with type checkers, those type-synthesiser functions must be free of side effects and infinite loops, but the Python runtime cannot ensure that a function has no side effects or infinite loops (perhaps the latter could be achieved by prohibiting some opcodes like JUMP_BACKWARD).
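The opcode idea can be sketched with the standard dis module: scan a function's bytecode for jump instructions whose target precedes the instruction itself. This is only a heuristic for detecting loops (it says nothing about side effects, and recursion would evade it), and opcode names vary across CPython versions (JUMP_ABSOLUTE vs. JUMP_BACKWARD), so the sketch compares offsets rather than matching a specific name.

```python
import dis

def has_backward_jump(func) -> bool:
    """Heuristically detect loops by scanning bytecode for jump
    instructions whose target offset precedes the instruction."""
    for instr in dis.get_instructions(func):
        if "JUMP" in instr.opname and isinstance(instr.argval, int):
            if instr.argval <= instr.offset:
                return True
    return False
```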

@NeilGirdhar
Author

First, the proposal needs an extension to Python's built-in type (or a grammar extension) because functions are not subscriptable.

Good point. I was trying to emulate the TypeGuard, but maybe it would be better to come up with a different syntax.

Second, with type checkers, those type-synthesiser functions must be free of side effects and infinite loops, but the Python runtime cannot ensure that a function has no side effects or infinite loops (perhaps the latter could be achieved by prohibiting some opcodes like JUMP_BACKWARD).

Or just don't worry about it? MyPy plugins today can already have infinite loops.

@frodo821

@NeilGirdhar

MyPy plugins today can already have infinite loops.

From my understanding, that point is not a problem because mypy doesn't actually run the code it is diagnosing; it only analyses ASTs. But with this proposal, analysing abstract syntax trees might not suffice, because type checkers can only get the concrete types by calling the type-synthesiser functions; they cannot get them statically.

@frodo821

I also suppose type-checkers, e.g. Pyright, would not work correctly with this proposal.

@NeilGirdhar
Author

From my understanding, that point is not a problem because mypy doesn't actually run the code it is diagnosing; it only analyses ASTs. But with this proposal, analysing abstract syntax trees might not suffice, because type checkers can only get the concrete types by calling the type-synthesiser functions; they cannot get them statically.

I don't know what you mean?

I also suppose type-checkers, e.g. Pyright, would not work correctly with this proposal.

Why not? The whole point is for this to be universal.

@frodo821

I don't know what you mean?

I mean that "statically" means they don't actually execute the code but only walk its abstract syntax tree.

Why not? The whole point is for this to be universal.

Because type checkers like pyright or mypy do not run Python code; they only statically analyse abstract syntax trees to check types.

@NeilGirdhar
Author

Because type checkers like pyright or mypy do not run Python code; they only statically analyse abstract syntax trees to check types.

Well, MyPy has Python plugins (so it can run Python code). And any type checker could run code. Anyway, as Guido said, this is probably a lot of work. I'm happy to keep discussing, but it's unlikely that this would be done in the near future. I just proposed it because I got excited about it solving one of my problems.

@frodo821

frodo821 commented Mar 20, 2023

Well, MyPy has Python plugins (so it can run Python code).

Oh, sorry, I missed that.

@syastrov

Could you not generate the stub files with the overloads with some templating language?

If the types depend on the environment somehow, then you'd have to do this outside of the published stubs.

There was a discussion of extending stubs via overlays here: https://mail.python.org/archives/list/[email protected]/thread/UOIVXOJA2PIEYF3XB37HBI2MAJ4XYNUI/

If we had that mechanism, then you could generate the extra overloads however you wanted (for example, by running Python code like you suggested). And then they would be respected by the type-checker.

@erictraut
Collaborator

If I understand you correctly, your intent is to provide a way to programmatically synthesize ".pyi" files that are otherwise inconvenient to write by hand. If that understanding is correct, then I don't think there's a need to encumber the type system with any new complex features. The type system already prescribes how type stubs work, and type checkers do not care how such type stubs are generated. You can write whatever tool you want to generate them (e.g. a mypy plugin, if that's convenient). Directives to this tool could be provided in comments or some other templating mechanism. Since there's no need for this to be part of the type system, there's also no need for standardization (e.g. a PEP process). You can write whatever tool you find most useful to address your target use cases.
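Such a standalone generator can be quite small, e.g. a script that expands a promotion table into overload stubs for a .pyi file. The table rows below echo the numpy stub excerpt quoted earlier in the thread; the function name and output format are illustrative, not any existing tool's API.

```python
# Sketch of a standalone stub generator: expand a promotion table
# into @overload stubs for __add__, ready to paste into a .pyi file.
PROMOTIONS = [
    ("NDArray[bool_]", "_ArrayLikeBool_co", "NDArray[bool_]"),
    ("_ArrayUInt_co", "_ArrayLikeUInt_co", "NDArray[unsignedinteger[Any]]"),
    ("_ArrayInt_co", "_ArrayLikeInt_co", "NDArray[signedinteger[Any]]"),
]

def render_add_overloads(promotions) -> str:
    """Render one @overload stub per table row."""
    lines = []
    for self_t, other_t, ret_t in promotions:
        lines.append("    @overload")
        lines.append(
            f"    def __add__(self: {self_t}, other: {other_t}) -> {ret_t}: ..."
        )
    return "\n".join(lines)
```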

@NeilGirdhar
Author

NeilGirdhar commented Mar 20, 2023

If we had that mechanism, then you could generate the extra overloads however you wanted (for example, by running Python code like you suggested). And then they would be respected by the type-checker.

You're right, and that's much simpler than what I proposed. Just from an idealism standpoint though, it's a bit less pretty to generate literally dozens of copies of __add__ instead of a single one with a magic ResultType[T, U] return value.

I guess if there were some way of creating a table type variable where ResultType would do a lookup in a table of types based on T and U, that would be really cool (something like what was described here). Then you could programmatically generate the table and the annotated functions would be really pretty.

If that understanding is correct, then I don't think there's a need to encumber the type system with any new complex features.

Yes, you're right.

Still, it would be very convenient to have a table-lookup system, or else

  • the number of generated overloads could be very large, and
  • users of, say, Numpy, would have to run the same overload generator, which would be very inconvenient.

@hmc-cs-mdrissi

I wouldn't expect users to run it. Maintainers of numpy/typeshed maybe.

The closest equivalent today is tools like mypy-protobuf, which generates stubs based on .proto files. We could explore an extended type system in, say, .pyix files, with a tool that converts .pyix to normal .pyi files. That tool would run as part of building the wheel for a Python package, and users who download the numpy/stub package would only ever see the final .pyi files and should never need to know about .pyix.

@NeilGirdhar
Author

NeilGirdhar commented Mar 20, 2023

I wouldn't expect users to run it. Maintainers of numpy/typeshed maybe.

Consider the leaky_integrate function I linked above. It needs access to the result-type mechanism, so a library that uses Numpy would actually need to run the generator for Numpy types on its own functions.

That is, unless Numpy could export some kind of type-mapping.
