Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update overload docs #5229

Merged
merged 4 commits into from
Jul 3, 2018
Merged

Conversation

Michael0x2a
Copy link
Collaborator

This pull request:

  1. Moves documentation about overloads into the "More types" section.

    (Rationale: although overloads aren't really a type, I feel it makes sense for them to live somewhere in the "Type System Reference" section instead of the "Miscellaneous" section.)

  2. Modifies the docs to start with a non-dunder example (resolves Docs for function overloading are made more complex by using __getitem__ as example #4579) and simplifies the __dunder__ example.

  3. Adds some documentation about the "pick the first match" rule.

  4. Removes the now out-of-date note about erasure.

This pull request:

1.  Moves documentation about overloads into the "More types" section.

    (Rationale: although overloads aren't really a type, I feel it
    makes sense for them to live somewhere in the "Type System
    Reference" section instead of the "Miscellaneous" section.)

2.  Modifies the docs to start with a non-dunder example (resolves
    python#4579) and simplifies the
    `__dunder__` example.

3.  Adds some documentation about the "pick the first match" rule.

4.  Removes the now out-of-date note about erasure.
@Michael0x2a Michael0x2a changed the title Clean up overload docs Update overload docs Jun 17, 2018
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Here are some comments, sorry for being "picky", this is a complex topic, so I want to find the best wording.

.. _function-overloading:

Function overloading
********************
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this section to the summary at the start of this file.

# Raise an exception

While this function signature works, it's too loose: it implies we
could receive either address object regardless of the number of arguments
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"receive" is not the best word here. I think it would be better formulated as "...it implies the result of a call could be either...."


@overload
def ip_address(a: int, b: int, c: int, d: int) -> IPv4Address:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly prefer the "typeshed style" for overloads. Also PEP 544 recommends "typeshed style" for protocols. In general, I think ... should be used in typing context, while pass should be used in runtime context. So I would write:

    @overload
    def ip_address(a: int, b: int, c: int, d: int) -> IPv4Address: ...

etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- the exception would be Python 2 code (where you must use pass), but then you'd have to use # type: comment anyway. Note that PEP 484 uses ... in its stub examples but somehow switches to pass for the non-stub example. However I prefer the ... form everywhere. We should probably update PEP 8 (however, see the debate on python/peps#671).


class MyList(Sequence[T]):
@overload
def __getitem__(self, index: int) -> T: pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment about pass -> ... also applies to this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer not to have a blank line after each dummy definition -- not having the blank line emphasizes that these are all part of a single function definition. (PS. I didn't test this example, but please do.)

This means that an overloaded function is still an ordinary Python
function! There is no automatic dispatch handling: you must manually
handle the different types in the implementation (usually by using
``if`` statements and ``isinstance`` checks).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would replace "usually" with "e.g."

There are a few exceptions to the "pick the first match" rule.
For example, if multiple variants match due to an argument
being of type ``Any``, mypy will make the inferred type also
be ``Any``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would replace this paragraph with an example showing a call with Any to an above function (especially if you reword as I suggest above).


Mypy will also prohibit you from writing overload variants that are
inherently unsafely overlapping: for example, writing two variants
that accept the same arguments but return different types.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph belongs to the below section.

that accept the same arguments but return different types.

Type checking the implementation
--------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to "Type checking the definition" and include more info here. In particular I would show two examples: one showing the "unsafely overlaps error" (e.g. with int -> int and object -> str), and one with "will be never matched" (briefly). For first example it would be great to explain the motivation with and example x: object = 42 -> f(x) -> BOOM!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding this section/the surrounding text: I wonder if it might be better to just link to PEP 484, once the overload stuff is accepted there.

I'm currently trying to write the int -> int and object -> str example -- and it gets mired in the weeds pretty quickly. In particular, I'm finding it challenging to concisely explain why mypy flags this example as being unsafe yet accepts the List[int] -> int and List[str] -> str example despite the fact that we can trigger the exact same kind of error in both cases.

(More specifically, it'd be trivial for me to write an explanation that describe the design decisions behind this behavior in detail; I'm finding it non-trivial to write an explanation that's satisfying yet concise.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding it non-trivial to write an explanation that's satisfying yet concise.

Yes, I know this is the hardest part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it might be better to just link to PEP 484, once the overload stuff is accepted there.

I'd rather be independent from that -- the purpose here is to be the best/clearest user documentation possible, while the goal of the PEP is to be as precise a specification as possible. Those are often conflicting goals.

@@ -178,6 +178,211 @@ create subclasses of these objects either.

name_by_id(3) # int is not the same as UserId

.. _function-overloading:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a link to this section at the end of the Callable section.


To minimize potential issues, we recommend ordering your variants
from most to least specific. Your implementation should also
perform ``isinstance`` checks and the like in the same order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would replace "isinstance checks" to "runtime dispatch checks" (or something similar).

@ilevkivskyi ilevkivskyi self-assigned this Jun 17, 2018
# type hints.
#
# Mypy will also check and make sure the signature is
# consistent with the provided variants.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this example pass? I tried to recreate it (with fewer args) and got errors:

class B:
    pass

class C:
    pass

@overload
def foo(a: int) -> B: pass
@overload
def foo(a: int, b: int) -> C: pass
def foo(*args: int) -> Union[B, C]:  # line 21; errors here
    if len(args) == 1:
        return B()
    if len(args) == 2:
        return C()
    raise RuntimeError

The errors are:

_.py:21: error: Overloaded function implementation does not accept all possible arguments of signature 1
_.py:21: error: Overloaded function implementation does not accept all possible arguments of signature 2

Did I do something wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I do something wrong?

It looks like a bug. @Michael0x2a could you take a look, seems something simple.


@overload
def ip_address(a: int, b: int, c: int, d: int) -> IPv4Address:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- the exception would be Python 2 code (where you must use pass), but then you'd have to use # type: comment anyway. Note that PEP 484 uses ... in its stub examples but somehow switches to pass for the non-stub example. However I prefer the ... form everywhere. We should probably update PEP 8 (however, see the debate on python/peps#671).


class MyList(Sequence[T]):
@overload
def __getitem__(self, index: int) -> T: pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer not to have a blank line after each dummy definition -- not having the blank line emphasizes that these are all part of a single function definition. (PS. I didn't test this example, but please do.)

ignored: they're overridden by the final implementation function.

This means that an overloaded function is still an ordinary Python
function! There is no automatic dispatch handling: you must manually
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I like it just fine. :-)

function should be omitted: stubs do not contain runtime logic.

Type checking calls to overloads
--------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but most users will first be encountering the details discussed here when calling some overloaded function from a library. And the examples above already hint at how the definitions are checked. So I like the existing order just fine.

be either a ``List[str]`` or a ``List[int]``. In this case, mypy
will break the tie by picking the first matching variant: ``output``
will have an inferred type of ``str``. The implementor is responsible
for making sure ``summarize`` breaks ties in the same way at runtime.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, here the context might actually help...

The body of an implementation is type-checked against the
type hints provided on the implementation. For example, in the
``MyList`` example up above, the code in the body is checked with
``index: Union[int, slice]`` and a return type ``Union[T, Sequence[T]]``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(on line 356) with -> with argument list
return type -> return type of

``MyList`` example up above, the code in the body is checked with
``index: Union[int, slice]`` and a return type ``Union[T, Sequence[T]]``.
If there are no annotations on the implementation, then the body is
not type checked.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe clarify that this is different from having all args and return annotated with Any. That's standard, but somehow a reminder in this context seems helpful.

.. note::

Due to the "pick the first match" rule, changing the order of your
overload variants can change how mypy type checks your program.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also mention that older versions of mypy (up to 0.610) used a different rule here? Or that PEP 484 is currently not clear on this?

overload variants can change how mypy type checks your program.

To minimize potential issues, we recommend ordering your variants
from most to least specific. Your implementation should also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should I order them if they are all disjoint? (E.g. the ip_address example.)

@Michael0x2a
Copy link
Collaborator Author

@ilevkivskyi and @gvanrossum -- I think this is ready for a second look whenever you have time.

Some notes:

  1. I edited the overview at the top to include the text as requested
  2. I tried switching the ip address example to something that wouldn't require us to talk about positional-only args, but wasn't really able to find anything I was happy with. (E.g. I thought about making an example where we return date vs datetime objects, but it ended up looking messier). I guess on the plus side, talking about positional-only args do hint to the read that types aren't the only thing that matters.
  3. I switched to having the variant bodies use ..., but decided against removing the blank lines between each variant: IMO it made the overloads look a little intimidatingly dense. That said, I don't feel too strongly about this issue, so I'm happy to remove them if people disagree.
  4. I ended up making a new section about how overload variants are type checked. I couldn't really find a way to make it concise, unfortunately -- sorry about that.
  5. Regarding the "we pick the best match" thing: I didn't really elaborate on it because I think the current behavior is mostly intuitive (?). If people disagree, I can spell out what exactly we mean by "best". I think it mostly boils down to talking about how star-args are handled?
  6. I decided to keep the current topic order: I like using the spiral approach when teaching, and the topics basically do that: we start with examples and a high-level overview that communicate the gist of how to use overloads, then drill down into the nitty gritty semantics of calls, variant checks, and implementation checks (in that order).

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is almost ready, I added few more comments.

def ip_address(__a: int, __b: int, __c: int, __d: int) -> IPv4Address: ...
@overload
def ip_address(__a: int, __b: int, __c: int, __d: int,
__e: int, __f: int, __g: int, __h: int) -> IPv6Address: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I don't like these underscores, they can confuse the reader by giving an impression this is something related to overloads. Why not just use def ip_address(*args, **kwds): ... as an implementation signature? You can even add a code like if kwds: raise TooLazyError("you should call this function with positional args").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think going with ip_address(*args, **kwargs) would ruin the narrative that overloads let you "refine" the type of your function to something more precise. Both options are also a kludge -- in that case, I'd rather use the kludge that teaches the reader something new and useful about mypy rather then handwaving the complication away.

That said, I think a better option would be to find a different example that avoids this issue completely. I'm just that I'm having difficulty coming up with a replacement that isn't totally contrived.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at mypy's own code and there is a single example of overload. It looks like this:

@overload
def preferred_currency(c: PaidCustomer) -> Currency: ...
@overload
def preferred_currency(c: Customer) -> Optional[Currency]: ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation may be like this:

def preferred_currency(c: Customer) -> Optional[Currency]:
    if isinstance(c, PaidCustomer):
        # Use currency from subscription plan
    elif in_app_purchases(c):
        # Use most commonly used currency
    else:
        return None

Type checking the variants
--------------------------

Although the "pick the first match" algorithm works well in many cases, it can
Copy link
Member

@ilevkivskyi ilevkivskyi Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introductory sentence can make a confusing impression that this is some kind of limitation of mypy's implementation. You can just say something simpler like "Not all possible variant combinations are considered safe by mypy. Let us consider an example of unsafe variants:"

if isinstance(x, int):
return x
else:
return "Unsafe!"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear for a typical reader why it is unsafe at this point, use a different word here, maybe "Hm, not really a number".


If we examine just the annotated types, it seems as if ``bad_var`` ought to be of
type ``str``. But if we examine the runtime behavior, ``bad_var`` will actually be
of type ``int``!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of saying the reasoning in words, it would be better to just show a code snippet, that will fail with a TypeError at runtime. For example, I would replace your snippet and subsequent explanation by this snippet:

yolo: object = 42
unsafe_func(yolo) + "why not?"  # inferred type for the call is "str", but this will
                                # fail at runtime because the function will actually return 42

So in this example, the ``int`` argument in the first variant is a subtype of
the ``object`` argument in the second, yet the ``int`` return type is a subtype of
``str``. Both conditions are true, so mypy will correctly flag ``unsafe_func`` as
being unsafe.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like these three paragraphs they nicely put on formal grounds the above example.

``str``. Both conditions are true, so mypy will correctly flag ``unsafe_func`` as
being unsafe.

However, it is unfortunately impossible to detect *all* unsafe overloads without
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this long part about multiple inheritance is unnecessary I think. It can be replaced with a single short note at the end of this section like: "Note that mypy intentionally allows unsafe overloads related to multiple inheritance and empty containers. This way mypy avoids lots of false positives (that would otherwise make overloads impractical) by having some rare false negatives."

This is really a discussion that belongs to the PEP, not the docs. Remember that you write this mostly for people who have never heard about overloads, while currently it looks like you are talking to a CS prof: "Please forgive us for being so unsafe!" :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mm, I'm ok with just leaving a short snippet like that if we're ok with also linking to the PEP and telling the reader to read that if they want more details.

But I was under the impression that we wanted the docs to be more self-contained? In that case, I think we really do need at least one example of what an unsafe yet allowed overload could look like. I don't think a reader would really understand what we mean if with just that explanation .

One option I could try is re-using the summarize example instead -- that would let me cut out all of the setup related to multiple-inheritance which I think would help cut down the text by a decent amount. I still want to keep the gist of recommendations at the end and some of the exposition though. Empty collections are relatively common and some people do like to use mixins -- I think it's worth giving people a more detailed heads up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I was under the impression that we wanted the docs to be more self-contained?

Self-contained doesn't mean complete, in particular here too much attention is devoted to motivation why we allow this. You could try re-using summarize if this will help cutting the size of this significantly.


Older versions of mypy used different semantics. In particular, overload
variants had to different after erasing types and calls to overloaded
functions did not use the "pick the first match" rule.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can mention that mypy previously often silently returned Any for many overload calls. So that people will be not surprised by new errors when switching to 0.620.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

``Combined()`` is an instance of both ``A`` and ``B``; the empty list
``[]`` is an instance of both ``List[int]`` and ``List[str]``.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like an example of "...will be never matched" is missing in this section.


To minimize potential issues, we recommend ordering your variants
from most to least specific. Your implementation should also
perform runtime checks (e.g. ``isinstance`` checks) in the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, now it seems to me it should be vice-versa here. People typically add annotations to existing code, so it is better to say "Your variants should be listed in the same order as runtime checks (e.g. isinstance checks) performed in the implementation."

@gvanrossum
Copy link
Member

I lost track -- who is waiting for whom here?

@Michael0x2a
Copy link
Collaborator Author

@gvanrossum -- this PR is currently waiting on me. I've implemented most of Ivan's feedback but haven't had the chance to sit down and do the last bit of polishing yet.

@Michael0x2a
Copy link
Collaborator Author

@ilevkivskyi and @gvanrossum -- Sorry for the delay! I think this should be ready for another look now.

Specific substantial changes made:

  1. Swapped out the IP address example for something involving "mouse events".

    (It's still a little contrived, but whatever -- I can showcase overloads w/ different arities, give an example of a call that matches the impl but not the variants, so good enough.)

  2. Added an example/explanation of the "X will never be matched" error.

  3. Replaced the example with multiple inheritance with an example about 'summarized' and tried making the exposition at the end less wordy/less "doom-and-gloom".

  4. Fleshed out the section about old mypy semantics a little

  5. Adjusted the section about recommendations for ordering overload variants to place more emphasis on making sure the runtime impl and variant orders mirror each other.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I have few comments, but they are all minor. I think this can be merged after you consider them.

elif x2 is not None and y2 is not None:
return DragEvent(x1, y1, x2, y2)
else:
raise Exception("Bad arguments")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think either ValueError or TypError will be better here.

elif x2 is not None and y2 is not None:
return DragEvent(x1, y1, x2, y2)
else:
raise Exception("Bad arguments")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

.. note::

While we can leave the variant body empty using the ``pass`` keyword,
the more common convention is to instead use the ellipse (``...``) literal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"ellipsis"

def summarize(data: List[int]) -> int: ...

@overload
def summarize(data: List[str]) -> str: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having a signature that can't be expressed without overloads? For example, the first overload may be List[int] -> float and the implementation could return an average for integers, and a most common for strings.

unsafe_func(some_obj) + " danger danger" # Type checks, yet crashes at runtime!

This program type checks according to the annotations, but will actually crash
at runtime! Since ``some_obj`` is of type ``object``, mypy will decide that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence "This program..." just repeats the comment above. I would just remove it. (Especially that we have an explanation in the next sentence.)


We run into a similar issue here. This program type checks if we look just at the
annotations on the overloads. But since ``summarize(...)`` is designed to be biased
towards returning an int when it receives an empty list, this program will actually
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would format this as ``int`` (or rather ``float`` if you update the signature as suggested).

def summarize(data: List[str]) -> str: ...

def summarize(data):
if len(data) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not data:

@Michael0x2a Michael0x2a merged commit f5058bc into python:master Jul 3, 2018
@Michael0x2a Michael0x2a deleted the update-overload-docs branch July 3, 2018 16:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docs for function overloading are made more complex by using __getitem__ as example
3 participants