Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore[python]: Enforce strict type equality in mypy check #4058

Merged
merged 1 commit into from
Aug 14, 2022

Conversation

zundertj
Copy link
Collaborator

Fixes mypy strict_equality in #4044.

@github-actions github-actions bot added the python Related to Python Polars label Jul 17, 2022
@zundertj
Copy link
Collaborator Author

Note that the introduction of #3857 allows additional freedom that this change will revert sort of (had to add a couple of type ignores for this). I see the confusion on DataType class vs instantiation, but in the long-run, making sure all code and docs use the same form, is probably cleaner than allowing both forms. But that is my personal opinion, I understand the pro's and cons.

@zundertj zundertj marked this pull request as draft July 17, 2022 14:10
@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2022

Codecov Report

Merging #4058 (e1620ca) into master (1444cd1) will increase coverage by 0.00%.
The diff coverage is 40.00%.

@@           Coverage Diff           @@
##           master    #4058   +/-   ##
=======================================
  Coverage   78.96%   78.96%           
=======================================
  Files         468      468           
  Lines       76982    76985    +3     
=======================================
+ Hits        60785    60790    +5     
+ Misses      16197    16195    -2     
Impacted Files Coverage Δ
py-polars/polars/internals/frame.py 91.92% <ø> (ø)
py-polars/polars/internals/functions.py 93.50% <ø> (ø)
py-polars/polars/internals/lazy_frame.py 77.23% <ø> (ø)
py-polars/polars/internals/type_aliases.py 0.00% <0.00%> (ø)
py-polars/polars/internals/expr.py 95.98% <100.00%> (ø)
py-polars/polars/internals/series.py 92.11% <100.00%> (ø)
...olars-lazy/src/physical_plan/expressions/column.rs 59.81% <0.00%> (-1.87%) ⬇️
polars/polars-core/src/fmt.rs 71.61% <0.00%> (-1.10%) ⬇️
...s/polars-core/src/series/implementations/floats.rs 70.55% <0.00%> (-0.93%) ⬇️
...polars-time/src/chunkedarray/rolling_window/mod.rs 71.53% <0.00%> (-0.77%) ⬇️
... and 12 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@zundertj
Copy link
Collaborator Author

Python 3.7 tests fail because I used typing.cast in several places to denote the output type(s) of a dataframe operations (such as indexing into a dataframe) as generic types, such as list[int]. As that is not type annotation, but "normal code", the from __future__ import annotations does not work f, because, well, that is annotations only (see also https://bugs.python.org/issue45117).

The only solution I see is for these instances to go back to pulling in types from the typing module, but it feels ugly. Can't use ignores, because that would be flagged in Python 3.10 as unnecessary.

@stinodego, any other ideas?

@stinodego
Copy link
Member

stinodego commented Jul 25, 2022

@stinodego, any other ideas?

Thanks for looking into this! Two things.

First, doing casts is usually a sign that something isn't typed properly somewhere up the chain. Can we properly type indexing into a DataFrame? That should be the goal of this exercise - to improve the type hints. Not to set all the flags to True.

Second, a more direct answer to your question: I believe an alternative to typing.cast is to declare the type explicitly on a variable (not sure if this is 100% equivalent):

result: list[int] = df[:, 0]
assert result == [1, 2, 3]

Much more readable than typing.cast, too!

@zundertj zundertj force-pushed the mypy_strict_equality branch 2 times, most recently from ae28a69 to 5865e2a Compare July 31, 2022 18:38
@zundertj
Copy link
Collaborator Author

First, doing casts is usually a sign that something isn't typed properly somewhere up the chain. Can we properly type indexing into a DataFrame? That should be the goal of this exercise - to improve the type hints. Not to set all the flags to True.

I don't think we can, because we do not have a way to introspect the types of the dataframe on compilation time. We do not know in advance that a column named "amount" read from a csv file is a Float32 rather than an Int32. So the goal of this PR is less ambitious than that.

Second, a more direct answer to your question: I believe an alternative to typing.cast is to declare the type explicitly on a variable (not sure if this is 100% equivalent):

result: list[int] = df[:, 0]
assert result == [1, 2, 3]

Much more readable than typing.cast, too!

So that will break, because the first line can't be verified at the mypy check (compilation time if you will). Hence if we know it must be an list[int], using typing.cast is better than using type: ignore , as ignore will not add any information, whilst typing.cast makes it clear to mypy and fellow programmers that for subsequent lines of code, result can be assumed to be a list[int].

Is some of this ugly? Yes, a bit. But also unavoidable, unless we are willing to check dataframe dtypes at runtime, and offer an API in polars to support that. For instance, that you would be able to get column 0 by doing df[:,0].int32(), and it would error if the type is not Int32. Usually what happens in practice, if anything on this is done at all, is that dataframes are being checked at runtime in several key places in the code (on ingestion, passing between libs, output) using tools such as Pandera / Patito. That helps very well against some issues (type inference failing on parsing because a column is empty, users passing in dates as strings rather than pd.Timestamp, etc), but that does not integrate with type annotations.

@matteosantama
Copy link
Contributor

matteosantama commented Aug 1, 2022

@zundertj

The only solution I see is for these instances to go back to pulling in types from the typing module, but it feels ugly.

What if you make them strings?

typing.cast("tuple[str, int]", df.row(-1)) == ("seafood", 1)

That should be valid Python 3.7 code, and hopefully mypy will recognize it as a proper annotation.

EDIT: this is also ugly, but you should be able to do

from typing import TYPE_CHECKING

result = df[:, 0]
if TYPE_CHECKING:
    cast(list[int], result)
assert result == [1, 2, 3]

@matteosantama
Copy link
Contributor

@stinodego

First, doing casts is usually a sign that something isn't typed properly somewhere up the chain. Can we properly type indexing into a DataFrame?

I don't see how this would be possible. The closest we can get is with generics, ie.

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")

class DataFrame(Generic[A, B, C]):

    def row(self, index: int) -> tuple[A, B, C]):
        ...

but there is not yet a way to declare a variable number of generics (PEP-646 will get us closer, but that's due until Python 3.11). And even with PEP-646, we would still have to rely on cast

    def col(index: int) -> tuple[A] | tuple[B] | tuple[C]:
        pass

df = pl.DataFrame[int, str, float]({"a": [1], "b": ["foo"], "c": [1.0]})

# MyPy doesn't know if this is an int, str, or float
df.col(1) == ("foo",)

# so we would still need to cast
cast(tuple[str], df.col(1)) == ("foo",)

@stinodego
Copy link
Member

All right, so for now it seems like casting seems the way to go. Though I would prefer to do from typing import cast and then use cast directly, instead of using typing.cast everywhere. I think that matches existing casts best.

And I believe the suggestion by @matteosantama to stringify the types could work.

Also wanted to mention that we could solve some of our problems by simply ignoring the warnings for unused ignores for problematic modules; something like:

[[tool.mypy.overrides]]
module = "tests.*"
warn_unused_ignores = false

@huijiong
Copy link

hello. l have some problem, but l dont know how to solve this problem ... npm WARN config global --global, --localare deprecated. Use--location=global` instead.
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path C:\nfts\hashlips_art_engine-main (1)/package.json
npm ERR! errno -4058
npm ERR! enoent ENOENT: no such file or directory, open 'C:\nfts\hashlips_art_engine-main (1)\package.json'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\gimho\AppData\Local\npm-cache_logs\2022-08-11T15_06_31_744Z-debug-0.log

C:\nfts\hashlips_art_engine-main (1)>

@matteosantama
Copy link
Contributor

@huijiong That looks unrelated to this PR -- I suggest you open an issue and explain exactly what you are trying to do that is causing the error.

@zundertj zundertj marked this pull request as ready for review August 13, 2022 08:16
@zundertj
Copy link
Collaborator Author

Should now be good to go. Had a bit of struggle with Py 3.7 vs 3.10, sometimes cast was accepted in both versions, sometimes an ignore was needed, but overall it is not too bad.

@zundertj zundertj force-pushed the mypy_strict_equality branch from 22f30a8 to c91c829 Compare August 13, 2022 15:36
@zundertj zundertj changed the title Enforce strict type equality in mypy check chore[python]: Enforce strict type equality in mypy check Aug 14, 2022
@zundertj zundertj force-pushed the mypy_strict_equality branch from c91c829 to e1620ca Compare August 14, 2022 06:34
@zundertj zundertj requested a review from ritchie46 August 14, 2022 07:38
Copy link
Member

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Nice, only one more flag to go 😄

@ritchie46 ritchie46 merged commit beb802a into pola-rs:master Aug 14, 2022
zundertj added a commit to zundertj/polars that referenced this pull request Aug 14, 2022
In pola-rs#4058, I enforce strict type equality checks in mypy. Most cases can be fixed with `typing.cast`. For generic types such as tuple, we cannot use this yet, because py3.7 does not allow this, and thus we need a `type: ignore`. This one case of using the cast slipped through, fixing this here.

As noted by pola-rs#4413 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants