
Record Types #685

Open
saulshanabrook opened this issue Nov 11, 2019 · 9 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@saulshanabrook

saulshanabrook commented Nov 11, 2019

I would like to be able to type a dataframe-like object with mypy, where different columns have different types and each column can be accessed as an attribute of the dataframe. This is how libraries like Pandas and Ibis work.

In general, this requires a function whose return type depends on the string literal it is called with, i.e. a mapping from string literals to types (record kinds).

Here is a mock example implemented in TypeScript, which type-checks correctly:

class Column {
    mean(): number {
        return 0;
    }
}

class GeoColumn extends Column {
    length(): number {
        return 0;
    }
}

class Dataframe<T extends { [key: string]: Column} > {

    constructor(private cols: T) {

    }

    getColumn<K extends keyof T>(name: K): T[K] {
        return this.cols[name]
    }
}

const d = new Dataframe({ name: new Column(), location: new GeoColumn() });

d.getColumn("name").mean();
// We can call `length` because this is a GeoColumn
d.getColumn("location").length();
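For comparison, here is a minimal sketch of what can be expressed in Python today, assuming the column names are known in advance. `typing.overload` plus `Literal` maps each string key to its own return type, but every key has to be written out by hand for one specific schema. The `Dataframe` and `get_column` names below are a hypothetical, non-generic stand-in for the TypeScript class.

```python
from typing import Literal, overload


class Column:
    def mean(self) -> float:
        return 0.0


class GeoColumn(Column):
    def length(self) -> float:
        return 0.0


class Dataframe:
    """Hypothetical frame with one fixed, hand-written schema (not generic)."""

    def __init__(self, name: Column, location: GeoColumn) -> None:
        self._cols = {"name": name, "location": location}

    # A type checker picks the overload whose Literal matches the argument,
    # so each key gets its own return type. Every new schema needs new overloads.
    @overload
    def get_column(self, name: Literal["name"]) -> Column: ...
    @overload
    def get_column(self, name: Literal["location"]) -> GeoColumn: ...
    def get_column(self, name: str) -> Column:
        return self._cols[name]


d = Dataframe(name=Column(), location=GeoColumn())
d.get_column("name").mean()
d.get_column("location").length()  # accepted: the checker infers GeoColumn
```

This covers a single schema only. What the issue asks for is the generic version, where the key-to-type mapping comes from the type parameter instead of hand-written overloads.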

Possible Syntaxes

Here are a few possible ways this could be spelled in Python:

self as TypedDict

Since we already have the TypedDict construct, one of the least invasive approaches is to type self as a TypedDict.

This would probably require anonymous TypedDicts, which were proposed previously (python/mypy#985 (comment)).

It would also require TypedDicts to be able to take generic parameters.

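from __future__ import annotations  # keeps the hypothetical annotation below unevaluated

from typing import Dict, Generic, TypeVar


class Column: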
    def mean(self) -> int:
        return 0


class GeoColumn(Column):
    def length(self) -> int:
        return 0


T = TypeVar("T", bound=Dict[str, Column])

K = TypeVar("K", bound=str)
V = TypeVar("V", bound=Column)


class Dataframe(Generic[T]):
    def __init__(self, cols: T):
        self.cols = cols

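    # Proposed syntax: an anonymous TypedDict mapping key K to value V.
    # Type checkers do not accept this today; it only illustrates the idea.
    def __getattr__(self: Dataframe[TypedDict({K: V})], name: K) -> V: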
        return self.cols[name]


d = Dataframe({"name": Column(), "location": GeoColumn()})

d.name.mean()
d.location.length()
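The building block this proposal relies on already exists for a single, concrete schema: indexing a class-based TypedDict with a string literal gives that key's own value type. A sketch follows (the `Cols` name is illustrative). What is still missing is a way to make `Dataframe` generic over such a TypedDict and to route `__getattr__` through it.

```python
from typing import TypedDict


class Column:
    def mean(self) -> float:
        return 0.0


class GeoColumn(Column):
    def length(self) -> float:
        return 0.0


class Cols(TypedDict):
    # Illustrative, fixed schema: one key per column.
    name: Column
    location: GeoColumn


cols: Cols = {"name": Column(), "location": GeoColumn()}

# Checkers infer cols["location"] as GeoColumn, so .length() is accepted;
# an unknown key such as cols["missing"] is reported as an error.
location_length = cols["location"].length()
```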

Type-level .keys and __getitem__

Another option would be to mirror how TypeScript does this, by introducing type-level keys and __getitem__ operators. This would also require generics to depend on other generics (python/mypy#2756).

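from typing import Dict, Generic, TypeVar

# KeyOf and GetItem are proposed type-level operators; they do not exist in typing.
T = TypeVar("T", bound=Dict[str, Column])

K = TypeVar("K", bound=KeyOf[T])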


class Dataframe(Generic[T]):
    def __init__(self, cols: T):
        self.cols = cols

    def __getattr__(self, name: K) -> GetItem[T, K]:
        return self.cols[name]
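For a TypedDict, the mapping that the proposed `KeyOf[T]` and `GetItem[T, K]` would compute at type-check time is already available at runtime through `typing.get_type_hints`. The sketch below only illustrates that correspondence (the `Cols` name is illustrative). It does nothing for static checking, which is the actual point of the proposal.

```python
from typing import TypedDict, get_type_hints


class Column: ...


class GeoColumn(Column): ...


class Cols(TypedDict):
    name: Column
    location: GeoColumn


# Runtime counterparts of the proposed static operators (illustration only):
hints = get_type_hints(Cols)        # {"name": Column, "location": GeoColumn}
keys = set(hints)                   # roughly what KeyOf[Cols] would denote
location_type = hints["location"]   # roughly GetItem[Cols, "location"]
```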

Conclusion

I would like to have a way to type Dataframes that have different column types in a generic way. This is useful for typing frameworks like Ibis or Pandas.

I believe this is somewhat related to variadic generics (#193). Also related: dropbox/sqlalchemy-stubs#69

@ilevkivskyi
Member

This was discussed during one of the last typing meetups; see the bottom of the summary. I proposed calling these key types (essentially they would act as string generics, similar to integer generics). The syntax I like most is something like this for attributes (proposed by Guido, IIRC):

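from __future__ import annotations
from typing import Generic, TypeVar

T = TypeVar("T")
K = Key("K")  # proposed "key type" variable; Key does not exist in typing

class Proxy(Generic[T]):
    def __init__(self, target: T):
        self.target = target
    def __getattr__(self, name: K) -> T.K:
        # getattr looks up the attribute named by `name`;
        # `self.target.name` would always fetch an attribute called "name".
        return getattr(self.target, name)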

and T[K] for TypedDicts (initially I thought about the latter for both). The syntax for TypedDicts would still be too similar to generics (and would block adding support for generic TypedDict types), so maybe we will end up settling on something more verbose, like AttrOf[T, K] and ItemOf[T, K].
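For contrast, here is a sketch of the same Proxy typed with what the standard `typing` module offers today: `__getattr__` has to return `Any`, which is exactly the information the proposed `T.K` / `AttrOf[T, K]` would preserve.

```python
from typing import Any, Generic, TypeVar

T = TypeVar("T")


class Proxy(Generic[T]):
    def __init__(self, target: T) -> None:
        self.target = target

    def __getattr__(self, name: str) -> Any:
        # Called only for attributes not found on the proxy itself. Without key
        # types, a checker cannot relate `name` to T's attributes, so the best
        # available return type is Any.
        return getattr(self.target, name)


p = Proxy(3 + 4j)
real_part = p.real  # 3.0 at runtime, but typed as Any
```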

Anyway, IMO the question of syntax is not important at this point. This topic (support for static typing of the Python numeric stack) is obviously important (but constantly gets deprioritized by the mypy team). I would also say we should start with integer generics (shape types), with string generics (key types) following next.

See also essentially the same proposal on the mypy tracker: python/mypy#7856

@saulshanabrook
Author

The syntax for TypedDicts would still be too similar to generics (and would block adding support for generic TypedDict types)

Yeah that's why I avoided it.

Do you have a sense of whether something like this could first be written as a mypy extension before being integrated into core, so we could start building/playing with it?

@ilevkivskyi
Member

Do you have a sense of whether something like this could first be written as a mypy extension before being integrated into core, so we could start building/playing with it?

Unfortunately, I don't think that is possible; it requires changes in some very deep parts of mypy.

@saulshanabrook
Author

it requires changes in some very deep parts of mypy

Sounds exciting! :) Well, if I end up having the bandwidth to work on this, I will start poking around and asking more questions.

@theoparis

Any updates on this? It's been a year, and I really need a keyof alternative in Python.

@gvanrossum
Copy link
Member

Sorry, nothing yet. Can you describe your application in more detail?

@srittau added the "topic: feature" label on Nov 4, 2021
@tuchandra

Here's an example use case:

I have a DataFrame df and want to rename its columns. This is easily done with df.rename(columns=some_mapping), where some_mapping: dict[str, str] maps original column names to new ones. If an original column name (key) isn't one of the DataFrame's columns, .rename silently ignores it.

I'd like to be able to type the DataFrame more precisely, as a TypedDict-like record of its columns, and then annotate some_mapping: dict[keyof ThisDFType, str]. Currently, the only way to constrain the dict keys like this is to spell out every column name statically as a Literal["column1", "column2", ...], which is cumbersome.

Or put otherwise:

import pandas as pd

# suppose we have a RecordDF implemented above,
# that extends a TypedDict-like interface,
# with columns {x: pd.Series, y: pd.Series}

df: RecordDF = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

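rename_map: dict[KeyOf[RecordDF], str]  # KeyOf: hypothetical Python spelling of TypeScript's keyof
rename_map = {"x": "..."}  # type checks
rename_map = {"z": "..."}  # should be a type error: "z" is not a column of RecordDF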
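For reference, here is a sketch of the Literal workaround described above, assuming the columns are `x` and `y`. The column names are spelled out as a `Literal` alias so a type checker rejects unknown keys, and `typing.get_args` recovers the same list at runtime. `RecordKey` and `check_columns` are hypothetical names. The drawback the comment points out remains: the alias must be kept in sync with the frame by hand.

```python
from collections.abc import Iterable
from typing import Literal, get_args

# Hand-maintained column names; must be kept in sync with the DataFrame.
RecordKey = Literal["x", "y"]

rename_map: dict[RecordKey, str] = {"x": "x_renamed"}  # type checks
# rename_map = {"z": "z_renamed"}  # a type checker rejects the key "z"


def check_columns(names: Iterable[str]) -> None:
    """Hypothetical runtime guard: raise KeyError for unknown column names."""
    unknown = set(names) - set(get_args(RecordKey))
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")


check_columns(rename_map)  # passes: "x" is a known column
```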

@JelleZijlstra
Copy link
Member

JelleZijlstra commented Feb 7, 2023

That sounds like you just want dict[str, int]. (Edit: the comment this was replying to got deleted.)

@tmke8

tmke8 commented Apr 8, 2023

I opened a discussion with a syntax proposal and some examples: #1387

8 participants