
Possible to distinguish between Sequence[str]/Iterable[str] and str? #256

Open
jtatum opened this issue Jul 29, 2016 · 61 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@jtatum

jtatum commented Jul 29, 2016

If a function expects an iterable of strings, is it possible to forbid passing in a string, since strings are iterable? It seems like there are many cases where this would be an error, but I don't see an obvious way to catch a call that silently iterates over 't', 'h', 'i', 's'.

@gvanrossum
Member

Since str is a valid iterable of str this is tricky. Various proposals have been made but they don't fit easily in the type system.

@vedgar

vedgar commented Aug 28, 2016

I think a type should never lie, even if it is a white lie. Either we should remove str.__iter__ (or make it yield something other than strs), or we should allow passing 'abc' into a function expecting Iterable[str]. Of course, I'm for the second option.

Or we should have a special type name for "iterable of strings that is not a string". Strings are already special, as AnyStr shows. But although AnyStr is able to be represented using more primitive operations, I think it's too early to introduce a "type difference" operation in general. E.g. co(ntra)variance seems weird in that case.

@gpshead
Member

gpshead commented Nov 20, 2018

The problem I have with allowing Sequence[str] or Iterable[str] to be satisfied by str is that passing a str where a sequence of (generally not single-character) strs is intended is a common API misuse that a type checker needs to be able to catch.

People can over-specify their APIs by requiring List[str] or Tuple[str, ...] as input instead of the more general Sequence or Iterable, but this is unnatural when teaching people how to type annotate. We'd prefer to just tell everyone to always prefer Iterable or Sequence on input.

Random thought: would it be possible for our "magic" Text type to lose its __iter__, so that Iterable[Text] works as desired and forbids a lone str argument?

@ssbr

ssbr commented Nov 20, 2018

It requires more work on the part of API authors, but one option that might be less of a lie is to be able to delete an overload. C++ has a similar problem, where a type being passed in might "work" but you want to forbid it. So they implemented a special overload that, if matched, causes an error. See e.g. this SO thread.

Then one could define the API for Iterable[str], and delete the overload for str.

@gvanrossum
Member

Mypy doesn't currently have a way to remove methods in a subclass, because it would fail Liskov. But there's a hack possible. You can change the signature of a method override in a way that violates Liskov, and then add a # type: ignore to prevent mypy from complaining. Mypy will then check uses according to the override! So maybe something like this (untested) could be made to work:

from typing import Sequence

class Text(Sequence[str]):
    def __iter__(self) -> None: ...  # type: ignore

@ilevkivskyi
Member

It actually doesn't work, because mypy currently follows a "nominal first" rule (for various important reasons): if something works using nominal subtyping, mypy just uses it. In this case Text is still a nominal subtype of Sequence[str].

@gvanrossum
Member

Hm... Maybe Text could be a Protocol that has the same methods as Sequence except for one?

@ilevkivskyi
Copy link
Member

Maybe. TBH I am still not sure what the costs/benefits are here. I am afraid making such big changes in typeshed could break a lot of existing code. On the other hand, if someone wants to do this "locally", it should be a fine solution.

@JukkaL
Contributor

JukkaL commented Nov 20, 2018

A relatively simple approach would be to special case str vs. Iterable[str] / Sequence[str] compatibility in a type checker. This behavior could be enabled through a strictness option. This issue seems quite specific to str (and unicode) so anything more drastic may not be worth it.

@Michael0x2a
Contributor

If we assume the type checker has reasonably good dead code analysis capabilities, we could get a solution pretty similar to the C++ one for free by combining @overload and NoReturn. For example:

from typing import Iterable, overload, NoReturn

@overload
def foo(x: str) -> NoReturn: ...
@overload
def foo(x: Iterable[str]) -> int: ...
def foo(x):
    if isinstance(x, str):
        raise Exception()
    else:
        return 1

def main() -> None:
    x = foo(["hello", "world"])
    reveal_type(x)

    # Maybe report dead code?
    y = foo("hello")
    reveal_type(y)

That said, I don't know if any type checkers actually handle this case gracefully. Mypy, for example, will just silently ignore the last reveal_type (and warn that y needs an annotation).

Maybe to help this analysis, we could add some sort of ShouldNeverBeEncountered type? Type checkers could special-case it and report an error whenever a function call evaluates to this type, but otherwise treat it as identical to NoReturn.

It's not a perfect solution since there's still no definitive way of telling if an Iterable[str] is a str or not, but it'd at least give library authors a way to catch some of the more obvious misuses w/o requiring their users to use a special Text-like protocol.

@gpshead
Member

gpshead commented Nov 21, 2018

Given that the norm for most APIs is to accept the iterable and never want a plain str, we should aim to support that with a trivial annotation that doesn't involve multiple defs and overloading. The oddball situation where someone wants to accept both the iterable and a plain str should be the complicated one, if complexity is needed.

@vedgar

vedgar commented Nov 21, 2018

I think we're trying to expand type hints beyond their original purpose, and it shows.
If I say a_string.rstrip('abc'), the function is going to work perfectly. It will, according to its specification, produce a "copy" of a_string from which all trailing 'a's, 'b's and 'c's are removed. There aren't going to be any "hidden type errors", "accidental mechanisms" or "unintended consequences" that type hints usually try to prevent. Mypy has nothing to do here. 'abc' is just a compact way to write an iterable of strs that yields 'a', 'b' and 'c' in that order, and then stops.
What you're now trying to do is go beyond "do the types match" (they do, absolutely) into "did the caller really intend to write this". That is a dangerous crossing of responsibility boundaries. Yes, there is a sentence in PEP 484 about mypy being "a powerful linter", but I really think no one wanted mypy to take over all the responsibilities of a linter. At least I hope so.
In short: is passing a str as an Iterable[str] a common error? I don't know (in my experience it is not, but of course you have more experience). Does it need to be flagged by a linter? Probably. Are type hints the right way to catch it? NO.

@JukkaL
Contributor

JukkaL commented Nov 21, 2018

Maybe Text could be a Protocol that has the same methods as Sequence except for one?

Unfortunately this would make Text incompatible with str and would generally break typeshed and existing annotations.

@martindemello

I like the idea of special-casing strings in the tool rather than in the type system, since as @gvanrossum notes, str is an iterable of str (turtles all the way!). Also this sort of type-aware linting is a neat idea, and could be done relatively easily within the typechecker because we have all the information at hand.

@gvanrossum
Member

all the information at hand

Do we? When I see a function that takes an Iterable[str] or Sequence[str] -- how do we know it is meant to exclude str? Or do we just assume it is always excluded?

@martindemello

I was thinking always excluded; I've run into problems in both python and other languages where a function expecting an iterable was passed a string, and never (that I can think of) actually wanted a generic iterable to treat a string as an iterable of chars. That should hold even more strongly if the function specifies Iterable[str]; it is a good hint that str is being viewed as an atomic type there.

@gvanrossum
Copy link
Member

Hm, I guess you could add it back explicitly by saying Union[str, Iterable[str]].

Would this extend to e.g. Iterable[AnyStr]?
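A minimal sketch of what an API that opts back in with Union[str, Iterable[str]] might look like, assuming it should treat a lone str as a single item rather than as characters (the function name normalize_names is hypothetical):

```python
from typing import Iterable, List, Union


def normalize_names(names: Union[str, Iterable[str]]) -> List[str]:
    # The explicit Union documents that a lone str is accepted; it is
    # treated as one name, not as an iterable of single characters.
    if isinstance(names, str):
        return [names]
    return list(names)


print(normalize_names("alice"))           # ['alice']
print(normalize_names(("alice", "bob")))  # ['alice', 'bob']
```

Under today's rules a checker treats this Union as equivalent to Iterable[str]; it is the isinstance branch that gives a str its single-item meaning at runtime.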

@martindemello

I think so, yes; I want to say that str|bytes|unicode should not satisfy Iterable[anything] if the flag is passed in.

@gpshead
Member

gpshead commented Nov 21, 2018

In short: is passing a str as an Iterable[str] a common error?

Yes. We have seen this specific bug multiple independent times at work. Unfortunately more than once after deployment in production.

I consider it a motivating anti-pattern for a type checker to help avoid. No other tool can validate this; it requires type information. Let's not be purists here. Flagging this improves developer productivity and code maintainability, and we have a way to explicitly annotate the less common APIs that want to accept both. :)

Requiring such APIs to specify Union[str, Iterable[str]] is a good example of explicit is better than implicit. (edit: that'd be str | Iterable[str] in modern syntax.)

@gvanrossum
Member

We have seen this specific bug multiple independent times at work.

I recall how Rob Pike (who famously has just 'r' as his username) once got spammed when a script that sent email invoked an email-sending API with a single email address instead of a list.
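A minimal illustration of that failure mode, using a hypothetical helper expand_recipients (not from any real mail library): a type checker accepts the lone string because str is itself an Iterable[str], and iterating it yields single characters, one of which can be a valid one-letter username.

```python
from typing import Iterable, List


def expand_recipients(recipients: Iterable[str]) -> List[str]:
    # Hypothetical helper: one recipient per element of the iterable.
    # Type checkers accept a bare str here, since str is an Iterable[str].
    return [address for address in recipients]


print(expand_recipients(["alice@example.com"]))  # ['alice@example.com']
print(expand_recipients("rob@example.com")[:3])  # ['r', 'o', 'b']
```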

@vedgar

vedgar commented Nov 22, 2018

If we're going to go EIBTI route, why not be explicit where it counts?

for char in a_string.chars():

It would also help in distinguishing iterating through combined characters (graphemes), and be almost analogous to iterating through words with .split() and lines with .splitlines().

While we're at it, I would be very happy with for line in a_file.lines(), again giving the ability to be explicit with a_file.records(sep=...) or a_file.blocks(size=...).


Yes, I know what the response is going to be. But if we really don't want to change the language, maybe it really is not the problem of the language as a whole, but of a specific API. And there I don't see any problem with writing

if isinstance(an_arg, str):
    raise ValueError('an_arg is supposed to be an iterable of non-single-character strings')

Again, it's not the type that's wrong (although you can raise TypeError above if you want:). Analogy: there are many functions that declaratively accept int, but in fact work only with nonnegative numbers. In fact, I think there are more such functions than the ones that work out of the box with negative integers. Are we going to redefine that an annotation n: int really means a nonnegative integer, and require people who want int to mean int to jump through hoops?
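The runtime check described above can be packaged as a reusable helper. A minimal sketch, with the name as_str_list chosen for illustration:

```python
from typing import Iterable, List


def as_str_list(values: Iterable[str]) -> List[str]:
    """Materialize an iterable of strings, rejecting a lone str.

    A bare str satisfies the Iterable[str] annotation, so the check has to
    happen at runtime. TypeError is raised here; ValueError works as well.
    """
    if isinstance(values, str):
        raise TypeError(
            "expected an iterable of strings, got a single str: "
            f"{values!r}"
        )
    return list(values)


print(as_str_list(("a", "b")))  # ['a', 'b']
```

Unlike a type-level solution, this only catches the mistake when the offending call actually runs.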

@yhlam

yhlam commented Jan 14, 2020

I found this thread because I am looking for a way to annotate some code like below:

from typing import Iterable, overload, Optional


USER_IDS = {'Alice': 1, 'Bob': 2, 'Carol': 3}


@overload
def get_user_id(name: str) -> Optional[int]: ...


@overload
def get_user_id(name: Iterable[str]) -> Iterable[Optional[int]]: ...


def get_user_id(name):
    if isinstance(name, str):
        return USER_IDS.get(name)
    else:
        return [USER_IDS.get(item) for item in name]

Currently, mypy (v0.730) gives error: Overloaded function signatures 1 and 2 overlap with incompatible return types

Not sure if anyone has suggested this before, but perhaps we could add a "negative" or "difference" type. Just as Union is analogous to the set operator |, Diff[A, B] would correspond to the - operator, matching anything that is of type A but not of type B.

Having the Diff type, we can annotate the above code as:

from typing import Diff, Iterable, overload, Optional


USER_IDS = {'Alice': 1, 'Bob': 2, 'Carol': 3}


@overload
def get_user_id(name: str) -> Optional[int]: ...


@overload
def get_user_id(name: Diff[Iterable[str], str]) -> Iterable[Optional[int]]: ...


def get_user_id(name):
    if isinstance(name, str):
        return USER_IDS.get(name)
    else:
        return [USER_IDS.get(item) for item in name]
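Diff does not exist in typing; it is the proposal here. Its intended meaning, set difference over the values each type admits, can be approximated at runtime with isinstance checks. A sketch, with the hypothetical helper name matches_diff:

```python
from collections.abc import Iterable


def matches_diff(value: object, include: type, exclude: type) -> bool:
    """Runtime analogue of the proposed Diff[include, exclude].

    True when value is an instance of `include` but not of `exclude`.
    """
    return isinstance(value, include) and not isinstance(value, exclude)


print(matches_diff(["Alice", "Bob"], Iterable, str))  # True
print(matches_diff("Alice", Iterable, str))           # False
```

This only inspects the outer type; checking that the elements are str would require iterating, which is exactly what a static Diff[Iterable[str], str] would avoid.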

@sirosen
Contributor

sirosen commented Jul 1, 2020

I ended up here looking for a way to handle a case almost identical to the above, trying to specify different overloads for str vs Sequence[str].
A trivial example:

def foo(value):
    if isinstance(value, str):
        return len(value)
    return [len(x) for x in value]

How can I annotate such a function such that

foo("abc") + 1
max(foo(["a", "bcdefg"]))

are both valid? As far as I can tell, I have to give up and say def foo(value: Sequence[str]) -> Any.

It's worth noting explicitly that this is distinct from the case in which we want to write

def bar(xs: Sequence[str]):
    return max([len(x) for x in xs] or [0])

and forbid xs="abc".

I'm not trying to use type checking to forbid using a string -- I'm trying to correctly describe how the types of arguments map to the types of potential return values.


Generalizing beyond strings, it seems like what's wanted is a way of excluding a type from an annotation which would otherwise cover it.

I'd love to see

def foo(value: str) -> int: ...
def foo(value: Excluding[str, Sequence[str]]) -> List[int]: ...

or even for this to be deduced from overloads based on their ordering:

def foo(value: str) -> int: ...
def foo(value: Sequence[str]) -> List[int]: ...

with the meaning that the first annotation takes precedence.
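Type checkers already pick the first matching overload, so the ordering described above can be written today. The catch is that they also report that the two signatures overlap with incompatible return types (the same error quoted earlier for mypy v0.730), and that report has to be suppressed. A sketch, assuming a bare # type: ignore is acceptable; mypy reports the error on the earlier signature, and recent releases use the overload-overlap error code for it:

```python
from typing import List, Sequence, Union, overload


# Checkers flag signatures 1 and 2 as overlapping with incompatible return
# types, since str is itself a Sequence[str]; the ignore silences that.
@overload
def foo(value: str) -> int: ...  # type: ignore
@overload
def foo(value: Sequence[str]) -> List[int]: ...
def foo(value: Union[str, Sequence[str]]) -> Union[int, List[int]]:
    # Runtime dispatch mirrors the overload order: str is checked first.
    if isinstance(value, str):
        return len(value)
    return [len(x) for x in value]


print(foo("abc") + 1)             # 4
print(max(foo(["a", "bcdefg"])))  # 6
```

The drawback is the one raised in the thread: the declaration relies on a suppressed error rather than on the type system expressing "Sequence[str] but not str".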

@MajorDallas

I've also run into this issue. In my case, I have functions that iterate over the given parameter, assuming it is a List or Tuple of strings. If a string were passed in by accident, things would break. Ex:

from typing import List, Tuple, Union
from urllib import request

addr = "http://localhost:8000"  # placeholder base URL; not defined in the original snippet

def check_users_exist(users: Union[List[str], Tuple[str, ...]]) -> Tuple[bool, List[Tuple[str, int]]]:
    results = []
    for user in users:
        # This breaks if `users` is e.g. "username", as we'd request `u`, `s`, etc.
        results.append((user, request.urlopen(f'{addr}/users/{user}').status))
    return all([199 < x[1] < 300 for x in results]), results
# Granted, an `isinstance` check would save us here, but it would be great if the type checker could alert us, too.

The annotation I'm using, Union[List[str], Tuple[str, ...]], does work for my needs, but it's clunky and goes against the recommendation to use a more generic Sequence[str] or Iterable[str] type. Indeed, this recommendation might actually encourage bugs like this example as programmers comply with it, not remembering that str counts as both a Sequence and an Iterable.

It's possible to hide the problem with a type alias, I suppose, e.g. NonStrSequence = Union[List[str], Tuple[str, ...]]. Or one could artificially impose a mutability requirement (thereby excluding str) with MutableSequence, which would work if one is fine with also excluding tuples.
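The trade-off of the MutableSequence workaround can be seen with the runtime ABCs in collections.abc, which for these builtins register the same relationships that the static stubs declare:

```python
from collections.abc import MutableSequence, Sequence

print(isinstance("abc", Sequence))          # True: str is a Sequence of str
print(isinstance("abc", MutableSequence))   # False: str is excluded...
print(isinstance(["a"], MutableSequence))   # True
print(isinstance(("a",), MutableSequence))  # False: ...but so are tuples
```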

Maybe, since it doesn't seem possible to differentiate str from other Sequence types by protocol, typing could have some kind of Not[T] generic? Like, Union[Sequence[str], Not[str]]? (Strictly speaking, this would need to be an intersection of Sequence[str] and Not[str], since a union would still admit str.) It's not much nicer to type than what I'm already using, but it does at least allow any Sequence subtype to be used while still ensuring a str can't be passed in, without requiring the programmer to explicitly enumerate each and every possible subtype of Sequence, including unknown custom subtypes.

Unfortunately, I'm not knowledgeable enough to know if this isn't a dumb idea. I just know it would make my life a little easier if it existed 😃

@GMNGeoffrey

I'm yet another user over-specializing my type annotations to avoid this bug, because I was using the more generic Iterable[str] and then accidentally passed a string. Please give us some way to specify "Iterable of string but not string". Whether it's a new type annotation or applies to Iterable[str] or Iterable[Text] doesn't matter much to me, but the fact that this is still impossible 5 years later makes me sad.

@spacether

Why wouldn't using the new iterable class be an opt in? I am not proposing changing Iterable in general.

@eli-schwartz

You would then have to convince all of the consumers of iterables to use that, which is practically impossible when you consider that some consumers can't be changed.

No, you'd have to convince all the people who currently diehard insist on annotating their own API as List[Union[str, int, bool, CustomObject]] and utterly refuse to use Sequence[Union[str, int, bool, CustomObject]] or support people passing in tuples of str, that they should...

... switch from List to NonTextSequence, preserving their API that accepts iterable container-specific types (str is not a container-specific type, even if it also apparently serves as a container) which contain various things such as str. But also making it easier to not care whether list or tuple is used. And also making it easier to not have to deal with covariant/invariant confusion or sometimes using typing.cast on the grounds that these diehards believe casting is less evil than Sequence[str] permitting str.

@NeilGirdhar

No, you'd have to convince all the people who currently diehard insist on annotating their own API as List[Union[str, int, bool, CustomObject]] and utterly refuse to use Sequence[Union[str, int, bool, CustomObject]] or support people passing in tuples of str, that they should...

Sorry, but what does this have to do with my comment?

In this thread, my suggestion was changing str from inheriting from Sequence to inheriting from Sized, Container for type checkers, so it seems that we might agree, but it's hard to tell what you're driving at.

Why wouldn't using the new iterable class be an opt in? I am not proposing changing Iterable in general.

Then the benefit would be really small since almost no one is going to opt in. In particular, generic parameters Sequence[T] are still going to accept strings when you probably don't want them to.

@eli-schwartz

Sorry, but what does this have to do with my comment?

In this thread, my suggestion was changing str from inheriting from Sequence to inheriting from Sized, Container for type checkers, so it seems that we might agree, but it's hard to tell what you're driving at.

Your suggestion would seem to be different from your comment, then. (In fact I don't even think I've seen you make a suggestion here but maybe it was from back in the 2+ years range which I didn't fully read/remember).

The comment I replied to was your response to @spacether. The suggestion of @spacether which you were providing commentary on, was NOT about changing what str inherits from.

I believe that @spacether's suggestion would be very suitable for my use cases. In fact, it's even been suggested before somewhere in one of the various GitHub issues on this topic spread across multiple repos. I can't remember which issue though. :(

The core issue here is that semantically it is "correct" for str to point out that it is, in fact, capable of iteration, and for functions that accept iterables and can validly do something with them, to accept a str. It's valid to do list('string') for example.

But for third-party APIs belonging to a personal project, there are often functions that aren't generic language primitives like list() and actually expect to receive an iterable of, say, filenames. There's no type for a string whose contents are a valid filename. What you get is an iterable of strings.

This API needs a way to type itself such that it accepts the things it accepts and doesn't accept the things it doesn't accept. We don't need to change the definition of a string, but we do need to have a type annotation that says "accept any iterable of strings, as long as it isn't an instance of str".

The suggestion by @spacether (and others in the past) could be one way to achieve that goal.

Changing what a str is doesn't really help, because it's been repeatedly shot down as "not technically correct" (and the maintainers are correct in that analysis, it is not technically correct, so it's difficult to argue that they should do the wrong thing just because it's convenient for me). Changing what a str is, just leads to people repeatedly arguing back and forth about whether it is technically correct, whether it's practically correct, whether people who need actual generic iterable protocols need to change their correct annotations, etc.

In practice, people who don't want to annotate sequences of str and get bugs when they get a str, are doing one of two things:

  • using pytype
  • avoiding Iterable and Container, and typing as actual honest to goodness List

Both of these groups are concerned exclusively with their own code. Both of these groups would be happy to change their own code which is non-portable (fails mypy) / buggy (variance problems hacked around with casting or whatnot), if it means an end to the problem.

Offering brand new and never before seen types that are a figment of the type checker's imagination, like NonTextSequence or choice of bikeshedded names (ContainerizedHolderOfDiscreteCovariantOriginalAndUnmodifiedElements?) seems like a win all around. Iterables/sequences are still technically correct, and my code is also technically correct because it annotates that it accepts sequences as long as they are the particular subcategory of sequences that don't generate entirely new elements by running a chunking algorithm with length 1 against the element I did give.

No one has to be convinced to do anything they don't want to, because nothing changed. Only new stuff got added. All code in the world, anywhere and everywhere, still means exactly what it did beforehand... but now people can describe new concepts they couldn't describe before, if they want to.

@spacether

spacether commented Jul 15, 2023

Is it possible to constrain a TypeVar to a subset of Sequence's subclasses, like str and typing.List[str]?
Then one could have code that uses the TypeVar.

import typing

T = typing.TypeVar('T', str, typing.List[str])

def some_fun(constrained_seq: typing.Sequence[T]):
    pass

@NeilGirdhar

NeilGirdhar commented Jul 16, 2023

Your suggestion would seem to be different from your comment, then. (In fact I don't even think I've seen you make a suggestion here but maybe it was from back in the 2+ years range which I didn't fully read/remember).

My suggestion is linked in my comment. The discussion is extensive in case you're curious.

The suggestion of @spacether which you were providing commentary on, was NOT about changing what str inherits from.

Right, I was providing an alternative solution that has a different set of costs and benefits.

The core issue here is that semantically it is "correct" for str...

I think the core problem that we're solving is strings being improperly used as sequences. Any solution has to balance:

  • the amount of churn it induces in getting correct code to pass (A) with
  • the erroneous code that it flags with errors (B).

There is also a question of code elegance:

  • is the code made uglier by the machinations it takes to get it to pass (ignore statements, use of weird types, etc.), or
  • is the code made prettier by forced documentation (e.g., a typing expression like ClassVar makes code easier to read).

Changing what a str is doesn't really help, because it's been repeatedly shot down as "not technically correct"

I think you should read the entire linked thread. Nothing is "out of the question" because it's "not technically correct". Type checkers today flag a huge amount of "technically correct" code.

Both of these groups are concerned exclusively with their own code...

Even if you're concerned with only your code, you can still accidentally pass strings to interfaces that expect sequences. The NonTextSequence solution means that the callee decides on protection--not the caller. And the callee may not want to block strings (to keep A small), but in doing so, few errors are caught (so B is made even smaller).

And since this whole process is opt-in, it means that you have to convince all of the maintainers to update their code. This is trickier than it seems because of generic functions that accept things like Sequence[T]. And it's a huge amount of churn (makes A big).

seems like a win all around

I think the best way to make a convincing case of one solution over another would be to use mypy-primer with different solutions and estimate A and B.

For the NonTextString, count up all the places you need to add NonTextString (A), and then count up all the errors that are caught because of this (B).

For the non-sequence-str, count up all the errors it catches (B), and all of the places that you need to access a string as a sequence (using a property like s.chars, which goes into the A bucket).

@eli-schwartz

My suggestion is linked in my comment. The discussion is extensive in case you're curious.

Then the suggestion wasn't made here, no wonder I was confused. :/

(Also sorry but I find the Discourse software exceedingly unpleasant to try to read or interact with. There are many forum software packages that are far nicer, and none that are worse, from a user interaction perspective. I'm disinclined to subject myself to experiences I find painful. Maybe you can summarize that thread here?)

I think you should read the entire linked thread. Nothing is "out of the question" because it's "not technically correct". Type checkers today flag a huge amount of "technically correct" code.

Again, I don't know what happens on some linked thread elsewhere, but in the family of GitHub issues in which this Sequence[str] topic was discussed there has been considerable resistance to making str falsely claim that it cannot be iterated over, or annotating the list function as not accepting list('string').

What code is being flagged as wrong despite being technically correct at the type theory level? Certainly, lots of code is being flagged as wrong because its types don't match up, even though the code works as expected... that's not what "technically correct typing" means.

Even if you're concerned with only your code, you can still accidentally pass strings to interfaces that expect sequences. The NonTextSequence solution means that the callee decides on protection--not the caller. And the callee may not want to block strings (to keep A small), but in doing so, few errors are caught (so B is made even smaller).

I don't know what this means. If interfaces expect sequences and can do something meaningful with a sequence of type str that gets unpacked to a progression of single characters, then it's the callee that should be deciding this, whether that means annotating as Sequence[str] with mypy semantics, or Union[Sequence[str], str] with pytype semantics.

What does keeping A small mean? The callee wants to keep bad code that performs incorrectly, mypy-passing, and therefore chooses to exclude useful information from its API contract because ? This doesn't sound like a convincing point of discussion...

And since this whole process is opt-in, it means that you have to convince all of the maintainers to update their code. This is trickier than it seems because of generic functions that accept things like Sequence[T]. And it's a huge amount of churn (makes A big).

In general, asking people whose code is currently incorrect (read: failing to correctly communicate what typed inputs are valid) to change, results in fewer people with ruffled feathers than asking people whose code is currently correct and are expecting the documented behavior to work the way it does, to change.

Also since you mention below "using a property like s.chars" I assume that this opt out solution also depends on runtime changes, which is another hard sell, because it won't work until projects can start requiring python 3.13 as a minimum version, you can't import new runtime properties of the str type from typing_extensions after all. That means I, personally, cannot benefit from said change until 2028, when the unreleased python 3.12 is both released and EOL.

As for generic functions, those don't need changing at all. E.g. list() wants to communicate to its callers that they can do list('string') if they want to unpack a string into a progression of single-character strings.

There's no churn anyway -- for an opt-in change, the size of A is 0.

I think the best way to make a convincing case of one solution over another would be to use mypy-primer with different solutions and estimate A and B.

I'm extremely aware that in FOSS the best way to convince maintainers is to write the patch yourself and demo it in action. Thanks for pointing that out.

If I had any knowledge of type checkers other than running them, I'd certainly be inclined to try just that! But I don't, and it's an intimidating topic to me.

Although since this logic works equally well to make a convincing case for one solution without looking at other solutions, I'm curious -- have you followed that advice and implemented the solution you suggested? If so, I'd be very curious to see the patches and mypy_primer results. ;)

Since it appears to require runtime modifications to builtin types in python itself, I assume your patches include patches for mypy, typeshed, and also CPython. Very impressive. :)

@AlexWaygood
Member

I realise that the status quo here is frustrating for many, and that a lot of people care deeply about improving this situation. Let's just all remember to give each other the benefit of the doubt in this discussion and avoid sarcasm wherever possible. If the tone of this discussion continues to deteriorate, I'll have to lock this thread.

@NeilGirdhar

NeilGirdhar commented Jul 16, 2023

What code is being flagged as wrong despite being technically correct at the type theory level?

Off the top of my head: decorators that implement the descriptor protocol and behave differently from the functions they decorate are currently expected to produce a callable with the same behavior.

then it's the callee that should be deciding this,

Yes, but the problem is that it catches few errors that way. (B is too small.)

What does keeping A small mean?

It means keeping the amount of code changes that you need to make small.

In general, asking people whose code is currently incorrect

That's the point: just because code is valid Python doesn't mean that it's correct. That's what the Pytype people found: when they narrowed the base classes, it caught coding errors.

Also since you mention below "using a property like s.chars" I assume that this opt out solution also depends on runtime changes, which is another hard sell, because it won't work until projects can start requiring python 3.13 as a minimum

Another option would be to allow a cast to get the old behavior.

There's no churn anyway -- for an opt-in change, the size of A is 0.

The size of A varies with the number of times you have to change your code to opt in.

have you followed that advice and implemented the solution you suggested? If so, I'd be very curious to see the patches and mypy_primer results.

I did spend a couple hours playing with the mypy-primer, but it was harder than I expected. I don't remember what wasn't working.

Since it appears to require runtime modifications to builtin types in python itself, I assume your patches include patches for mypy, typeshed, and also CPython. Very impressive. :)

I think you just need to change typeshed, but it's possible that I would have needed to change mypy as well. It's been a few months, and I don't remember.

If I had any knowledge of type checkers other than running them, I'd certainly be inclined to try just that! But I don't, and it's an intimidating topic to me.

You might try convincing the author of basedmypy to implement set difference. They have already implemented intersection (A & B), and set difference (A \ B) may logically be similar to implement. Then you could sprinkle some NonTextString annotations (an alias for Iterable[Any] - str, as per the discussion above) into various interfaces and see whether they catch any bugs.

@spacether

spacether commented Aug 22, 2023

@twoertwein your solution works. Thanks for posting it. The working code is here:

import typing

_T_co = typing.TypeVar("_T_co", covariant=True)


class Sequence(typing.Protocol[_T_co]):
    """
    if a Protocol would define the interface of Sequence, this protocol 
    would NOT allow str/bytes as their __contains__ is incompatible with the definition in Sequence.
    methods from: https://docs.python.org/3/library/collections.abc.html#collections.abc.Collection
    """
    def __contains__(self, value: object, /) -> bool:
        raise NotImplementedError
    
    def __getitem__(self, index, /):
        raise NotImplementedError

    def __len__(self) -> int:
        raise NotImplementedError

    def __iter__(self) -> typing.Iterator[_T_co]:
        raise NotImplementedError

    def __reversed__(self, /) -> typing.Iterator[_T_co]:
        raise NotImplementedError

And it looks like bytes is also incompatible with it per:
https://github.com/python/typeshed/blob/4ca5ee98df5654d0db7f5b24cd2bd3b3fe54f313/stdlib/builtins.pyi#L684

The Python interpreter enforces this at runtime too: x in y, where y is a str, calls str.__contains__, which raises an exception for a non-string left operand:

>>> 1 in "abc"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not int

and for bytes:

>>> None in b"abc"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'NoneType'
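A self-contained sketch of how such a protocol is used (ItemSequence and join_names are illustrative names, not from the thread). A static checker should reject join_names("dave") because str.__contains__ does not accept an arbitrary object; at runtime nothing stops the call, which silently splits the name into characters, exactly the bug this issue is about.

```python
import typing

_T_co = typing.TypeVar("_T_co", covariant=True)


class ItemSequence(typing.Protocol[_T_co]):
    """Sequence-like protocol; str/bytes fail it because of __contains__."""

    def __contains__(self, value: object, /) -> bool: ...
    def __getitem__(self, index: int, /) -> _T_co: ...
    def __len__(self) -> int: ...
    def __iter__(self) -> typing.Iterator[_T_co]: ...
    def __reversed__(self) -> typing.Iterator[_T_co]: ...


def join_names(names: ItemSequence[str]) -> str:
    """Join whole names with commas."""
    return ", ".join(names)


joined = join_names(["alice", "bob"])  # list[str] matches the protocol
also_ok = join_names(("carol",))       # tuple[str, ...] matches too
# join_names("dave")  # a static checker should flag this call; at runtime
#                     # it would quietly return "d, a, v, e"
```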

@hauntsaninja
Collaborator

Based on the suggestion in this thread, I've added a SequenceNotStr type to https://github.com/hauntsaninja/useful_types

@bluenote10

Based on the suggestion in this thread, I've added a SequenceNotStr type to https://github.com/hauntsaninja/useful_types

We have experimented with the SequenceNotStr approach, but it doesn't really solve the crux of the issue. The main problem seems to be that it is "viral" and cannot be combined with third-party usages of Sequence. To avoid going too far off-topic, I've opened a dedicated issue: hauntsaninja/useful_types#21

Overall, I really wish this could be reconsidered. In my opinion, it feels like a mistake to type-check str as a Sequence[str] or Iterable[str]. I'll repeat my reasoning from the other issue (python/mypy#11001 (comment)), because this is probably the proper place for it:


The problem is that "str is a Sequence[str]" (or "str is an Iterable[str]") is partially a lie.

Technically these statements are true, but semantically they most often are not.

Perhaps ~99% of str use cases store multiple characters. In practice, all kinds of names, identifiers, texts, etc. are arbitrary-length strings. Unfortunately, Python cannot distinguish between characters and arbitrary-length strings, and the issue we are facing here is basically a result of that: str satisfies Sequence[str] in the sense that it can provide a sequence of characters, but not a sequence of arbitrary-length strings. In that sense, str lies when it claims to be a sequence of str, because it can never provide a sequence of arbitrary-length strings. So from a practical perspective, passing a str into e.g. a parameter names: Sequence[str] is pretty obviously a bug, because names semantically wants "multiple arbitrary-length strings", not "multiple characters".

For the rare use cases that actually want a sequence of characters, this can easily be spelled out explicitly as characters: Union[str, Sequence[str]], so I'd very much favor the pytype behavior.
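A sketch of that explicit spelling (count_vowels is a hypothetical function). With today's checkers the Union is redundant, since str already matches Sequence[str]; under the pytype-style rule discussed here, it is what would permit, and document, passing a bare string.

```python
from typing import Sequence, Union

VOWELS = frozenset("aeiou")


def count_vowels(characters: Union[str, Sequence[str]]) -> int:
    """Count vowels in a str or in a sequence of single characters."""
    # Set membership avoids the substring pitfall of `ch in "aeiou"`
    # (for example, "" in "aeiou" is True).
    return sum(1 for ch in characters if ch.lower() in VOWELS)


from_str = count_vowels("Banana")
from_list = count_vowels(["B", "a", "n"])
```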

@joaoe

joaoe commented May 26, 2024

Hi.

  1. SequenceNotStr is a really bad name. Things should be named after what they do, not what they don't do.
  2. There is also the case of bytes, which many comments seem to have missed but which needs to be handled equally.
  3. A more generic name like CharacterSequence, covering str, bytes, buffer, memoryview and whatnot, would perhaps be a better proposal?
  4. If idea 3 makes sense, there would need to be some extra syntax to negate types, e.g. def myfn(seq: Sequence & ~CharacterSequence), but that would open a whole new can of worms in the typing system.
  5. Otherwise, use a name like ItemSequence for sequences containing individual objects (the NotStr case).
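Since the negation in item 4 cannot be expressed in today's type system, a runtime check is the closest available approximation. A minimal sketch, assuming the hypothetical names CHARACTER_SEQUENCE_TYPES and is_item_sequence; it covers Python 3's built-in text and binary sequence types.

```python
from collections.abc import Sequence
from typing import Any

# Hypothetical runtime counterpart of the proposed CharacterSequence:
# built-in sequences whose items are characters or bytes rather than
# standalone objects.
CHARACTER_SEQUENCE_TYPES = (str, bytes, bytearray, memoryview)


def is_item_sequence(value: Any) -> bool:
    """True for sequences of individual objects (the proposed ItemSequence);
    False for character/byte sequences and for non-sequences."""
    return isinstance(value, Sequence) and not isinstance(
        value, CHARACTER_SEQUENCE_TYPES
    )
```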

@dpinol

dpinol commented May 29, 2024

This is a real pain and has been open since 2016.
Maybe it's an excuse to implement a very simple case of type intersection & negation?
https://discuss.python.org/t/type-intersection-and-negation-in-type-annotations/23879

Thanks

robsdedude added a commit to robsdedude/neo4j-python-driver that referenced this issue Jun 14, 2024
(Async)Neo4jBookmarkManager's `initial_bookmarks` parameter, as well as
`Bookmarks.from_raw_values` `values` parameter accept anything of type
`Iterable[str]`. Unfortunately, `str` itself implements that type and yields
the characters of the string. That is most certainly not what the user intended.

In an ideal world, we could tell the type checker to only accept `Iterable[str]`
when *not* of type `str`. See also python/typing#256.

To help users out, we now explicitly check for `isinstance(input, str)` and turn
such inputs into an iterable with the input string as the only element.
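A minimal sketch of that normalization, using the hypothetical helper name normalize_values (this is not the driver's actual code):

```python
from typing import Iterable, List


def normalize_values(values: Iterable[str]) -> List[str]:
    """Return the values as a list, treating a bare str as a single value."""
    if isinstance(values, str):
        # A str also satisfies Iterable[str]; wrap it so it is not split
        # into individual characters.
        return [values]
    return list(values)
```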
@AlexeyDmitriev

@dmoisset hi, have you tried to push your suggestion?

@gpshead
Member

gpshead commented Jun 26, 2024

I think what users want is what was elaborated above: codify a special case for str in Python type checkers as the norm (as pytype has done), so that str isn't type-checked as an iterable unless an annotation explicitly spells that out (#256 (comment) and earlier).

I realize this may be considered annoying from a type-checker-implementation and type-theory-purist standpoint, but it is how most users should think of code annotated as str, without needing a non-obvious annotation other than the built-in itself. Make the uncommon use case (iterating over individual Unicode code points) the one that requires a longer annotation.

IMNSHO, waiting for a general-purpose solution expressible in typing syntax, rather than adopting this one practical special case, does not feel like it is helping users.

robsdedude added a commit to neo4j/neo4j-python-driver that referenced this issue Jun 26, 2024