Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understandable error message for incompatible AnyStr arguments #3433

Closed
wants to merge 31 commits into from

Conversation

quartox
Copy link
Contributor

@quartox quartox commented May 24, 2017

Fixes #963

If a function has two or more AnyStr arguments then it requires the passed arguments to be all "str" or all "bytes". This is now explained in the error message and a note.

@quartox quartox changed the title Understandable error message for incompatible AnyStr arguments [WIP] Understandable error message for incompatible AnyStr arguments May 24, 2017
@quartox quartox changed the title [WIP] Understandable error message for incompatible AnyStr arguments Understandable error message for incompatible AnyStr arguments May 24, 2017
@lincolnq
Copy link
Contributor

(Note: I'm not a maintainer of this project and in fact am rather new to it, so this is just one person's random thoughts).

I don't think this is the right approach: AnyStr is a typing module concern, not a mypy concern. Thus this change is brittle (because typing may change), dependent on having to peek into the name in order to succeed at the optimization, and not general enough -- other type variables would not get the same treatment, even identically-declared ones with different names.

Though it might be much harder to execute, I would much prefer to see a change where mypy was smart enough to generalize this logic to any type variable which is reused across function args.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction where this goes, but I think this should be implemented for all type variables with few (2 or 3) type constraints. It should be not that difficult (you can detect type variables by a simple type visitor).

"""
if isinstance(type, Instance) and type.type.name() == 'object':
for string in arg_strings:
if 'AnyStr' in string:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking for string representation is too fragile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to figure out how to use a type visitor to check for the types here (passing callable.arg_types instead of strings). Can you help point me to an example using type visitor to check for types?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many of them (use grep -rn mypy -e "TypeVisitor" and/or grep -rn mypy -e "TypeQuery" to see them). One of the simplest and closest to what you need is probably the one in typeanal.get_type_var_names. Another simple example is expandtype.ExpandTypeVisitor.

msg.incompatible_typevar_value(callable, i + 1, type, context)

arg_strings = tuple(msg.format(arg_type).replace('"', '')
for arg_type in callable.arg_types)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should check arg_type for the presence of a constrained type variable here instead of string representation below.

f('a')
f('a', 'b')
f('a', b'b') # E: Type arguments of "f('AnyStr')" have incompatible values \
# N: "AnyStr" arguments must be all "str" or all "bytes"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas for additional tests:

  • Situations where there are more than one type variable
  • Situations where AnyStr also appears in the return type.

@quartox
Copy link
Contributor Author

quartox commented Jun 4, 2017

I know there are some failing checks but I wanted to see if this is a good direction.

I am now covering all constrained types, but need more testing of return values.

The biggest remaining problem is with * arguments because they break the indexing for the function arguments. I added a hack that works only if the final argument is a * argument, but it doesn't work with * in the first position of the function arguments. I couldn't figure out how to check for a function with a * argument, but decrementing the index appears to fix the problem.

The indexing in general is weird (see tests for examples). It may not be worth showing the user the index in these cases when it could cause more confusion than it is worth.

I spent a fair amount of time grepping and reading about visitors but I couldn't see a clear example of how to use one without having more variables. I think the new version is better than the string checking, but not sure if it is robust enough.

@ilevkivskyi
Copy link
Member

This looks like moving in the right direction. I will make a more detailed review later(probably tomorrow). In the meantime please fix the problem with typeshed, it shouldn't be part of this PR.

@quartox
Copy link
Contributor Author

quartox commented Jun 4, 2017

I did a checkout of https://github.com/python/mypy.git typeshed/ and it is still showing file changes. The original changes were brought in on a merge and I have no idea how or why it happened. Let me know if I can do anything else to get rid of the typeshed changes.

@gvanrossum
Copy link
Member

Do you have your local clone of the mypy repo configured with an "origin" and an "upstream" remotes? (git remote -v will show, and git remote --help will help you understand how to configure this.)

Typically origin is your own fork and the default for push/pull, while upstream is the python/mypy repo. With such a setup you should be able to do something like git fetch upstream; git rebase upstream/master; git push origin to clear things up. But without the ability to look into and experiment with your git config I can't really help you further -- though stackoverflow probably has the answer (!).

@quartox
Copy link
Contributor Author

quartox commented Jun 5, 2017

Oh, god. I did rebase with upstream (https://github.com/python/mypy.git) master but it has created a monster pull request. I may not have time to fix it for the next few days because I am travelling, but I will try to get back to it soon.

@ilevkivskyi
Copy link
Member

I tried to review the relevant files, but it is really hard, sorry. I will wait until you fix this.

@ilevkivskyi
Copy link
Member

OK, this looks much better now. Will review this PR soon.

@quartox
Copy link
Contributor Author

quartox commented Jun 11, 2017

@ilevkivskyi Yeah, I still have no idea how that happened (I haven't done any rebasing before), but I just did a manual git checkout upstream/master for all the files that I didn't want to be in this PR. I still expect some tests to fail but the code itself should actually be readable now.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some more comments.

It looks like you don't need here any TypeQuery I mentioned. Mypy shows a different error "Cannot infer type" for nested scenarios like:

def combine(first: List[AnyStr], second: List[AnyStr]):
    ...

combine(['a', b'b'])

This error message is also not very helpful, but could be improved in a separate PR.


import mypy.subtypes
from mypy.sametypes import is_same_type
from mypy.expandtype import expand_type
from mypy.types import Type, TypeVarId, TypeVarType, CallableType, AnyType, PartialType
from mypy.types import (
Type, TypeVarId, TypeVarType, TypeVisitor, CallableType, AnyType, PartialType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the trailing space on this line.



def get_incompatible_arg_constraints(arg_types: Sequence[Type], type: Type,
index: int) -> Dict[str, Tuple[str]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually mean Tuple[str, ...] (variable length tuple denoted by literal ellipsis ...)? Otherwise this means tuple of one element with type str.

arg_type.values and
len(arg_type.values) > 1 and
arg_type.name not in constraints.keys()):
constraints[arg_type.name] = tuple(vals.type.name() for vals in arg_type.values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail if vals is not an Instance. If you just want a string representation of a type you can use one of the function in messages.py module (like format_simple or format_distinctly).

Also I would call this variable val not vals.

An example of a constrained type is AnyStr which must be all str or all byte.
"""
constraints = {} # type: Dict[str, Tuple[str]]
if isinstance(type, Instance) and type.type.name() == 'object':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just in case use fullname and compare it to builtins.object?

Also I don't see why object is so special. It appears as a join of str and bytes in case of AnyStr, but this is not always the case for other constrained type variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another simple check to see if the constraints are being violated? I write this as a pretty restrictive check which means the rest of the code won't be executed very often (I was worried about adding too much overhead which might be pre-mature optimization). Maybe see if type.type.name() != arg_types[index].values? But it seems like there must be a better solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another simple check to see if the constraints are being violated?

Yes, probably it would be quite complex. But then could you please make it more clear that we are special-casing AnyStr and similar? (This includes function names and more detailed docstrings.)

# N: "AnyStr" must be all one type: str or bytes
h('a', y=1)
h('a', 'b', y=1)
h('a', b'b', y=1) # E: Type argument 1 of "h" has incompatible value "object"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this to also work. Probably you just need to carefully decode Callable.arg_kinds.

Also have you greped for this error and checked that the message is updated everywhere in existing mypy test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me in the right direction on how to decode Callable.arg_kinds? For this case it is just [2,3] which doesn't mean much to me. Can I compare this to Context.arg_kinds? In this case it is [0,0,3].

The only other time this error occurs is on line 961 of check-overloading.test which I don't think is in scope for this pull request and line 233 of check-typevar-values.text which is actually passing an argument with type object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are described in nodes.py above class CallExpr.... I copy here for convenience:

# Positional argument
ARG_POS = 0  # type: int
# Positional, optional argument (functions only, not calls)
ARG_OPT = 1  # type: int
# *arg argument
ARG_STAR = 2  # type: int
# Keyword argument x=y in call, or keyword-only function arg
ARG_NAMED = 3  # type: int
# **arg argument
ARG_STAR2 = 4  # type: int
# In an argument list, keyword-only and also optional
ARG_NAMED_OPT = 5

If this will take time, then maybe we need to postpone this for a separate PR. I don't want this one to take long.

constraint_str = '{} or {}'.format(values[0], values[1])
elif len(values) > 3:
constraint_str = ', '.join(values[:-1]) + ', or ' + values[-1]
self.note('"{}" must be all one type: {}'.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is better than existing but still quite confusing. Ca you make it less terse?
Maybe something in the spirit of "Arguments 1, 2, and 4 to 'my_function' should be either all 'str' or all 'bytes'. Got (str, str, bytes)."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can probably find the argument indices easily enough, but the types of the other arguments isn't available when constructing the message because the message builder only gets called when the second argument is analyzed (from what I can tell, please let me know if there is something I have overlooked). I can get mypy.nodes.StrExpr and mypy.nodes.BytesExpr from context, but I do not know how to safely get these values as MessageBuilder.format for both of these returns object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, then again we could postpone this to the next PR (probably the idea is to use types argument for apply_generic_arguments and map_actuals_to_formals from checkexpr.py).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still I think that error message like "Arguments 1, 2, and 4 to 'my_function' should be either all 'str' or all 'bytes'" will look better than the current one, so it is better to change it in this PR.

# N: "S" must be all one type: int or str \
# E: Type argument 2 of "f" has incompatible value \
# N: "AnyStr" must be all one type: str or bytes
f(1, '2', '3', b'4') # E: Type argument 1 of "f" has incompatible value \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make tests slightly more "sparse"?


An example of a constrained type is AnyStr which must be all str or all byte.
"""
constraints = {} # type: Dict[str, Tuple[str]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, maybe Tuple[str, ...]?

constraints[arg_type.name] = tuple(vals.type.name() for vals in arg_type.values)
elif isinstance(arg_type, UnionType):
for item in arg_type.items:
constraints = add_arg_constraints(constraints, item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic seems to be over-complicated here. You are basically just collecting a set of type variables affected by the error encountered. This should be separated from the error formatting logic (the latter should be kept in messages.py).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specific to the union type check or the function in general? Using this function recursively appeared to be the easiest way to get the set of all possible types for one argument (since, in principle, the union could contain multiple unions as well).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specific to the union type check or the function in general?

In general. My main concern here is that error formatting is mixed with the collection of information needed for the error.

An example of a constrained type is AnyStr which must be all str or all byte.
"""
constraints = {} # type: Dict[str, Tuple[str, ...]]
print(index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You forgot a debugging print here.

@@ -38,8 +41,11 @@ def apply_generic_arguments(callable: CallableType, types: List[Type],
types[i] = value
break
else:
msg.incompatible_typevar_value(callable, i + 1, type, context)

constraints = get_incompatible_arg_constraints(msg, callable.arg_types, type, i + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also fix the lint issue -- this line is too long (more than 99 allowed chars).

@ilevkivskyi
Copy link
Member

ilevkivskyi commented Jul 18, 2017

@quartox Thanks for keeping forking on this! Just few more comments. Could you please also fix the test failures (you forgot quotes "..." around the types in some tests)?

quartox added 2 commits August 5, 2017 07:25
Conflicts:
	mypy/applytype.py
	test-data/unit/check-inference.test
	test-data/unit/check-overloading.test
	test-data/unit/check-typevar-values.test
	test-data/unit/pythoneval.test
@quartox
Copy link
Contributor Author

quartox commented Aug 5, 2017

@ilevkivskyi I am running the tests, but I wanted to give you the chance to look at updates in case I don't fix tests before tomorrow.

@quartox
Copy link
Contributor Author

quartox commented Aug 5, 2017

I realized that my string checking for indeces is brittle. Need to rethink it. Any suggestions welcome.

@ilevkivskyi
Copy link
Member

@quartox

I realized that my string checking for indeces is brittle. Need to rethink it. Any suggestions welcome.

TBH, I am a bit lost here. What is the current status? Have you addressed the comments from the previous review?

Also I am not sure what do you mean by "string checking for indeces". This PR takes too long, let us just see if we can get something mergeable from it without making an overly general solution.

@ilevkivskyi
Copy link
Member

@quartox Could you please ping me when this will be ready for a next round of review?
(Note that the tests are still failing.)

@quartox
Copy link
Contributor Author

quartox commented Aug 19, 2017

I am fixing the tests this morning, it should be ready when the tests pass.

@ilevkivskyi ilevkivskyi removed their assignment Aug 19, 2017
@ilevkivskyi
Copy link
Member

After looking at this again, I am quite pessimistic, maybe we should just take a different approach. I have unassigned myself, in case if someone else wants to try pushing this forward. Sorry!

@quartox
Copy link
Contributor Author

quartox commented Aug 19, 2017

@ilevkivskyi thanks for the help. I think I agree, this might be better with a fresh approach, but if another reviewer has other suggestions I am happy to help improve.

@gvanrossum
Copy link
Member

Sadly I think I have to agree with Ivan. Thanks for all your efforts nevertheless! Maybe you are still interested in contributing to other parts of mypy? (In general I would recommend starting on smaller issues -- maybe you have a better appreciation for what that means now that you've worked on this for a while.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants