Skip to content

Refactor reversible operators #5475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 16, 2018

Conversation

Michael0x2a
Copy link
Collaborator

@Michael0x2a Michael0x2a commented Aug 14, 2018

This pull request refactors and reworks how we handle reversible operators like __add__.

Specifically, what our code was previously doing was assuming that given the expression A() + B(), we would always try calling A().__add__(B()) first, followed by B().__radd__(A()) second (if the __radd__ method exists).

Unfortunately, it seems like this model was a little too naive, which caused several mismatches/weird errors when I was working on refining how we handle overlaps and TypeVars in a subsequent PR.

Specifically, what actually happens is that...

  1. When doing A() + A(), we only ever try calling A.__add__, never A.__radd__. This is the case even if __add__ is undefined.

  2. If B is a subclass of A, and if B defines an __radd__ method, and we do A() + B(), Python will actually try checking B.__radd__ first, then A.__add__ second.

    This lets a subclass effectively "refine" the desired return type.

    Note that if B only inherits an __radd__ method, Python calls A.__add__ first as usual. Basically, B must provide a genuine refinement over whatever A returns.

  3. In all other cases, we call __add__ then __radd__ as usual.

This pull request modifies both checker.py and checkexpr.py to match this behavior, and adds logic so that we check the calls in the correct order.

This ended up slightly changing a few error messages in certain edge cases.

Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Aug 14, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

It also swaps out the code for detecting overlaps with operators and
removes some associated (and now unused) code.

This pull request builds on top of python#5474
and python#5475 -- once those two PRs are
merged, I'll rebase this diff if necessary.

This pull request also supercedes python#5475 --
that PR contains basically the same code as these three PRs, just
smushed together.
mypy/nodes.py Outdated
'__add__',
'__sub__',
'__mul__',
'__truediv__',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include __div__ in Python 2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point. I originally excluded it because none of the other data structures in the lines above and below included __div__ -- it seems like mypy handles __div__ in a mostly ad-hoc and inconsistent basis in general. (For example, it seemed like previously we never checked to see if __rdiv__ was unsafely overlapping with __div__ at all, we don't support from __future__ import division at all...)

Properly fixing that is probably something that's better done in a separate PR, but I went ahead and added it here + made a few other adjustments as a stopgap measure.

This pull request refactors and reworks how we handle reversible
operators like __add__.

Specifically, what our code was previously doing was assuming that given
the expression `A() + B()`, we would always try calling
`A().__add__(B())` first, followed by `B().__radd__(A())` second (if the
`__radd__` method exists).

Unfortunately, it seems like this model was a little too naive, which
caused several mismatches/weird errors when I was working on refining
how we handle overlaps and TypeVars in a subsequent PR.

Specifically, what actually happens is that...

1. When doing `A() + A()`, we only ever try calling `A.__add__`, never
   `A.__radd__`. This is the case even if `__add__` is undefined.

2. If `B` is a subclass of `A`, and if `B` defines an `__radd__` method,
   and we do `A() + B()`, Python will actually try checking `B.__radd__`
   *first*, then `A.__add__` second.

   This lets a subclass effectively "refine" the desired return type.

   Note that if `B` only *inherits* an `__radd__` method, Python calls
   `A.__add__` first as usual. Basically, `B` must provide a genuine
   refinement over whatever `A` returns.

3. In all other cases, we call `__add__` then `__radd__` as usual.

This pull request modifies both checker.py and checkexpr.py to match
this behavior, and adds logic so that we check the calls in the
correct order.

This ended up slightly changing a few error messages in certain edge
cases.
@Michael0x2a Michael0x2a force-pushed the refactor-reversible-operators branch from 4ca1879 to 0243661 Compare August 14, 2018 21:35
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Aug 14, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

It also swaps out the code for detecting overlaps with operators and
removes some associated (and now unused) code.

This pull request builds on top of python#5474
and python#5475 -- once those two PRs are
merged, I'll rebase this diff if necessary.

This pull request also supercedes python#5475 --
that PR contains basically the same code as these three PRs, just
smushed together.
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good, I have several comments, all of them are minor.

mypy/checker.py Outdated
@@ -1043,72 +1043,102 @@ def check_overlapping_op_methods(self,
"""Check for overlapping method and reverse method signatures.

Assume reverse method has valid argument count and kinds.

Precondition:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably a common terminology, but I would make it clear that the caller should ensure this (i.e. this function should not be called otherwise).

mypy/checker.py Outdated
forward_item: CallableType,
forward_base: Type,
reverse_type: CallableType) -> bool:
# TODO check argument kinds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like colons after TODO, NOTE, etc.

mypy/checker.py Outdated
# non-overlapping.
# This behavior deviates from how we handle overloads -- many of the
# modules in typeshed seem to define __OP__ methods that shadow the
# corresponding __rOP__ method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a very nice explanation. 👍

mypy/checker.py Outdated
# corresponding __rOP__ method.
#
# Note: we do not attempt to handle unsafe overlaps related to multiple
# inheritance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add "as well as we do for overloads" or something of this kind?

mypy/checker.py Outdated
# Not a valid operator method -- can't succeed anyway.
return False

# Erase the type if necessary to make sure we don't have a dangling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explain "dangling" a bit more, maybe add "(i.e. single)"?

# Sometimes, the variants list is empty. In that case, we fall-back to attempting to
# call the __op__ method (even though it's missing).

if len(errors) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why checking the length of errors instead of just length of variants?

Also, if len(seq) == 0 should be if not seq.

errors.append(local_errors)
results.append(result)
else:
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand how can this ever succeed, could you please add a comment with an example?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally added that out just in case, but it seems we do fall into that case at least once -- for example, adding an assert there makes this test case fail.

I added a TODO.

mypy/messages.py Outdated
reverse_type: Type,
reverse_method: str,
context: Context) -> None:
msg = "{rfunc} will not be called when running '{cls} {op} {cls}': must define {ffunc}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say "evaluating" instead of "running" here.

[case testReverseOperatorOrderingCase5]
class A:
def __add__(self, other: B) -> int: ...
def __radd__(self, other: A) -> str: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do double check: this is not an unsafe overlap because only one of the two can be ever called? Could you please add a clarifying comment here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I added a comment.

# A refinement made by a parent also counts
reveal_type(A() + C()) # E: Revealed type is 'builtins.str'


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add few tests for operator methods that involve generic classes and overloaded methods (just for completeness).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few tests for overloads.

I'm not sure if I want to add tests for generics here though, since the way operators handle generics are still broken in this PR. I fixed this in my follow-up PR though, and added a few test cases there.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! You can merge this after you fix the last few comments here. Then rebase the overlap PR and I will review it.

# Currently, it seems we still need this to correctly deal with
# things like metaclasses?
#
# E.g. see the pythoneval.testMetaclassOpAccessAny test case.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add details about this to #5136?

# This is probably related to the TODO in lookup_operator(...)
# up above.
#
# TODO: Remove this extra case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is something deserving a follow-up issue. Could you please open one?

# gracefully -- it doesn't correctly switch to using __truediv__ when
# 'from __future__ import division' is included, it doesn't display a very
# graceful error if __div__ is missing but __truediv__ is present...
# Also see https://github.com/python/mypy/issues/2048
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add these details to the issue and raise its priority to at least normal? This future import is very important in loads of numeric code.

return self.check_op_local(method, method_type, base_type, arg, context, local_errors)

def check_op_local(self,
method_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method_name should be str.

@Michael0x2a Michael0x2a merged commit 7f29b73 into python:master Aug 16, 2018
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Aug 16, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

It also swaps out the code for detecting overlaps with operators and
removes some associated (and now unused) code.

This pull request builds on top of python#5474
and python#5475 -- once those two PRs are
merged, I'll rebase this diff if necessary.

This pull request also supercedes python#5475 --
that PR contains basically the same code as these three PRs, just
smushed together.
@Michael0x2a Michael0x2a deleted the refactor-reversible-operators branch August 27, 2018 05:45
Michael0x2a added a commit that referenced this pull request Aug 27, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

This new overlapping type check is used for detecting unsafe overlaps
with overloads and operator methods as well as for performing 
reachability/unreachability analysis in `isinstance` and `if x is None`
checks and the like.

This PR also removes some (now unused) code that used to be used
for detecting overlaps with operators.

This pull request builds on top of #5474
and #5475 and supersedes 
#5475.
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Sep 4, 2018
python#5475 introduced a new type of error
message ("__rop__ will not be called when evaluating 'a + b'...") that
triggers when the user tries evaluating expressions like `foo + foo`
where `foo` does not contain an `__add__` method that accepts a value
of the same type, but *does* contain an `__radd__` method that does.

This pull request removes that error message on the grounds that it's
too cryptic and unlikely to be helpful to most mypy users. That error
message is useful mainly for people developing libraries containing
custom numeric types (or libraries that appropriate operators to create
custom DSLs) -- however, most people are not library creators and so
will not find this error message useful.
ilevkivskyi pushed a commit that referenced this pull request Sep 5, 2018
#5475 introduced a new type of error
message ("__rop__ will not be called when evaluating 'a + b'...") that
triggers when the user tries evaluating expressions like `foo + foo`
where `foo` does not contain an `__add__` method that accepts a value
of the same type, but *does* contain an `__radd__` method that does.

This pull request removes that error message on the grounds that it's
too cryptic and unlikely to be helpful to most mypy users. That error
message is useful mainly for people developing libraries containing
custom numeric types (or libraries that appropriate operators to create
custom DSLs) -- however, most people are not library creators and so
will not find this error message useful.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants