Skip to content

Generalize subtype caches #5474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 14, 2018

Conversation

Michael0x2a
Copy link
Collaborator

Currently, the typestate module keeps track of two separate subtype caches: one for regular subtypes, and another for proper subtypes.

This pattern unfortunately does not generalize well: for example, in the upcoming overloads PR, it ends up being useful to perform subtype checks with promotions disabled.

The problem is that while performing a subtype check with promotions disabled, we cannot use or store information to the existing is_subtype function without introducing subtle bugs and inconsistencies: if is_subtype previously decided that int is a subtype of float due to promotions, it wouldn't make sense to return that cached result with promotions disabled.

One potential solution is to just add two more caches to TypeState. However, this solution does not generalize well, especially if we were to add additional variations of subtype checks in the future.

Instead, this pull request modifies TypeState so it can keep track of an arbitrary number of subtype caches, each of which is characterized by a 'SubtypeKind'.

The 'SubtypeKind' is an int, which we obtain by taking the hash of a tuple containing a unique string per each subtype method, along with whatever additional parameters are active during that subtype check.

So now, instead of hard-coding calls to record_subtype_cache_entry or record_proper_subtype_cache_entry and friends, we can just call record_subtype_cache_entry and pass in the appropriate SubtypeKind to store the type info in the correct cache.

Since manually juggling parameters and SubtypeKinds is a bit of a pain, this pull request also refactors the two subtype visitors so they call a private helper method that takes care of propagating the parameters instead of directly calling the is_subtype or is_proper_subtype top-level methods.

NOTE: Currently, the mypy codebase does not attempt to perform subtype checks with promotions disabled. That means that this PR should not change the actual behavior of mypy, which is why there are no new tests.

(My subsequent PRs will perform subtype checks w/ promotions disabled.)

Currently, the typestate module keeps track of two separate subtype
caches: one for regular subtypes, and another for proper subtypes.

This pattern unfortunately does not generalize well: for example, in the
upcoming overloads PR, it ends up being useful to perform subtype checks
*with promotions disabled*.

The problem is that while performing a subtype check with promotions
disabled, we cannot use or store information to the existing is_subtype
without introducing subtle bugs and inconsistencies: if is_subtype
previously decided that int is a subtype of float due to promotions,
it wouldn't make sense to return that cached result with promotions
disabled.

One potential solution is to just add two more caches to TypeState.
However, this solution does not generalize well, especially if we
were to add additional variations of subtype checks in the future.

Instead, this pull request modifies typestate so it can keep track
of an arbitrary number of subtype caches, each of which is characterized
by a 'SubtypeKind'.

The 'SubtypeKind' is an int, which we obtain by taking the hash of a
tuple containing a unique string per each subtype method, along with
whatever additional parameters are active during that subtype check.

So now, instead of hard-coding calls to 'record_subtype_cache_entry' or
'record_proper_subtype_cache_entry' and friends, we can just call
'record_subtype_cache_entry' and pass in the appropriate SubtypeKind
to store the type info in the correct cache.

Since manually juggling parameters and SubtypeKinds is a bit of a pain,
this pull request also refactors the two subtype visitors so they call
a private helper method that takes care of propagating the parameters
instead of directly calling the `is_subtype` or `is_proper_subtype`
top-level methods.

NOTE: Currently, the mypy codebase does not attempt to perform subtype
checks with promotions disabled. That means that this PR should not
change the actual behavior of mypy, which is why there are no new tests.

(My subsequent PRs will perform subtype checks w/ promotions disabled.)
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Aug 14, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

It also swaps out the code for detecting overlaps with operators and
removes some associated (and now unused) code.

This pull request builds on top of python#5474
and python#5475 -- once those two PRs are
merged, I'll rebase this diff if necessary.

This pull request also supercedes python#5475 --
that PR contains basically the same code as these three PRs, just
smushed together.
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful! I just have three comments. Also did you double-check this works with internal codebases?

mypy/subtypes.py Outdated
return True
# NOTE: left.type.mro may be None in quick mode if there
# was an error somewhere.
if left.type.mro is not None:
if not self.ignore_promotions and left.type.mro is not None:
for base in left.type.mro:
# TODO: Also pass recursively ignore_declared_variance
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this TODO can be removed since _is_subtype does this automatically.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

mypy/subtypes.py Outdated
type_parameter_checker,
ignore_pos_arg_names,
ignore_declared_variance,
ignore_promotions))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the hash call really needed here? As I understand it will be used as a key (tuples can also be keys) so the hash will be called under the hood anyway. Or is it a performance trick?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an attempt at optimization, but I forgot to test it before submitting this PR. After testing, it ended up making little-to-no difference, so I ended up just removing the hash.

mypy/subtypes.py Outdated
elif isinstance(right, Instance):
return is_subtype(left.fallback, right,
ignore_pos_arg_names=self.ignore_pos_arg_names)
return is_subtype(left.fallback, right)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using is_subtype instead of self._is_subtype? If this is some special case, then it deserves a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah whoops -- that was an oversight. Fixed.

@Michael0x2a
Copy link
Collaborator Author

I've tried running #5476 against our internal codebases, but not this one specifically. I'll start the run + post an update once it finishes.

@Michael0x2a
Copy link
Collaborator Author

@ilevkivskyi -- I've confirmed this PR does not introduce any new regressions to our codebases. (I ran the tests before fixing the merge conflict, but I'm pretty sure that change won't affect anything.)

@ilevkivskyi ilevkivskyi merged commit a2ffde9 into python:master Aug 14, 2018
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Aug 14, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

It also swaps out the code for detecting overlaps with operators and
removes some associated (and now unused) code.

This pull request builds on top of python#5474
and python#5475 -- once those two PRs are
merged, I'll rebase this diff if necessary.

This pull request also supercedes python#5475 --
that PR contains basically the same code as these three PRs, just
smushed together.
Michael0x2a added a commit to Michael0x2a/mypy that referenced this pull request Aug 16, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

It also swaps out the code for detecting overlaps with operators and
removes some associated (and now unused) code.

This pull request builds on top of python#5474
and python#5475 -- once those two PRs are
merged, I'll rebase this diff if necessary.

This pull request also supercedes python#5475 --
that PR contains basically the same code as these three PRs, just
smushed together.
@Michael0x2a Michael0x2a deleted the generalize-subtype-cache branch August 27, 2018 05:46
Michael0x2a added a commit that referenced this pull request Aug 27, 2018
This pull request adds more robust support for detecting partially
overlapping types. Specifically, it detects overlaps with...

1. TypedDicts
2. Tuples
3. Unions
4. TypeVars
5. Generic types containing variations of the above.

This new overlapping type check is used for detecting unsafe overlaps
with overloads and operator methods as well as for performing 
reachability/unreachability analysis in `isinstance` and `if x is None`
checks and the like.

This PR also removes some (now unused) code that used to be used
for detecting overlaps with operators.

This pull request builds on top of #5474
and #5475 and supersedes 
#5475.
ilevkivskyi added a commit that referenced this pull request Sep 12, 2018
Fixes #5514

The root cause is that #5474 added `type_parameter_checker` (a function) as one of the keys of subtype kinds. This function is a nested one, so in fact a _new_ function was created on every call to `is_subtype_ignoring_tvars()` causing many thousands parasitic keys created in subtype caches. In addition there are two smaller things listed below.

The performance win is around 5%
This PR:
* Moves the mentioned function to global scope, this causes only 6 subtype kinds present during self-check.
* Prevents similar regressions in future by switching only boolean flags as subtype kind keys.
* Removes two one liners that stay in a very hot code path (cache lookup is hit 140K times during self-check).
* Avoids creation of empty sets on cache look-up, create them only when writing to cache.
ilevkivskyi added a commit that referenced this pull request Sep 12, 2018
Fixes #5514

The root cause is that #5474 added `type_parameter_checker` (a function) as one of the keys of subtype kinds. This function is a nested one, so in fact a _new_ function was created on every call to `is_subtype_ignoring_tvars()` causing many thousands parasitic keys created in subtype caches. In addition there are two smaller things listed below.

The performance win is around 5%
This PR:
* Moves the mentioned function to global scope, this causes only 6 subtype kinds present during self-check.
* Prevents similar regressions in future by switching only boolean flags as subtype kind keys.
* Removes two one liners that stay in a very hot code path (cache lookup is hit 140K times during self-check).
* Avoids creation of empty sets on cache look-up, create them only when writing to cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants