
GH-118095: Make BINARY_SUBSCR_GETITEM suitable for tier 2 #120793


Merged: 11 commits, Aug 1, 2024


Member

@markshannon markshannon commented Jun 20, 2024

Makes BINARY_SUBSCR_GETITEM suitable for tier 2, including a fix for #118540.

@brandtbucher
Member

I think I see the issue. When projecting, we have special handling for _PUSH_FRAME that assumes the cache layout is the same as _PyCallCache, and it reads the func_version to continue projecting out of that. I think that currently, it's reading garbage and failing to continue projecting, but maybe it occasionally gets something that looks like a valid function version present in the cache... yikes!

So I think for this to work, we need to add a cache entry to every BINARY_SUBSCR, where BINARY_SUBSCR_GETITEM stores the __getitem__'s func_version. The current specialization is sort of nice in that it works for any heap type with a cached __getitem__, but I don't think making it specific to one cached function would hurt too much (note that this will still work for polymorphic sites with the same inherited __getitem__ function).
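To illustrate the parenthetical about polymorphic sites, here is a minimal Python sketch. The class names (`Base`, `Left`, `Right`, `Other`) and the helper `same_getitem` are hypothetical and exist only for this example. It shows Python-level behaviour, not the interpreter's inline cache: two subclasses that inherit `__getitem__` resolve to the same function object, so a specialization keyed on that one function would plausibly still apply at a site that sees both types.

```python
class Base:
    """Hypothetical heap type with a Python-level __getitem__."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]


class Left(Base):
    pass  # inherits Base.__getitem__ unchanged


class Right(Base):
    pass  # inherits Base.__getitem__ unchanged


class Other:
    """Unrelated type that defines its own __getitem__."""

    def __getitem__(self, index):
        return index


def first(container):
    # A single subscript site that may see several receiver types.
    return container[0]


def same_getitem(*types):
    # True when every type resolves __getitem__ to the same function object.
    reference = types[0].__getitem__
    return all(t.__getitem__ is reference for t in types)
```

With these definitions, `same_getitem(Left, Right)` holds while `same_getitem(Left, Other)` does not, which is the distinction a per-function cache entry would care about.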

@markshannon
Member Author

@brandtbucher Thanks for the analysis

@markshannon
Member Author

It looks like we bail out of projection in _PUSH_FRAME if the opcode is FOR_ITER_GEN. We can just do the same for BINARY_SUBSCR_GETITEM for now and consider a more sophisticated approach later.
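The bail-out idea can be sketched as a toy model in Python. This is not CPython's optimizer code: the function `func_version_for_projection`, the set `OPCODES_WITHOUT_CALL_CACHE`, and the dictionary standing in for an inline cache are all assumptions made for illustration. The only details taken from the discussion are that projection reads a `func_version` from the cache and that it stops for `FOR_ITER_GEN`, with `BINARY_SUBSCR_GETITEM` proposed for the same treatment.

```python
from typing import Optional

# Hypothetical: opcodes whose inline cache is assumed NOT to follow the
# call-cache layout, so no func_version should be read from it.
OPCODES_WITHOUT_CALL_CACHE = frozenset({"FOR_ITER_GEN", "BINARY_SUBSCR_GETITEM"})


def func_version_for_projection(opcode: str, cache: dict) -> Optional[int]:
    """Return the function version to keep projecting with, or None to stop.

    Toy model: `cache` is a plain dict standing in for the inline cache.
    """
    if opcode in OPCODES_WITHOUT_CALL_CACHE:
        # Bail out rather than interpret unrelated cache contents
        # as a function version.
        return None
    return cache.get("func_version")
```

The point of the early return is that whatever happens to sit in the cache for the excluded opcodes is never interpreted as a function version.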

@markshannon
Member Author

Well, that was a bug. But not the bug that's causing this to fail.

@brandtbucher
Member

This failure seems interesting:

test_combine_stack_space_checks_large_framesize (test.test_capi.test_opt.TestUopsOptimization.test_combine_stack_space_checks_large_framesize) ... python: Python/optimizer.c:952: translate_bytecode_to_trace: Assertion `trace_length < max_length' failed.

@markshannon markshannon marked this pull request as ready for review June 25, 2024 12:47
@markshannon markshannon requested a review from brandtbucher June 25, 2024 13:32
Member

@brandtbucher brandtbucher left a comment


It's a shame that we need to fish the method out of the type twice, but otherwise this is a good, straightforward change. Just one question about the other change being smuggled in alongside it:

@brandtbucher
Member

I just kicked off JIT benchmarks and stats for you.

@brandtbucher brandtbucher self-assigned this Aug 1, 2024
@brandtbucher
Member

2% fewer tier one instructions. Overall perf impact is in the noise, but two SymPy benchmarks got 15% faster.

@brandtbucher brandtbucher merged commit df13a18 into python:main Aug 1, 2024
55 checks passed
@markshannon markshannon deleted the binary-subscr-getitem-tier-2 branch August 6, 2024 10:13