gh-106581: Split CALL_BOUND_METHOD_EXACT_ARGS into uops #108462
This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
@carljm Interested in reviewing this?
Sure. I haven't had much opportunity to play with tier 2 yet, so I'm reviewing without the benefit of experience working in this code. But I took a pretty careful look (including the generated code), and I think I understand everything that's happening here, and it all makes sense.
stack_pointer[-1 - oparg] = self;  // Patch stack as it is used by _INIT_CALL_PY_EXACT_ARGS
func = Py_NewRef(((PyMethodObject *)callable)->im_func);
stack_pointer[-2 - oparg] = func;  // This is used by CALL, upon deoptimization
Hmm, so these lines appear to be entirely redundant for the uops executor, as the stack-effect output of func, self, ... causes the exact same assignments to be emitted automatically right after these lines.
But this is needed for the bytecode interpreter: in that case all the uops are squashed together and their inputs and outputs are chained together as local variables, yet we really need to update the actual stack here, for the reasons mentioned in the comments.
I don't know how often such uop stack-patching cases will occur (from what I can find, this is the first one?). If there will be more, it might be nice to have syntax to mark an output in the stack-effects definition of the uop as "must actually modify the stack", and then have the cases generator automatically emit this (and the executor cases wouldn't have the duplicate assignments.)
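To make the concern concrete, here is a toy sketch in C. It is not CPython code: `ToyFrame`, `toy_setup`, `toy_unpack_bound_method`, and `toy_callable_on_stack` are hypothetical names, and stack values are plain integers. It assumes the situation described above, where the fused steps pass `func` and `self` along as locals, so a later step that re-reads the real stack only sees them if they were written back explicitly.

```c
#include <assert.h>

/* Toy model only: none of these names are CPython APIs. */
enum { TOY_NULL = 0, TOY_BOUND_METHOD = 1, TOY_FUNC = 2, TOY_SELF = 3, TOY_ARG = 4 };

typedef struct {
    int slots[8];
    int *stack_pointer;   /* one past the top of the stack */
} ToyFrame;

/* Push [bound_method, NULL, arg * oparg]: the layout before the call.
   Assumes oparg <= 6 so that everything fits in slots[]. */
static void toy_setup(ToyFrame *f, int oparg)
{
    f->stack_pointer = f->slots;
    *f->stack_pointer++ = TOY_BOUND_METHOD;
    *f->stack_pointer++ = TOY_NULL;
    for (int i = 0; i < oparg; i++) {
        *f->stack_pointer++ = TOY_ARG;
    }
}

/* First fused step: unpack the bound method into locals.
   With patch_stack != 0 it also writes both values to the real stack. */
static void toy_unpack_bound_method(ToyFrame *f, int oparg, int patch_stack,
                                    int *func, int *self)
{
    *self = TOY_SELF;
    *func = TOY_FUNC;
    if (patch_stack) {
        f->stack_pointer[-1 - oparg] = *self;
        f->stack_pointer[-2 - oparg] = *func;
    }
}

/* What a fallback would find in the callable slot if it re-read the stack. */
static int toy_callable_on_stack(const ToyFrame *f, int oparg)
{
    return f->stack_pointer[-2 - oparg];
}
```

Without the explicit write-back, the locals hold the unpacked function while the callable slot on the stack still holds the bound method, which is the mismatch the two patched assignments avoid.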
Thanks for the review!
It looks to be pretty uncommon -- calls are special in many ways. Agreed that if this proliferates we should teach the code generator to do this. It also looks like a C compiler would have a hard time optimizing the duplication in Tier 2 away, because there's an intervening Py_DECREF(). If that becomes an issue but it remains limited to just this case we could surround the flushes with #ifdef TIER_ONE / #endif.
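A rough sketch of what guarding the flushes could look like, assuming `TIER_ONE` is defined only when the body is compiled for the Tier 1 interpreter. `toy_uop_body` is a hypothetical stand-in for the generated uop body, and integers stand in for object pointers:

```c
#include <assert.h>

#define TIER_ONE 1   /* assumption: defined only in the Tier 1 build */

/* Hypothetical uop body: the explicit stack writes ("flushes") are
   compiled in only when TIER_ONE is defined. */
static void toy_uop_body(int *stack_pointer, int oparg, int self, int func)
{
#ifdef TIER_ONE
    stack_pointer[-1 - oparg] = self;
    stack_pointer[-2 - oparg] = func;
#else
    /* Tier 2 build: rely on the generated stack-effect assignments. */
    (void)stack_pointer; (void)oparg; (void)self; (void)func;
#endif
}
```

In a build without `TIER_ONE`, the function body compiles to nothing, so the duplicate assignments would disappear from the executor cases.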
I'll merge this and see what's next on the agenda. (I suspect either KW_NAMES or splitting LOAD_ATTR specializations.)
Instead of using
GO_TO_INSTRUCTION(CALL_PY_EXACT_ARGS);
we just add the macro elements of the latter to the macro for the former. This requires lengthening the uops array in struct opcode_macro_expansion. (It also required changes to stacking.py that were merged already.)
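As an illustration of why concatenating two macros can force a longer array, here is a toy sketch. `ToyExpansion`, `TOY_MAX_UOPS`, and `toy_append` are hypothetical and do not reflect the real layout of struct opcode_macro_expansion; the sketch only assumes a fixed-capacity array of uops plus a count.

```c
#include <assert.h>

#define TOY_MAX_UOPS 4   /* hypothetical capacity of the uops array */

typedef struct {
    int nuops;                 /* number of uops in use */
    int uops[TOY_MAX_UOPS];    /* uop ids, in execution order */
} ToyExpansion;

/* Append src's uops to dst. Returns 0 on success, or -1 (leaving dst
   unchanged) when the combined sequence does not fit in the array. */
static int toy_append(ToyExpansion *dst, const ToyExpansion *src)
{
    if (dst->nuops + src->nuops > TOY_MAX_UOPS) {
        return -1;
    }
    for (int i = 0; i < src->nuops; i++) {
        dst->uops[dst->nuops++] = src->uops[i];
    }
    return 0;
}
```

When the combined sequence exceeds the capacity, the only way to express it as a single expansion is to raise the array bound, which is the kind of change the description refers to.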