bpo-44187: Quickening infrastructure #26264
@markshannon one thing I noticed is the introduction of the HotPy prefix, which does not exist in CPython. I understand that this code's heritage lies in some previous projects.
…tation to make it less surprising.
Here's my first round of comments; I haven't gotten to specialize.c yet, but I figured I'd send this in case I don't get to that tonight (which is likely).
Misc/NEWS.d/next/Core and Builtins/2021-05-20-12-43-04.bpo-44187.3lk0L1.rst
Outdated
Okay, I got through everything this time, and I think I understand it. Some suggestions to make it easier to understand for new readers.
Python/specialize.c
Outdated
static uint8_t adaptive[256] = { 0 };

static uint8_t cache_requirements[256] = { 0 };
These variables could use a comment explaining their purpose. (Also, maybe we should plan to generate these from info added to opcode.py, like opcode_targets.h?)
Python/specialize.c
Outdated
        cache_offset = i/2;
    }
    else if (oparg > 255) {
        /* Cannot access required cache_offset */
Maybe in some kind of debug mode it would be nice to report whether this happens at all? If we see this frequently we need to change the strategy. OTOH maybe we never expect it and we could put assert(0) here???
Python/specialize.c
Outdated
        /* Cannot access required cache_offset */
        continue;
    }
    cache_offset += need;
Looks like this will over-count if there are eligible opcodes with an EXTENDED_ARG prefix.
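To make the over-counting concern concrete, here is a minimal sketch of a counting loop that skips any instruction carrying an EXTENDED_ARG prefix. It is not the PR's code: the opcode numbers, the `toy_cache_need` table, and the function name are hypothetical, and the sketch leaves out the `i/2` offset adjustment seen in the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode numbers, chosen only for this sketch. */
enum { TOY_NOP = 0, TOY_LOAD_ATTR = 1, TOY_EXTENDED_ARG = 2, TOY_OPCODE_COUNT = 3 };

/* Hypothetical table: number of cache entries each opcode would need. */
static const uint8_t toy_cache_need[TOY_OPCODE_COUNT] = { 0, 2, 0 };

/* Count cache entries, ignoring instructions that follow an EXTENDED_ARG,
 * on the assumption that such instructions are never quickened. */
static int
toy_entries_needed(const uint8_t *opcodes, int len)
{
    int cache_offset = 0;
    int previous_opcode = -1;
    for (int i = 0; i < len; i++) {
        int opcode = opcodes[i];
        assert(opcode < TOY_OPCODE_COUNT);
        if (previous_opcode != TOY_EXTENDED_ARG) {
            cache_offset += toy_cache_need[opcode];
        }
        previous_opcode = opcode;
    }
    return cache_offset + 1;  /* One extra for the count entry */
}
```

Without the `previous_opcode` check, the prefixed instruction would still reserve cache entries it can never use.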
Python/ceval.c
Outdated
@@ -1343,6 +1343,14 @@ eval_frame_handle_pending(PyThreadState *tstate)
#define JUMPTO(x) (next_instr = first_instr + (x))
#define JUMPBY(x) (next_instr += (x))

/* Get opcode and opcode from original instructions, not quickened form. */
- /* Get opcode and opcode from original instructions, not quickened form. */
+ /* Get opcode and oparg from original instructions, not quickened form. */
Include/internal/pycore_code.h
Outdated
    return &last_cache_plus_one[-1-n].entry;
}

/* Following two functions determine the index of a cache entry from the
This comment doesn't seem correct - they take the index and return oparg or offset.
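For readers new to the layout, the backward indexing in the `return &last_cache_plus_one[-1-n].entry;` line of the hunk can be sketched as follows. The types `toy_entry` and `toy_quickened` and the function name are hypothetical stand-ins, not the definitions in pycore_code.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PR's types; the real definitions may differ. */
typedef struct { uint64_t payload; } toy_entry;
typedef union { toy_entry entry; uint16_t code[4]; } toy_quickened;

/* Entry n sits n+1 slots before the one-past-the-end pointer,
 * so cache entries are numbered downward from the end of the cache. */
static toy_entry *
toy_get_cache_entry(toy_quickened *last_cache_plus_one, int n)
{
    assert(n >= 0);
    return &last_cache_plus_one[-1 - n].entry;
}
```

With a four-slot array `buf`, `toy_get_cache_entry(buf + 4, 0)` refers to `buf[3]` and index 3 refers to `buf[0]`.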
@gvanrossum, @iritkatriel I think I've addressed all your comments.
Looking good. Maybe we should ask Dino to have a look? (Between this PR and PEP 659 he should have enough to judge the design, right?)
Also, please look at the compiler warnings found by GitHub Actions for Windows (x64).
<instr 0> <instr 1> <instr 2> <instr 3> <--- co->co_first_instr
<instr 4> <instr 5> <instr 6> <instr 7>
...
<instr N-1>
Maybe M-1? The number of cache entries doesn't have to match the number of instructions.
Python/specialize.c
Outdated
        }
        previous_opcode = opcode;
    }
    return cache_offset+1;
-     return cache_offset+1;
+     return cache_offset + 1; // One extra for the count entry
I've been implementing some specialization of
I've implemented those in the last three commits.
    }
    Py_ssize_t size = PyBytes_GET_SIZE(code->co_code);
    int instr_count = (int)(size/sizeof(_Py_CODEUNIT));
    if (instr_count > MAX_SIZE_TO_QUICKEN) {
Would it be possible instead to quicken the first 5000 instructions and then exit? (That would avoid a cliff where a minor change in the code tips it over the no-optimization limit and makes it run much slower.)
There's a memory cost to creating a duplicate code array, so it would risk wasting a disproportionate amount of memory.
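The cliff under discussion can be sketched as an all-or-nothing size check. This is only an illustration: `TOY_MAX_SIZE_TO_QUICKEN` is assumed to be 5000 based on the "first 5000 instructions" wording in the question, and `toy_codeunit` and `toy_should_quicken` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit, taken from the "first 5000 instructions" wording above. */
#define TOY_MAX_SIZE_TO_QUICKEN 5000

/* Hypothetical two-byte code unit (opcode plus oparg). */
typedef uint16_t toy_codeunit;

/* Return 1 if a code object of this byte size would be quickened at all,
 * 0 if it would be left entirely unoptimized. */
static int
toy_should_quicken(size_t code_size_bytes)
{
    int instr_count = (int)(code_size_bytes / sizeof(toy_codeunit));
    return instr_count <= TOY_MAX_SIZE_TO_QUICKEN;
}
```

Under these assumptions, a function of exactly 5000 instructions is quickened while one with 5001 is not, which is the cliff the question describes; the reply explains why partial quickening was not chosen.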
🤖 New build scheduled with the buildbot fleet by @markshannon for commit ec50298 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
🤖 New build scheduled with the buildbot fleet by @markshannon for commit ab3a30b 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
Windows tests were failing before this PR.
@@ -73,9 +73,10 @@ def get_pooled_int(value):
    alloc_deltas = [0] * repcount
    fd_deltas = [0] * repcount
    getallocatedblocks = sys.getallocatedblocks
    getallocatedblocks = sys.getallocatedblocks
why?
Typo. #26624
First step toward implementing PEP 659.
https://bugs.python.org/issue44187