
bpo-44187: Quickening infrastructure #26264


Merged

merged 30 commits into python:main on Jun 7, 2021

Conversation

markshannon
Member

@markshannon markshannon commented May 20, 2021

First step toward implementing PEP 659.

https://bugs.python.org/issue44187

@markshannon markshannon requested a review from a team as a code owner May 20, 2021 11:50
@bratao

bratao commented May 22, 2021

@markshannon one thing I noticed is the introduction of the HotPy prefix, which does not exist elsewhere in CPython. I understand that this is due to the heritage of this code, based on some previous projects.
But wouldn't it be better to simply call it PyCacheEntry, for example? I can see developers trying to make sense of the Hot prefix, as it has a meaning in JITs.

Member

@gvanrossum gvanrossum left a comment

Here's my first round of comments; I haven't gotten to specialize.c yet, but I figured I'd send this in case I don't get to that tonight (which is likely).

Member

@gvanrossum gvanrossum left a comment

Okay, I got through everything this time, and I think I understand it. Some suggestions to make it easier to understand for new readers.

Comment on lines 40 to 42
static uint8_t adaptive[256] = { 0 };

static uint8_t cache_requirements[256] = { 0 };
Member

These variables could use a comment explaining their purpose. (Also, maybe we should plan to generate these from info added to opcode.py, like opcode_targets.h?)
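As an illustration of the generation idea floated here, a single spec could drive both 256-entry tables. Everything in this sketch is hypothetical: the opcode numbers, adaptive-variant numbers, and entry counts are invented for illustration, not CPython's real values.

```python
# Hypothetical sketch: derive both 256-entry tables from one spec,
# in the spirit of how opcode_targets.h is generated from opcode.py.
# All numbers below are invented for illustration.
SPECIALIZABLE = {
    # base opcode: (adaptive variant opcode, cache entries required)
    106: (180, 2),
    116: (181, 1),
}

def build_tables(spec):
    """Return (adaptive, cache_requirements) as 256-entry lists."""
    adaptive = [0] * 256
    cache_requirements = [0] * 256
    for opcode, (adaptive_opcode, n_entries) in spec.items():
        adaptive[opcode] = adaptive_opcode
        cache_requirements[opcode] = n_entries
    return adaptive, cache_requirements
```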

cache_offset = i/2;
}
else if (oparg > 255) {
/* Cannot access required cache_offset */
Member

Maybe in some kind of debug mode it would be nice to report whether this happens at all? If we see this frequently we need to change the strategy. OTOH maybe we never expect it and we could put assert(0) here???

/* Cannot access required cache_offset */
continue;
}
cache_offset += need;
Member

Looks like this will over-count if there are eligible opcodes with an EXTENDED_ARG prefix.
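The EXTENDED_ARG concern can be made concrete with a small model of the counting pass. This is a sketch, not the PR's C code: the `cache_requirements` mapping and the opcode numbers other than EXTENDED_ARG are stand-ins, and it assumes (as the quoted diff suggests) that an instruction whose effective oparg exceeds 255 cannot address its cache and must be skipped rather than counted.

```python
EXTENDED_ARG = 144  # CPython's EXTENDED_ARG opcode number

def entries_needed(instructions, cache_requirements):
    """Count cache entries for a list of (opcode, oparg) code units.

    An instruction carrying an EXTENDED_ARG prefix has an effective
    oparg > 255, so it cannot index its cache entries; counting it
    anyway is the over-count described above.
    """
    count = 0
    prefixed = False
    for opcode, oparg in instructions:
        if opcode == EXTENDED_ARG:
            prefixed = True
            continue
        if not prefixed and oparg <= 255:
            count += cache_requirements.get(opcode, 0)
        prefixed = False
    return count + 1  # one extra entry holds the count itself
```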

Python/ceval.c Outdated
@@ -1343,6 +1343,14 @@ eval_frame_handle_pending(PyThreadState *tstate)
#define JUMPTO(x) (next_instr = first_instr + (x))
#define JUMPBY(x) (next_instr += (x))

/* Get opcode and opcode from original instructions, not quickened form. */
Member

Suggested change
/* Get opcode and opcode from original instructions, not quickened form. */
/* Get opcode and oparg from original instructions, not quickened form. */

return &last_cache_plus_one[-1-n].entry;
}

/* Following two functions determine the index of a cache entry from the
Member

This comment doesn't seem correct - they take the index and return oparg or offset.
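For readers puzzling over the negative indexing in `last_cache_plus_one[-1-n]`, here is a toy model of the layout in Python. The list layout and names are illustrative only; the real code works on a single C allocation holding cache entries followed by instructions.

```python
def make_quickened(cache_entries, instructions):
    """Toy layout: cache entries first, then instructions."""
    return list(cache_entries) + list(instructions)

def get_cache_entry(quickened, first_instr_index, n):
    """Return cache entry n, counting backwards from just before the
    first instruction -- a Python analogue of the C expression
    last_cache_plus_one[-1 - n]."""
    return quickened[first_instr_index - 1 - n]
```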

@markshannon
Member Author

@gvanrossum , @iritkatriel I think I've addressed all your comments.
Specifically, I've:

  • Factored out the common code in optimize and entries_needed
  • Made sure that previous_opcode is correct to avoid super-instructions stomping on adaptive ones
  • Added more checks for EXTENDED_ARG to avoid wasting memory
  • Fixed up lots of comments.
  • Added an extended comment showing the layout of the quickening data

Member

@gvanrossum gvanrossum left a comment

Looking good. Maybe we should ask Dino to have a look? (Between this PR and PEP 659 he should have enough to judge the design, right?)

Also, please look at the compiler warnings found by GitHub Actions for Windows (x64).

<instr 0> <instr 1> <instr 2> <instr 3> <--- co->co_first_instr
<instr 4> <instr 5> <instr 6> <instr 7>
...
<instr N-1>
Member

Maybe M-1? The number of cache entries doesn't have to match the number of instructions.

}
previous_opcode = opcode;
}
return cache_offset+1;
Member

Suggested change
return cache_offset+1;
return cache_offset + 1; // One extra for the count entry

@markshannon
Member Author

I've been implementing some specialization of LOAD_ATTR in another branch, and from that it became clear that two more things were needed:

  1. We need to use the next index (index + 1) for efficiency as that is what is available in the interpreter.
  2. We need to initialize the adaptive cache for adaptive instructions (or they crash).

I've implemented those in the last three commits.
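The second point above can be sketched as follows. The field names (`original_oparg`, `counter`) are guesses for illustration, not CPython's actual struct; the point is just that an adaptive instruction reads its cache entry on every execution, so quickening must write valid initial values before the instruction first runs.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveCacheEntry:
    # Illustrative fields: the original oparg (since the quickened
    # instruction's oparg now holds a cache index) and a counter the
    # adaptive instruction consults before trying to specialize.
    original_oparg: int
    counter: int

def quicken_adaptive(cache, cache_index, original_oparg, initial_counter):
    # Without this initialization, the first execution of the adaptive
    # instruction would read uninitialized memory -- the crash noted above.
    cache[cache_index] = AdaptiveCacheEntry(original_oparg, initial_counter)
```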

gvanrossum added a commit to faster-cpython/tools that referenced this pull request Jun 1, 2021
}
Py_ssize_t size = PyBytes_GET_SIZE(code->co_code);
int instr_count = (int)(size/sizeof(_Py_CODEUNIT));
if (instr_count > MAX_SIZE_TO_QUICKEN) {
Member

Would it be possible instead to quicken the first 5000 instructions and then exit? (That would avoid a cliff where a minor change in the code tips it over the no-optimization limit and makes it run much slower.)


There's a memory cost to creating a duplicate code array, so it would risk wasting memory disproportionately.

@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 6, 2021
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @markshannon for commit ec50298 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 6, 2021
@markshannon markshannon added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 7, 2021
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @markshannon for commit ab3a30b 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 7, 2021
@markshannon
Member Author

Windows tests were failing before this PR.
We seem to be right on the edge of running out of stack on Windows when up against the recursion limit.

@markshannon markshannon merged commit 001eb52 into python:main Jun 7, 2021
@markshannon markshannon deleted the quickening-infrastructure branch June 7, 2021 17:56
@@ -73,9 +73,10 @@ def get_pooled_int(value):
alloc_deltas = [0] * repcount
fd_deltas = [0] * repcount
getallocatedblocks = sys.getallocatedblocks
getallocatedblocks = sys.getallocatedblocks
Member

why?

Member Author

Typo. #26624
