Move the data and control (block) stacks on the thread and remove frame objects. #31
Comments
This is definitely an interesting idea. I expect that this would result in significant churn in ceval.c but, assuming meaningful performance improvement and/or maintainability, that would be justifiable. Here are some questions:
Also, some bonus observations:
|
1 us seems a large number.
Presumably the variability in cleanup time is due to object finalizations
due to DECREF.
|
I think this is partly achievable for 3.10. See #43 for the more limited form. |
Here's a draft write up describing the activation record stack, without frame objects:
Activation Records
Each Python function activation has an activation record.
Layout
Each activation record consists of four sections:
The linkage and specials sections are the same size for all frames, which means that pointers to the specials, the linkage, and the base of the stack are equivalent, as each can be computed from the others by adding a small constant. There are two options for layout. Option A allows overlapping frames and minimal movement.
Option A:
This layout requires three pointers for efficient access:
Option B:
This layout only requires two pointers:
Computing the base of the evaluation stack becomes a bit more expensive, but not that much.
Linkage section
The linkage section contains the above pointers for the caller's activation. A pointer to the specials/linkage of the current activation will be stored in the
Note:
Specials section
The specials section contains the following pointers:
Frame objects
To enable introspection, a heap allocated
This gives the appearance of persistent, heap-allocated activation records with minimal runtime overhead.
PEP 523 -- Adding a frame evaluation API to CPython
PEP 523 requires that a |
Why would you even need to know the base of the stack? Just to detect underflow? |
That should have been "evaluation stack" |
Done 🎉 |
Python, like almost all other languages, operates on a stack of call frames.
Most languages use a contiguous stack for calls because it is much more efficient.
Obviously, C and all languages that are pre-compiled use the OS/hardware provided stack.
But even interpreted languages use a contiguous stack. Java and Lua are obvious examples.
Python should do the same. Allocating frames on the heap, "zombie" frames, and the excessive copying of arguments are all slow and unnecessary.
Implementation
Each thread needs two stacks, a data stack and a control stack. These can be contiguous or chunked. Chunked gives us most of the performance advantages of contiguous and is more flexible, but is more complex to implement.
The data stack is a (conceptually) infinite stack of PyObject *s (or PyValues with tagging).
The control stack is a (conceptually) infinite stack of ControlBlocks.
To efficiently implement overflow checks, stacks should be power-of-2 sized and aligned on their size.
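A minimal sketch of why size-aligned, power-of-2 stacks make overflow checks cheap, assuming a hypothetical 4096-byte stack chunk. The names STACK_BYTES, stack_base, and would_overflow are illustrative and not part of CPython. Because the chunk is aligned on its own size, both its base and the offset of the stack pointer within it fall out of a single mask, so no separate limit pointer has to be kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed chunk size for illustration; it must be a power of two and the
 * chunk must be allocated at an address that is a multiple of this size. */
#define STACK_BYTES ((uintptr_t)4096)

/* Base address of the chunk that contains p: clear the low bits. */
static inline uintptr_t stack_base(const void *p)
{
    return (uintptr_t)p & ~(STACK_BYTES - 1);
}

/* Nonzero if pushing n_slots pointer-sized entries at sp would leave the
 * chunk. The comparison is >= so that sp never reaches the very end of the
 * chunk, where its masked offset would wrap to 0 and look like an empty
 * chunk; the cost is one unused slot. */
static inline int would_overflow(const void *sp, size_t n_slots)
{
    uintptr_t offset = (uintptr_t)sp & (STACK_BYTES - 1);
    assert(n_slots <= STACK_BYTES / sizeof(void *));
    return offset + n_slots * sizeof(void *) >= STACK_BYTES;
}
```

With a chunked stack, a positive result from would_overflow would be the point at which a new chunk is linked in, rather than an error.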
For performance it is probably a good idea to arrange that the control stack cannot overflow unless the data stack does first. That way there is no need to check it for overflow. This can be done by choosing some ratio R and ensuring that for all code-objects (locals + 3) >= (1+block_stack_size)*R, which can be done by inflating the number of locals if necessary, and that the data stack has no more than R times as many entries as the control stack. 4 is probably a good value for R.
ControlBlocks
A control block will be mostly one of:
Additional types of ControlBlocks can be used for transfer in and out of generators, exits from the interpreter at the C level, and other flow control tasks.
Frame layout on data stack
Each frame consists of local variables (including cells) followed by globals, builtins, and locals, followed by the evaluation stack.
Python-to-python calls
Assuming that calls are effectively specialized (by #28) then making a call will require the following operations:
Generators and coroutines
Generators will need to contain space for their own local variables and control stack (for exceptions, not calls).
The compiler will need to be modified to:
Frame objects
Frame objects are widely used for debugging and introspection. So we must support them in a reasonably efficient fashion.
Upon making a call, we push a ControlBlock. This "call" block will contain a pointer (initially NULL) pointing to the frame object. Should we ever need a frame object, say from sys._getframe(), we lazily create one that points back to the control block for that frame. On exiting the function, the frame can be discarded or, if still live, the locals can be copied into the frame.
Implementing sys._getframe()
To find the nth frame we walk the control stack until we find the nth "call" block, then read the frame-object from that. If it is NULL, we create a new one and store it into the control block.
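The lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: BlockKind, the two-field ControlBlock, FrameObject, and get_frame are all hypothetical stand-ins, the control stack is modelled as a plain array, and n == 0 is taken to mean the innermost call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical block kinds; real control blocks would carry more state. */
typedef enum { BLOCK_CALL, BLOCK_TRY, BLOCK_GENERATOR } BlockKind;

typedef struct ControlBlock ControlBlock;

/* Stand-in for a heap-allocated frame object. */
typedef struct {
    ControlBlock *block;    /* points back to the owning "call" block */
} FrameObject;

struct ControlBlock {
    BlockKind kind;
    FrameObject *frame;     /* used by "call" blocks; NULL until requested */
};

/* Walk the control stack from the newest block (index depth - 1) towards
 * the oldest, skipping non-call blocks, and return the frame of the n-th
 * call block, creating it on first use. Returns NULL if there are fewer
 * than n + 1 call blocks or if allocation fails. */
static FrameObject *get_frame(ControlBlock *stack, size_t depth, int n)
{
    assert(n >= 0);
    for (size_t i = depth; i > 0; i--) {
        ControlBlock *cb = &stack[i - 1];
        if (cb->kind != BLOCK_CALL)
            continue;
        if (n-- > 0)
            continue;       /* not deep enough yet */
        if (cb->frame == NULL) {
            FrameObject *f = malloc(sizeof *f);
            if (f == NULL)
                return NULL;
            f->block = cb;  /* the frame refers back to its control block */
            cb->frame = f;  /* cache it so later lookups reuse the frame */
        }
        return cb->frame;
    }
    return NULL;
}
```

Calls that never trigger introspection pay only for the NULL pointer stored in their "call" block; a frame object is allocated the first time one is requested and reused afterwards.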
Example control blocks:
"Call" block:
"Generator" block:
"Try" block: