-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
Add a PyThreadState *
parameter (almost) everywhere
#132312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I trust that as part of this work we'd also excise "borrowed references" from the API. It'd be slightly perverse to add a thread state data structure to the parameters and preserve something so un-thread-safe. As long as you've got the eggs out, might as well make a really big omelete. |
The thread state is currently heavily loaded by:
I hope their overhead will be reduced first. |
Can you elaborate on what "heavily loaded" means? |
I'm happy to take part in this effort. I would appreciate some mentoring since I've never contributed to CPython before. |
The following are the frequencies of loading the tstate I observed half a year ago on main. Running 44 PGO test suites:
Running 47 pyperformance benchmarks:
IIUC, When running only main-interpreter, |
We won't be changing any functions, just adding new ones. No new functions should return a borrowed reference. Hopefully the old functions that return borrowed references will just fall out of use and can be removed. |
So far, no calling convention for Python C API function pass tstate to the function. I'm talking about METH_O, METH_VARARGS, METH_FASTCALL, etc. If we need to pass tstate everywhere, maybe we should consider adding new calling conventions passing tstate as well.
Right, the C API is big, it has now more than 1,000 public functions. Would you mind to elaborate which functions should be added first to get a new tstate parameter?
The proposed new C API sounds like an optimization. Do you have an idea on the impact on performance of passing tstate? For example, do you expect code modified to use to run 10% faster? |
Probably not, but we do have an idea of the impact of reading it from reliable and portable thread-local storage - it's high. The more we can avoid using TLS/TSS for things that can be passed directly (on the stack/in registers), the better. |
Passing tstate as a parameter may can also consume one register in the calling convention ABI, whereas it may be unused in some cases. |
Which is unpredictable enough that no benchmark is going to tell you the impact anyway. The compiler might inline it and allocate registers around it, which makes it free, or it might cause a spill to stack in some cases, which could be highly impactful in a micro sense, but likely still irrelevant compared to the TLS read. Mark's team (and others) are watching the impact on the innermost loops really closely, so I'd expect he'll be one of the first to pull it out if it's massively degrading. |
If we add a whole new API layer and ask everyone to rewrite their extensions to get best speed, should we also switch all This would of course also mean implementations with a moving GC could start providing CPython-compatible C API. |
I don't think we're doing that yet, but if this "if" turns out to be true, then yes. I see this more as a change to internal APIs than anything like deprecating existing public APIs (neither of which you suggested, but on that spectrum "ask everyone to rewrite..." is much nearer one end than this proposal). Changing the public API to always take thread state is pointless if the internal APIs don't. So I'd say that new public APIs get reviewed as usual, and this proposal doesn't override that. (FTR, I think every other word in your original sentence also deserves discussion, but that can happen once we agree to invent a new public API.) |
This is part feature request, part performance issue and part a general appeal for assistance.
We should add a new variant taking a
PyThreadState *
parameter for most C API and internal functions.There are two motivations for this, performance and a future, consistent C API.
Performance
The
PyThreadState
struct is ubiquitous in the VM, it controls stack usage, holds the freelists, holds the current exception, etc, etc.Consequently many C functions, both API and internal, take a
PyThreadState *tstate
parameter.However, for historical reasons, many C functions, both API and internal, do not such a parameter.
This leads to some fairly easy to fix inefficiencies, where
spam()
andeggs()
take a thread state, butham()
does not, thenspam()
callsham()
which callseggs()
, forcingham()
to load the thread state from thread local storage, in order to pass it toeggs()
.Adding a
PyThreadState *tstate
toham()
avoids need to access thread local storage.Consistent, portable, future looking C API
In order to support things like tagged integers, we are going to need to a new C API.
It is out of scope to discuss what such an API would look like, but all realistic proposal so far take a "context" parameter to all, or almost all, API functions. Using a
PyThreadState *
parameter everywhere would make a mechanical transformation from old to new API much simpler.Unfortunately the C API is large and cannot be changed, so we will need many new functions, like
ham_tstate()
which replicatesham()
but with a thread state parameter.This work is largely mechanical, and can be done by inexperienced contributors. Hence the appeal for assistance.
The text was updated successfully, but these errors were encountered: