-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-113743: Make the MRO cache thread-safe in free-threaded builds #113930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1704306
to
74d4985
Compare
0fb0c7d
to
018d0e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overall approach looks good.
I think the seqlock code would be easier to understand if it's refactored out into its own type with associated functions (possibly in pycore_lock.h
).
5c016aa
to
aa7d30f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me other than a few minor issues.
As I commented below, I think it'd be better to use an unsigned type for the sequence counter since it can wrap around. (We compile with -fwrapv
, but still seems like good practice). Whatever type you use, the atomic operations should match. Currently, there's a mix of _int32
and _int
suffixed operations.
d65909f
to
23d2655
Compare
If we are going to do any sensible optimizations we need to assume that classes don't do weird things. Given that, I think it would make more sense to use stop-the-world to protect the cache. Or am I missing something? |
Changing something like There's also things like |
In addition to what Sam said which I think explains why we can't just use stop-the-world for the inputs to the cache, the cache itself certainly can't be protected by stop-the-world. Because the cache isn't a real dictionary it regularly mutates itself in the face of types that aren't changing. During startup alone there are around 1300-1400 hits where the cache will overwrite an already valid entry with a new entry. I'm also a not sure exactly what stop-the-world means in the face of assignment to bases as it has the potential to call back into Python code when the mro is calculated - so the world can't be entirely stopped. I'm also not sure we can even just allow the current thread to run and keep other threads stopped either as there could be user-space locks held (although maybe those will all be tied into the critical section system?). I think that means what we need to do is just stop the world before hand, clear all of the external caches, and then let things race again. And with the locks in place here they'll all block when they try and read from the types that are being updated. So it seems like even with a stop-the-world approach we still require the thread-safety within the type object to maintain internal consistency and to support stop-the-world. |
I think one thing that's missing here is |
Use atomic storage instead of exchange Use relaxed load when starting sequence lock for write Fix some formatting
c731ed6
to
66ce5f5
Compare
|
…ds (python#113930) Makes _PyType_Lookup thread safe, including: Thread safety of the underlying cache. Make mutation of mro and type members thread safe Also _PyType_GetMRO and _PyType_GetBases are currently returning borrowed references which aren't safe.
…ds (python#113930) Makes _PyType_Lookup thread safe, including: Thread safety of the underlying cache. Make mutation of mro and type members thread safe Also _PyType_GetMRO and _PyType_GetBases are currently returning borrowed references which aren't safe.
This makes _PyType_Lookup thread safe.
There's a couple of aspects to this:
For this a lock that is similar to Linux's seqlock is used. This allows the reader to read the sequence using acquire semantics which ensures it sees the values that were written by the last time the lock was released. It reads the sequence a 2nd time again with acquire semantics after having read the values and confirms the sequence has not changed, in which case the values read were valid.
On an update if we have multiple readers who missed but looked up the same value only the first one in will perform the update. It also appears the linux implementation (at least initially?) used a separate spin lock, we just use CAS on the sequence.
There's currently a sequence/lock per-cache entry, on 64-bit builds this is free due to the alignment of the existing cache entries.
mro
and type members thread safeBecause the cache is racing against mutations to the types those mutations also need to be thread safe. This involves adding critical sections around mutation of the type's dictionary (type_setattro), mutation of the mro, and mutation of the bases (which of course ultimately mutates the mro).
This bubbles up to a surprising number of functions as
lookup_tp_mro
and both are returning borrowed references which need to be protected in case the type is modified. In general I've used assertions to make sure the locks are already held by the time we reach the functions using them to push the locking into the chunkier functions.There is a single mutex which is used to protect all type objects. This is because sometimes we need to actually update the type hierarchy, and if were to recurse through sub-types and lock them individually we could end up releasing the outer type's lock and open it up to mutation in a way which is not currently possible.
Py_BEGIN_CRITICAL_SECTION
andPy_END_CRITICAL_SECTION
are a little inflexible in that you can't return inside of a critical section, so for more complicated functions I've added_unlocked
variants and wrapped those in the original name which acquires the lock.These unlocked versions are also used elsewhere when we know that we have the lock held. If we've already acquired the lock currently re-acquiring the lock immediately, even from the same thread, comes with some additional cost as we first spin and then need to go to the parking lot.
Also
_PyType_GetMRO
and_PyType_GetBases
are currently returning borrowed references which aren't safe. Luckily these functions aren't currently exported, so I've switched them to returning owned references._PyType_GetBases
actually had no consumers before this diff but now is used internally to typeobject.c._PyType_GetMRO
was only used in the abc module, and even then it's usage is actually just re-implementation ofPyType_IsSubtype
so I've replaced it with a call to that function.In the future we may be able to improve this. Currently much of this locking is done just to make sure we don't end up with a dangling reference to mro/bases. In some cases it may be okay to race against mutations and just have the program behave as if whatever check is applied to those happened before the mutation (e.g. consider the case of
PyType_IsSubtype
, whether the mutation happens right after we fetch MRO or right after we release doesn't really matter, as the type can change as soon as we've released the lock). Therefore we may be able to eliminate some of this locking by having a mutation of mro/bases scheduling the final dec ref to happen via qsbr when when hit the next quiescent state.