-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading #123848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can we talk about this at the sprint? I would like to have a sound overall strategy for how all of these should be approached (what guarantees can be made, what is most useful, decide whether to add locks, redesign from scratch or just provide an alternative code path). |
There is way too much "brain surgery" going on here. For me, it will make it hard to maintain this code going forward. I'm going to close this PR for now but leave the related issue open until we've decided on the cleanest possible approach. |
@@ -902,35 +902,11 @@ def __next__(self): | |||
(([2], [3]), [4]), | |||
([4], [5]), | |||
]) | |||
check({2}, [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are passed for equivalent Python implementation. So it is reasonable to expect them passing for any correct C implementation.
Originally the purpose of these tests was to test bugs with using borrowed references. After removing them we cannot be sure that the bugs will not return. If you need to remove them, then perhaps the bugs returned.
The pairwise iterator is not safe to use under free threading. In this PR we make concurrent iteration safe with the following in mind:
pairwise(range(100))
results in a tuple(10, 12)
.pairwise(it)
no elements are read from the iteratorit
StopIteration
because the iterator is exhausted, all subsequent calls returnStopIteration
as well (even if new elements are added to the iterator)We use the method
_PyObject_IsUniquelyReferenced(result)
to determine whether we can re-use the internal result tuple.There needs to be a way for the signal the iterator has been exhausted. The approach in the current main branch for the pairwise object
po
is to setpo->it
to null and to decrement the reference to that object. That is not possible under concurrent iteration, since concurrent threads use borrowed referencespo->it
. Some solutions:i) Add a lock for the pairwise object for the call to next. This works, is simple, but has a strong negative impact on single-threaded performance
ii) Add a new variable to the pairwise object to signal the po->it as been exhausted and do not clear po->it. This increases memory usage of the pairwise object with a single int (currently it used 3 pointers + size of a python object)
iii) Set the
po->result
to NULL as a signal and do not clearpo->it
. Works a bit better than settingpo->it
to zero (because only a single thread will be usingpo->result
), but can still go wrongiv) Do not use borrowed references to
po->it
. This could work, but requires an incref / decref for every call to pairwise.In this PR we pick option ii) from the list above. i) and iii) are no options. We prefer a bit of extra memory over the cost over an incref/decref for every iteration.
In the first call to
pairwise_next
thepo->old
is initialized. This should be safe for concurrent iteration, but also for a recursive call topairwise_next
(calling tp_iternext onpo->it
can invoke recursive calls)Updating po->old should be atomic. We use
_Py_atomic_exchange_ptr
to do this, but the reference countingand making sure recursive calls and handled is tricky.
Notes:
the unit test added does trigger problems with the current pairwise implementation under free threading, but the settings for the number of iterations (set so the test takes < 0.1 seconds) makes the probability of triggering an issue low. For local testing I would suggest to increase number_of_iterations to 5000.
We disable some tests added by @serhiy-storchaka in Re-entering pairwise.__next__() leaks references #109786. Do not merge this PR until it has been confirmed this behavior change is acceptable.