gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading #123848

eendebakpt · 2024-09-08T20:41:02Z

The pairwise iterator is not safe to use under free-threading. This PR makes concurrent iteration safe with the following in mind:

- Concurrent iteration over pairwise should not crash the interpreter, but "correct" results are not guaranteed. In particular, under concurrent iteration, calling next on pairwise(range(100)) can return a tuple such as (10, 12).
- The performance of single-threaded iteration should be affected as little as possible.
- Behavior of pairwise:
  - Before the first invocation of next on pairwise(it), no elements are read from the iterator it.
  - If a call to next raises StopIteration because the iterator is exhausted, all subsequent calls raise StopIteration as well (even if new elements are added to the iterator).
  - Behavior for recursive calls to pairwise is hard to get right, but perhaps we can change the behavior.

We use _PyObject_IsUniquelyReferenced(result) to determine whether we can reuse the internal result tuple.
There needs to be a way to signal that the iterator has been exhausted. The approach in the current main branch for the pairwise object po is to set po->it to NULL and decrement the reference count of that object. That is not possible under concurrent iteration, since concurrent threads use borrowed references to po->it. Some solutions:

i) Add a lock to the pairwise object for the call to next. This works and is simple, but has a strong negative impact on single-threaded performance.
ii) Add a new variable to the pairwise object to signal that po->it has been exhausted, and do not clear po->it. This increases the memory usage of the pairwise object by a single int (currently it uses 3 pointers + the size of a Python object).
iii) Set po->result to NULL as a signal and do not clear po->it. This works a bit better than setting po->it to NULL (because only a single thread will be using po->result), but can still go wrong.
iv) Do not use borrowed references to po->it. This could work, but requires an incref/decref for every call to next on the pairwise object.

In this PR we pick option ii) from the list above. Options i) and iii) are not viable. We prefer a bit of extra memory over the cost of an incref/decref on every iteration.
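
To illustrate the shape of option ii), here is a pure-Python model. It is only a sketch under assumptions: the class name `PairwiseSketch`, the `exhausted` attribute, and the `_MISSING` sentinel are made up for this example and do not correspond to the C code in the PR. The model also says nothing about atomicity or reference counting; it only shows that exhaustion is recorded in a separate flag while the reference to the underlying iterator is never cleared.

```python
_MISSING = object()  # sentinel meaning "old has not been initialized yet"


class PairwiseSketch:
    """Pure-Python model of option ii): a flag marks exhaustion, `it` is kept."""

    def __init__(self, iterable):
        self.it = iter(iterable)  # never reset, unlike po->it on main
        self.old = _MISSING
        self.exhausted = False    # the extra variable from option ii)

    def __iter__(self):
        return self

    def __next__(self):
        if self.exhausted:
            raise StopIteration
        try:
            if self.old is _MISSING:
                # First call: read the initial element lazily.
                self.old = next(self.it)
            new = next(self.it)
        except StopIteration:
            # Record exhaustion in the flag instead of clearing self.it.
            self.exhausted = True
            raise
        result = (self.old, new)
        self.old = new
        return result


pairs = list(PairwiseSketch(range(4)))

single = PairwiseSketch([1])
single_pairs = list(single)
```

After exhaustion `single.it` still refers to the underlying iterator, so another thread holding a borrowed reference to it would not see it disappear. That is the property option ii) is after.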

The first call to pairwise_next initializes po->old. This should be safe for concurrent iteration, and also for a recursive call to pairwise_next (calling tp_iternext on po->it can invoke recursive calls).
Updating po->old should be atomic. We use _Py_atomic_exchange_ptr to do this, but getting the reference counting right and making sure recursive calls are handled is tricky.

Notes:

- The unit test added does trigger problems with the current pairwise implementation under free-threading, but the settings for the number of iterations (chosen so the test takes < 0.1 seconds) make the probability of triggering an issue low. For local testing I would suggest increasing number_of_iterations to 5000.
- We disable some tests added by @serhiy-storchaka in "Re-entering pairwise.__next__() leaks references" #109786. Do not merge this PR until it has been confirmed that this behavior change is acceptable.

Issue: Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading #123471

rhettinger · 2024-09-09T18:55:41Z

Can we talk about this at the sprint? I would like to have a sound overall strategy for how all of these should be approached (what guarantees can be made, what is most useful, decide whether to add locks, redesign from scratch or just provide an alternative code path).

rhettinger · 2024-10-10T18:50:34Z

There is way too much "brain surgery" going on here. For me, it will make it hard to maintain this code going forward. I'm going to close this PR for now but leave the related issue open until we've decided on the cleanest possible approach.

serhiy-storchaka · 2024-10-11T05:36:42Z

Lib/test/test_itertools.py

@@ -902,35 +902,11 @@ def __next__(self):
            (([2], [3]), [4]),
            ([4], [5]),
        ])
-        check({2}, [


These tests pass for the equivalent Python implementation, so it is reasonable to expect them to pass for any correct C implementation.

Originally, the purpose of these tests was to catch bugs caused by using borrowed references. After removing them, we cannot be sure that the bugs will not return. If you need to remove them, then perhaps the bugs have returned.

eendebakpt added 3 commits September 7, 2024 23:20

Make concurrent iteration over pairwise safe under free-threading

a50c07b

make setting po->old atomic

eed6486

atomic exchanges

8493433

eendebakpt requested a review from rhettinger as a code owner September 8, 2024 20:41

bedevere-app bot added the awaiting review label Sep 8, 2024

blurb-it bot and others added 4 commits September 8, 2024 20:59

📜🤖 Added by blurb_it.

00f3e34

add missing incref/decref pair

f93d93a

Merge branch 'pairwise_ft_v2' of github.com:eendebakpt/cpython into p…

ae9ed99

…airwise_ft_v2

fix whatsnew; optimize incref on new

82e0489

eendebakpt changed the title ~~Draft: gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading~~ gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading Sep 8, 2024

add missing tests

8e4d77e

rhettinger self-assigned this Sep 9, 2024

rhettinger added the DO-NOT-MERGE label Sep 9, 2024

rhettinger closed this Oct 10, 2024

eendebakpt mentioned this pull request Oct 10, 2024

enum_next and pairwise_next can result in tuple elements with zero reference count in free-threading build #121464

Open

bedevere-app bot mentioned this pull request Sep 9, 2024

Make concurrent iteration over pairwise, combinations, permutations, cwr, product, etc. from itertools safe under free-threading #123471

Open

serhiy-storchaka reviewed Oct 11, 2024

eendebakpt mentioned this pull request Oct 13, 2024

Draft: gh-123471: Make concurrent iteration over itertools.pairwise safe under free-threading #125417

Draft
