Skip to content

gh-118218: Reuse return tuple in itertools.pairwise #118219

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 30, 2024

Conversation

hauntsaninja
Copy link
Contributor

@hauntsaninja hauntsaninja commented Apr 24, 2024

As per the comment, this is shamelessly based off of enumobject.c

With this change:

b1(1)            min=152.8us mean=155.5us ± 3.8us (25 repeats, 1000 loops)
b2(1)            min=149.0us mean=159.1us ± 8.8us (25 repeats, 1000 loops)
b3(1)            min=232.6us mean=242.5us ± 11.7us (25 repeats, 1000 loops)
b1(10)           min=279.2us mean=296.9us ± 16.6us (25 repeats, 1000 loops)
b2(10)           min=249.5us mean=259.2us ± 12.2us (25 repeats, 1000 loops)
b3(10)           min=386.6us mean=398.8us ± 10.1us (25 repeats, 1000 loops)
b1(1000)         min=20.3ms mean=20.7ms ± 0.5ms (25 repeats, 1000 loops)
b2(1000)         min=16.7ms mean=17.1ms ± 0.2ms (25 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.2ms ± 0.3ms (25 repeats, 1000 loops)

Without this change:

b1(1)            min=142.2us mean=143.0us ± 0.9us (25 repeats, 1000 loops)
b2(1)            min=142.7us mean=143.3us ± 1.0us (25 repeats, 1000 loops)
b3(1)            min=219.8us mean=227.2us ± 4.4us (25 repeats, 1000 loops)
b1(10)           min=314.2us mean=323.8us ± 4.1us (25 repeats, 1000 loops)
b2(10)           min=335.4us mean=341.8us ± 5.1us (25 repeats, 1000 loops)
b3(10)           min=362.0us mean=386.2us ± 14.9us (25 repeats, 1000 loops)
b1(1000)         min=26.5ms mean=27.3ms ± 0.3ms (25 repeats, 1000 loops)
b2(1000)         min=29.8ms mean=30.2ms ± 0.2ms (25 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.5ms ± 0.4ms (25 repeats, 1000 loops)

Benchmarking script:

import statistics
import time
import timeit
from itertools import pairwise

def b1(n):
    for x in pairwise(range(n)):
        pass

def b2(n):
    for a, b in pairwise(range(n)):
        pass

def b3(n):
    list(pairwise(range(n)))


def format_time(t):
    if t >= 1e9:
        return f'{t/1e9:.1f}s'
    if t >= 1e6:
        return f'{t/1e6:.1f}ms'
    if t >= 1e3:
        return f'{t/1e3:.1f}us'
    return f'{t:.1f}ns'


def format_mean_stdev(ts):
    mean = statistics.mean(ts)
    stdev = statistics.stdev(ts)
    if mean >= 1e9:
        return f'{mean/1e9:.1f}s ± {stdev/1e9:.1f}s'
    if mean >= 1e6:
        return f'{mean/1e6:.1f}ms ± {stdev/1e6:.1f}ms'
    if mean >= 1e3:
        return f'{mean/1e3:.1f}us ± {stdev/1e3:.1f}us'
    return f'{mean:.1f}ns ± {stdev:.1f}ns'


def bench(stmt):
    repeat = 25
    number = 1000
    timings = timeit.repeat(
        stmt, globals=globals(), repeat=repeat, number=number, timer=time.perf_counter_ns
    )
    print(
        f'{stmt:16} '
        f'min={format_time(min(timings))} '
        f'mean={format_mean_stdev(timings)} '
        f'({repeat} repeats, {number} loops)'
    )

bench('b1(1)')
bench('b2(1)')
bench('b3(1)')

bench('b1(10)')
bench('b2(10)')
bench('b3(10)')

bench('b1(1000)')
bench('b2(1000)')
bench('b3(1000)')

With this change:
```
b1(1)            min=152.8us mean=155.5us ± 3.8us (25 repeats, 1000 loops)
b2(1)            min=149.0us mean=159.1us ± 8.8us (25 repeats, 1000 loops)
b3(1)            min=232.6us mean=242.5us ± 11.7us (25 repeats, 1000 loops)
b1(10)           min=279.2us mean=296.9us ± 16.6us (25 repeats, 1000 loops)
b2(10)           min=249.5us mean=259.2us ± 12.2us (25 repeats, 1000 loops)
b3(10)           min=386.6us mean=398.8us ± 10.1us (25 repeats, 1000 loops)
b1(1000)         min=20.3ms mean=20.7ms ± 0.5ms (25 repeats, 1000 loops)
b2(1000)         min=16.7ms mean=17.1ms ± 0.2ms (25 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.2ms ± 0.3ms (25 repeats, 1000 loops)
```
Without this change:
```
b1(1)            min=142.2us mean=143.0us ± 0.9us (25 repeats, 1000 loops)
b2(1)            min=142.7us mean=143.3us ± 1.0us (25 repeats, 1000 loops)
b3(1)            min=219.8us mean=227.2us ± 4.4us (25 repeats, 1000 loops)
b1(10)           min=314.2us mean=323.8us ± 4.1us (25 repeats, 1000 loops)
b2(10)           min=335.4us mean=341.8us ± 5.1us (25 repeats, 1000 loops)
b3(10)           min=362.0us mean=386.2us ± 14.9us (25 repeats, 1000 loops)
b1(1000)         min=26.5ms mean=27.3ms ± 0.3ms (25 repeats, 1000 loops)
b2(1000)         min=29.8ms mean=30.2ms ± 0.2ms (25 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.5ms ± 0.4ms (25 repeats, 1000 loops)
```
@@ -301,6 +302,11 @@ pairwise_new_impl(PyTypeObject *type, PyObject *iterable)
}
po->it = it;
po->old = NULL;
po->result = PyTuple_Pack(2, Py_None, Py_None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyTuple_Pack is slow. I created an issue #118222 to discuss this.

An alternative that might be faster:

Suggested change
po->result = PyTuple_Pack(2, Py_None, Py_None);
PyObject tmp[2] = {Py_None, Py_None};
po->result = _PyTuple_FromArraySteal(tmp, 2),;

(note: untested!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this locally. Since this line is once called once for each call to pairwise the performance difference is insignificant. It does matter for the other call to PyTuple_Pack (the one in pairwise_next), but that is best handled in a different PR.

Comment on lines +372 to +375
PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
Py_DECREF(last_old);
Py_DECREF(last_new);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since last_new is equal to old we could do:

Suggested change
PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
Py_DECREF(last_old);
Py_DECREF(last_new);
PyTuple_SET_ITEM(result, 0, old);
PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
Py_DECREF(last_old);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily true that last_new is old. Really up to the whims of when the user lets go of the tuple

@rhettinger
Copy link
Contributor

Conceptually this seems reasonable to me. Please do have someone else give it a thorough review (Serhiy would be a reasonable choice).

@rhettinger rhettinger removed their request for review April 25, 2024 23:42
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please compare it also with the simple code that uses PyTuple_New + PyTuple_SET_ITEM instead of PyTuple_Pack if the latter is so slow.

@hauntsaninja
Copy link
Contributor Author

hauntsaninja commented Apr 28, 2024

PyTuple_New + PyTuple_SET_ITEM (maybe 1.05x faster than baseline):

b1(1000)         min=27.0ms mean=27.3ms ± 0.1ms (50 repeats, 1000 loops)
b2(1000)         min=28.4ms mean=28.5ms ± 0.2ms (50 repeats, 1000 loops)
b3(1000)         min=25.2ms mean=25.7ms ± 0.3ms (50 repeats, 1000 loops)

Using this diff:

diff --git a/Modules/itertoolsmodule.c b/Modules/itertoolsmodule.c
index 6ee447ef6a..668c5a4f15 100644
--- a/Modules/itertoolsmodule.c
+++ b/Modules/itertoolsmodule.c
@@ -356,7 +356,11 @@ pairwise_next(pairwiseobject *po)
         return NULL;
     }
     /* Future optimization: Reuse the result tuple as we do in enumerate() */
-    result = PyTuple_Pack(2, old, new);
+    result = PyTuple_New(2);
+    if (result != NULL) {
+        PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
+        PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
+    }
     Py_XSETREF(po->old, new);
     Py_DECREF(old);
     return result;

Baseline:

b1(1000)         min=26.9ms mean=27.5ms ± 0.2ms (50 repeats, 1000 loops)
b2(1000)         min=29.5ms mean=29.7ms ± 0.3ms (50 repeats, 1000 loops)
b3(1000)         min=26.4ms mean=26.8ms ± 0.2ms (50 repeats, 1000 loops)

This PR:

b1(1000)         min=20.2ms mean=20.4ms ± 0.2ms (50 repeats, 1000 loops)
b2(1000)         min=16.9ms mean=17.0ms ± 0.0ms (50 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.1ms ± 0.2ms (50 repeats, 1000 loops)

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Comment on lines +1827 to +1828
it = pairwise([None, None])
gc.collect()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all tests above except for zip_longest there is a next(it) before gc.collect(). All these tests were added in 226a012 (bpo-42536).

@brandtbucher, why is such difference? Is there a bug in test_zip_longest_result_gc? Do we need next(it) there and here?

@serhiy-storchaka serhiy-storchaka merged commit 6999d68 into python:main Apr 30, 2024
36 checks passed
@serhiy-storchaka
Copy link
Member

Thank you for your contribution @hauntsaninja.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants