gh-118218: Reuse return tuple in itertools.pairwise #118219

hauntsaninja · 2024-04-24T07:40:34Z

As per the comment, this is shamelessly based off of enumobject.c

With this change:

b1(1)            min=152.8us mean=155.5us ± 3.8us (25 repeats, 1000 loops)
b2(1)            min=149.0us mean=159.1us ± 8.8us (25 repeats, 1000 loops)
b3(1)            min=232.6us mean=242.5us ± 11.7us (25 repeats, 1000 loops)
b1(10)           min=279.2us mean=296.9us ± 16.6us (25 repeats, 1000 loops)
b2(10)           min=249.5us mean=259.2us ± 12.2us (25 repeats, 1000 loops)
b3(10)           min=386.6us mean=398.8us ± 10.1us (25 repeats, 1000 loops)
b1(1000)         min=20.3ms mean=20.7ms ± 0.5ms (25 repeats, 1000 loops)
b2(1000)         min=16.7ms mean=17.1ms ± 0.2ms (25 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.2ms ± 0.3ms (25 repeats, 1000 loops)

Without this change:

b1(1)            min=142.2us mean=143.0us ± 0.9us (25 repeats, 1000 loops)
b2(1)            min=142.7us mean=143.3us ± 1.0us (25 repeats, 1000 loops)
b3(1)            min=219.8us mean=227.2us ± 4.4us (25 repeats, 1000 loops)
b1(10)           min=314.2us mean=323.8us ± 4.1us (25 repeats, 1000 loops)
b2(10)           min=335.4us mean=341.8us ± 5.1us (25 repeats, 1000 loops)
b3(10)           min=362.0us mean=386.2us ± 14.9us (25 repeats, 1000 loops)
b1(1000)         min=26.5ms mean=27.3ms ± 0.3ms (25 repeats, 1000 loops)
b2(1000)         min=29.8ms mean=30.2ms ± 0.2ms (25 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.5ms ± 0.4ms (25 repeats, 1000 loops)

Benchmarking script:

import statistics
import time
import timeit
from itertools import pairwise

def b1(n):
    for x in pairwise(range(n)):
        pass

def b2(n):
    for a, b in pairwise(range(n)):
        pass

def b3(n):
    list(pairwise(range(n)))


def format_time(t):
    if t >= 1e9:
        return f'{t/1e9:.1f}s'
    if t >= 1e6:
        return f'{t/1e6:.1f}ms'
    if t >= 1e3:
        return f'{t/1e3:.1f}us'
    return f'{t:.1f}ns'


def format_mean_stdev(ts):
    mean = statistics.mean(ts)
    stdev = statistics.stdev(ts)
    if mean >= 1e9:
        return f'{mean/1e9:.1f}s ± {stdev/1e9:.1f}s'
    if mean >= 1e6:
        return f'{mean/1e6:.1f}ms ± {stdev/1e6:.1f}ms'
    if mean >= 1e3:
        return f'{mean/1e3:.1f}us ± {stdev/1e3:.1f}us'
    return f'{mean:.1f}ns ± {stdev:.1f}ns'


def bench(stmt):
    repeat = 25
    number = 1000
    timings = timeit.repeat(
        stmt, globals=globals(), repeat=repeat, number=number, timer=time.perf_counter_ns
    )
    print(
        f'{stmt:16} '
        f'min={format_time(min(timings))} '
        f'mean={format_mean_stdev(timings)} '
        f'({repeat} repeats, {number} loops)'
    )

bench('b1(1)')
bench('b2(1)')
bench('b3(1)')

bench('b1(10)')
bench('b2(10)')
bench('b3(10)')

bench('b1(1000)')
bench('b2(1000)')
bench('b3(1000)')

Issue: Speed up itertools.pairwise #118218

eendebakpt · 2024-04-24T09:38:36Z

Modules/itertoolsmodule.c

@@ -301,6 +302,11 @@ pairwise_new_impl(PyTypeObject *type, PyObject *iterable)
    }
    po->it = it;
    po->old = NULL;
+    po->result = PyTuple_Pack(2, Py_None, Py_None);


PyTuple_Pack is slow. I created an issue #118222 to discuss this.

An alternative that might be faster:

Suggested change

-    po->result = PyTuple_Pack(2, Py_None, Py_None);
+    PyObject *tmp[2] = {Py_None, Py_None};
+    po->result = _PyTuple_FromArraySteal(tmp, 2);

(note: untested!)

eendebakpt · 2024-04-26

I tested this locally. Since this line is called only once for each call to pairwise, the performance difference is insignificant. It does matter for the other call to PyTuple_Pack (the one in pairwise_next), but that is best handled in a different PR.

eendebakpt · 2024-04-24T09:44:18Z

Modules/itertoolsmodule.c

+        PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
+        PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
+        Py_DECREF(last_old);
+        Py_DECREF(last_new);


Since last_new is equal to old we could do:

Suggested change

-        PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
-        PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
-        Py_DECREF(last_old);
-        Py_DECREF(last_new);
+        PyTuple_SET_ITEM(result, 0, old);
+        PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
+        Py_DECREF(last_old);

hauntsaninja · 2024-04-26

It's not necessarily true that last_new is old. That depends entirely on when the user lets go of the tuple.

rhettinger · 2024-04-25T23:42:14Z

Conceptually this seems reasonable to me. Please do have someone else give it a thorough review (Serhiy would be a reasonable choice).

serhiy-storchaka

Please compare it also with the simple code that uses PyTuple_New + PyTuple_SET_ITEM instead of PyTuple_Pack if the latter is so slow.

hauntsaninja · 2024-04-28T21:07:51Z

PyTuple_New + PyTuple_SET_ITEM (maybe 1.05x faster than baseline):

b1(1000)         min=27.0ms mean=27.3ms ± 0.1ms (50 repeats, 1000 loops)
b2(1000)         min=28.4ms mean=28.5ms ± 0.2ms (50 repeats, 1000 loops)
b3(1000)         min=25.2ms mean=25.7ms ± 0.3ms (50 repeats, 1000 loops)

Using this diff:

diff --git a/Modules/itertoolsmodule.c b/Modules/itertoolsmodule.c
index 6ee447ef6a..668c5a4f15 100644
--- a/Modules/itertoolsmodule.c
+++ b/Modules/itertoolsmodule.c
@@ -356,7 +356,11 @@ pairwise_next(pairwiseobject *po)
         return NULL;
     }
     /* Future optimization: Reuse the result tuple as we do in enumerate() */
-    result = PyTuple_Pack(2, old, new);
+    result = PyTuple_New(2);
+    if (result != NULL) {
+        PyTuple_SET_ITEM(result, 0, Py_NewRef(old));
+        PyTuple_SET_ITEM(result, 1, Py_NewRef(new));
+    }
     Py_XSETREF(po->old, new);
     Py_DECREF(old);
     return result;

Baseline:

b1(1000)         min=26.9ms mean=27.5ms ± 0.2ms (50 repeats, 1000 loops)
b2(1000)         min=29.5ms mean=29.7ms ± 0.3ms (50 repeats, 1000 loops)
b3(1000)         min=26.4ms mean=26.8ms ± 0.2ms (50 repeats, 1000 loops)

This PR:

b1(1000)         min=20.2ms mean=20.4ms ± 0.2ms (50 repeats, 1000 loops)
b2(1000)         min=16.9ms mean=17.0ms ± 0.0ms (50 repeats, 1000 loops)
b3(1000)         min=26.0ms mean=26.1ms ± 0.2ms (50 repeats, 1000 loops)
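For a quick read of these numbers, the ratios can be worked out from the min timings above. This is just arithmetic on the figures already posted, nothing newly measured; the dictionary names are mine.

```python
# Min timings in ms for n=1000, copied from the "Baseline" and "This PR" runs.
baseline = {"b1": 26.9, "b2": 29.5, "b3": 26.4}
this_pr = {"b1": 20.2, "b2": 16.9, "b3": 26.0}

# Speedup is baseline time divided by the time with this PR applied.
speedup = {name: baseline[name] / this_pr[name] for name in baseline}

for name, ratio in speedup.items():
    print(f"{name}(1000): {ratio:.2f}x")
# b1(1000): 1.33x
# b2(1000): 1.75x
# b3(1000): 1.02x
```

b3 barely moves, which is what I would expect if list() keeps every tuple alive so the cached tuple can never be reused.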

serhiy-storchaka

LGTM.

serhiy-storchaka · 2024-04-29T07:04:53Z

Lib/test/test_itertools.py

+        it = pairwise([None, None])
+        gc.collect()


In all tests above except for zip_longest there is a next(it) before gc.collect(). All these tests were added in 226a012 (bpo-42536).

@brandtbucher, why is there such a difference? Is there a bug in test_zip_longest_result_gc? Do we need next(it) there and here?

serhiy-storchaka · 2024-04-30T20:17:25Z

Thank you for your contribution @hauntsaninja.

hauntsaninja requested a review from rhettinger as a code owner April 24, 2024 07:40

bedevere-app bot mentioned this pull request Apr 24, 2024

Speed up itertools.pairwise #118218

Closed

bedevere-app bot added the awaiting core review label Apr 24, 2024

blurb-it bot and others added 2 commits April 24, 2024 07:45

📜🤖 Added by blurb_it.

f653f22

add bpo-42536 style test for gc tracking

ea0cfd6

eendebakpt reviewed Apr 24, 2024

rhettinger removed their request for review April 25, 2024 23:42

hauntsaninja requested a review from serhiy-storchaka April 26, 2024 01:15

serhiy-storchaka reviewed Apr 26, 2024

nit

68b5d27

hauntsaninja force-pushed the pairwise-faster branch from 11f2e8e to 68b5d27 Compare April 28, 2024 20:30

serhiy-storchaka approved these changes Apr 29, 2024

bedevere-app bot added awaiting merge and removed awaiting core review labels Apr 29, 2024

serhiy-storchaka reviewed Apr 30, 2024

Update Misc/NEWS.d/next/Library/2024-04-24-07-45-08.gh-issue-118218.m…

1bfddd6

…1OHbN.rst Co-authored-by: Serhiy Storchaka <[email protected]>

serhiy-storchaka merged commit 6999d68 into python:main Apr 30, 2024
36 checks passed

bedevere-app bot removed the awaiting merge label Apr 30, 2024

hauntsaninja deleted the pairwise-faster branch May 1, 2024 00:42

eendebakpt mentioned this pull request May 7, 2024

Add Py_TuplePack2 and Py_TuplePack1 #118222

Closed

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024

pythongh-118218: Reuse return tuple in itertools.pairwise (pythonGH-1…

ca6a36f

…18219)

eendebakpt mentioned this pull request Jul 7, 2024

enum_next and pairwise_next can result in tuple elements with zero reference count in free-threading build #121464

Open
