gh-118702: Implement vectorcall for BaseException #118703

vstinner · 2024-05-07T12:02:02Z

* BaseException_vectorcall() now creates a tuple from the 'args' array.
* Creating an exception using BaseException_vectorcall() is now a single function call, rather than having to call BaseException_new() and then BaseException_init(). Calling BaseException_init() is inefficient since it overwrites the 'args' attribute.
* _PyErr_SetKeyError() now uses PyObject_CallOneArg() to create the KeyError instance, so that BaseException_vectorcall() is used.
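The redundant 'args' assignment mentioned in the description can be observed from Python. This sketch only shows the observable behavior of BaseException.__new__() and __init__() (here through ValueError). It is not the C code path itself:

```python
# __new__ alone already stores the positional arguments in 'args'.
e = ValueError.__new__(ValueError, "from new")
args_after_new = e.args
print(args_after_new)   # ('from new',)

# __init__ then replaces 'args' with its own arguments.
e.__init__("from init")
args_after_init = e.args
print(args_after_init)  # ('from init',)
```

When an exception is created with a normal call such as ValueError("x"), both steps receive the same tuple, so the second store repeats work the first one already did.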

Micro-benchmark of creating a KeyError when accessing a non-existent dictionary key:

Mean +- std dev: 447 ns +- 31 ns -> 373 ns +- 15 ns: 1.20x faster

Issue: Implement vectorcall for BaseException to optimize creation of exception instances #118702

vstinner · 2024-05-07T12:02:23Z

Benchmark:

import pyperf

EMPTY_DICT = {}
INNER_LOOPS = 1024
KEYS = tuple(range(INNER_LOOPS))

def bench_keyerror():
    d = EMPTY_DICT
    for key in KEYS:
        try:
            d[key]
        except KeyError:
            pass

runner = pyperf.Runner()
runner.bench_func('keyerror', bench_keyerror, inner_loops=INNER_LOOPS)
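For a quick check without pyperf installed, a stdlib-only variant using timeit is sketched here. It is not the benchmark used in the PR. It reports the best of several runs divided by the number of KeyError raised per call, so its numbers are not directly comparable with pyperf's mean +- std dev:

```python
import timeit

EMPTY_DICT = {}
INNER_LOOPS = 1024
KEYS = tuple(range(INNER_LOOPS))

def bench_keyerror():
    # Each call raises and catches INNER_LOOPS KeyError exceptions.
    d = EMPTY_DICT
    for key in KEYS:
        try:
            d[key]
        except KeyError:
            pass

def time_per_keyerror(repeat=5, number=100):
    # Best total time over `repeat` runs of `number` calls each,
    # divided by the number of KeyError raised in one run.
    best = min(timeit.repeat(bench_keyerror, repeat=repeat, number=number))
    return best / (number * INNER_LOOPS)

if __name__ == "__main__":
    print(f"{time_per_keyerror() * 1e9:.0f} ns per KeyError")
```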

vstinner · 2024-05-07T12:03:50Z

cc @erlend-aasland @corona10 @serhiy-storchaka

Objects/exceptions.c

erlend-aasland · 2024-05-07T12:33:38Z

Nice change, but you broke CI :)

vstinner · 2024-05-07T13:58:12Z

> Nice change, but you broke CI :)

Right, FAIL: testKeywordArgs (test.test_exceptions.ExceptionTests.testKeywordArgs).

It should now be fixed.

serhiy-storchaka

I am surprised that switching to vectorcall makes a difference here. How does it compare with the following BaseException_vectorcall implementation?

argstuple = _PyTuple_FromArray(args, PyVectorcall_NARGS(nargsf));
self = type_obj->tp_new(type_obj, argstuple, NULL);
Py_TYPE(self)->tp_init(self, argstuple, NULL);
Py_DECREF(argstuple);

If it is still faster than the current code, then there is something wrong in the non-vectorcall path.

eendebakpt · 2024-05-07T15:39:29Z

> I am surprised that switching to vector call makes a difference here. How is it in comparison with the following BaseException_vectorcall implementation?
> argstuple = _PyTuple_FromArray(args, PyVectorcall_NARGS(nargsf));
> self = type_obj->tp_new(type_obj, argstuple, NULL);
> Py_TYPE(self)->tp_init(self, argstuple, NULL);
> Py_DECREF(argstuple);
> If it is still faster than the current code, then there is something wrong in the non-vectorcall path.

In the current main branch, there is a call to PyTuple_Pack() that is not in the vectorcall path. I did not benchmark this, but I suspect it could explain part of the difference.

vstinner · 2024-05-07T15:50:44Z

> If it is still faster than the current code, then there is something wrong in the non-vectorcall path.

This change avoids calling type_call(), which contains more code.

vstinner · 2024-05-07T16:02:36Z

Benchmark result with CPU isolation (this PR):

Mean +- std dev: [ref] 264 ns +- 1 ns -> [baseexc] 227 ns +- 1 ns: 1.16x faster

I also measured only the Python/errors.c change, which replaces PyTuple_Pack() + PyObject_Call() with PyObject_CallOneArg():

Mean +- std dev: [ref] 264 ns +- 1 ns -> [optim_keyerror] 248 ns +- 2 ns: 1.07x faster

It's faster, but not as fast as this PR.

serhiy-storchaka · 2024-05-07T16:41:26Z

> This change avoids calling type_call() which contains more code.

But what is wrong with type_call()? What does it spend time on, besides calling tp_new(), tp_init() and a few trivial checks, that makes it so much slower? Or maybe the difference is just the cost of these trivial checks?

erlend-aasland · 2024-05-07T18:43:34Z

> But was is wrong with type_call()? On what does it spent time besides calling tp_new(), tp_init() and few trivial checks that it is so slower? Or maybe the difference is just the cost of these trivial checks?

AFAICS, in _PyObject_VectorcallTstate, a vectorcall function is called directly, avoiding the overhead of _PyObject_MakeTpCall and type_call.
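The dispatch described here can be sketched as a toy Python model. Every name in it (ToyType, toy_call, toy_make_tp_call, and so on) is hypothetical. It is not CPython code and measures nothing; it only records which steps each path goes through. If the type has a vectorcall function, it is called directly. Otherwise the call goes through a tuple-building fallback, then type_call, then tp_new and tp_init:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ToyType:
    """Stand-in for a type object; `trace` records which steps ran."""
    name: str
    vectorcall: Optional[Callable] = None
    trace: list = field(default_factory=list)

def toy_new(tp, argstuple):
    tp.trace.append("tp_new")
    return {"type": tp.name, "args": argstuple}   # new object already has args

def toy_init(tp, obj, argstuple):
    tp.trace.append("tp_init")
    obj["args"] = argstuple                        # args stored a second time

def toy_type_call(tp, argstuple):
    tp.trace.append("type_call")
    obj = toy_new(tp, argstuple)
    toy_init(tp, obj, argstuple)
    return obj

def toy_make_tp_call(tp, args):
    tp.trace.append("make_tp_call")
    return toy_type_call(tp, tuple(args))          # pack the args array into a tuple

def toy_exception_vectorcall(tp, args):
    tp.trace.append("vectorcall")
    return {"type": tp.name, "args": tuple(args)}  # one step, one tuple

def toy_call(tp, args):
    """Use the type's vectorcall if present, else the tp_call fallback."""
    if tp.vectorcall is not None:
        return tp.vectorcall(tp, args)
    return toy_make_tp_call(tp, args)

before = ToyType("KeyError")
after = ToyType("KeyError", vectorcall=toy_exception_vectorcall)
toy_call(before, ["k"])
toy_call(after, ["k"])
print(before.trace)  # ['make_tp_call', 'type_call', 'tp_new', 'tp_init']
print(after.trace)   # ['vectorcall']
```

The model does not tell how expensive each step is. That cost is what the measurements in this thread are about.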

serhiy-storchaka · 2024-05-08T07:12:23Z

Python/errors.c

+    PyObject *exc = PyObject_CallOneArg(PyExc_KeyError, arg);
+    if (!exc) {
        /* caller will expect error to be set anyway */
        return;
    }
-    _PyErr_SetObject(tstate, PyExc_KeyError, tup);
-    Py_DECREF(tup);
+
+    _PyErr_SetRaisedException(tstate, exc);


It is not the same. It does not set __context__, and there may be other differences in the case where it fails to create an exception object.

Oh, right. I fixed this regression and added a unit test.
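The __context__ behavior at stake can be checked from Python. This is a minimal sketch, not the unit test added to the PR, and the helper name keyerror_context is hypothetical. A KeyError raised by a dict lookup while another exception is being handled should keep that exception as its __context__. _PyErr_SetObject() provides this chaining, and _PyErr_SetRaisedException() alone does not:

```python
def keyerror_context():
    """Return the __context__ of a KeyError raised by a dict lookup
    that happens while another exception is being handled."""
    try:
        try:
            raise ValueError("first")
        except ValueError:
            {}["missing"]  # the dict lookup raises KeyError from C code
    except KeyError as exc:
        return exc.__context__
    return None

ctx = keyerror_context()
print(type(ctx).__name__, ctx)  # ValueError first
```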

serhiy-storchaka · 2024-05-08T07:26:39Z

Half of the gain is due to the change in _PyErr_SetKeyError(), which is not equivalent to the current code, so it should perhaps be reverted. Adding BaseException_vectorcall is still beneficial right now, but optimizing _PyObject_MakeTpCall and type_call may reduce the gain even more. And it is not beneficial for exceptions with __new__ or __init__. In the end, it may make the benefit/cost ratio pretty low.

I do not actively oppose this change, I only suggest that there may be place for more general optimization. If you still want to add BaseException_vectorcall, with or without optimizing the other path, you are free to do this.


vstinner · 2024-05-10T09:55:01Z

The PR is now ready for a review. The main branch is now Python 3.14.

I rebased the PR on the main branch and squashed the commits. I fixed the __context__ regression.

Updated benchmark, Python built with gcc -O3, with CPU isolation:

key_error: Mean +- std dev: [ref] 267 ns +- 1 ns -> [change] 243 ns +- 1 ns: 1.10x faster
value_error: Mean +- std dev: [ref] 288 ns +- 2 ns -> [change] 258 ns +- 2 ns: 1.12x faster

I added a benchmark for raise ValueError.

+----------------+--------+----------------------+
| Benchmark      | ref    | change               |
+================+========+======================+
| key_error      | 267 ns | 243 ns: 1.10x faster |
+----------------+--------+----------------------+
| value_error    | 288 ns | 258 ns: 1.12x faster |
+----------------+--------+----------------------+
| Geometric mean | (ref)  | 1.11x faster         |
+----------------+--------+----------------------+
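As a sanity check on the table, the speedup column can be recomputed from the means. This assumes that pyperf's "Nx faster" is the ratio of the reference mean to the changed mean, and that the "Geometric mean" row is the geometric mean of those ratios:

```python
import math

# Means reported above, in nanoseconds: (ref, change)
results = {
    "key_error": (267, 243),
    "value_error": (288, 258),
}

speedups = {name: ref / change for name, (ref, change) in results.items()}
geo_mean = math.prod(speedups.values()) ** (1 / len(speedups))

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x faster")         # key_error: 1.10x, value_error: 1.12x
print(f"Geometric mean: {geo_mean:.2f}x faster")  # 1.11x
```

Recomputing from rounded means can be off in the last digit. For example, 264 / 248 gives 1.06, while the earlier comment reports 1.07x, presumably because pyperf works from the unrounded values.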

Benchmark:

import pyperf

EMPTY_DICT = {}
INNER_LOOPS = 1024
KEYS = tuple(range(INNER_LOOPS))

def bench_key_error(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()

    value = "value"
    for _ in range_it:
        d = EMPTY_DICT
        for key in KEYS:
            try:
                d[key]
            except KeyError:
                pass

    return pyperf.perf_counter() - t0


def bench_value_error(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()

    value = "value"
    for _ in range_it:
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass

        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass

        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass

        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass

        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass

    return pyperf.perf_counter() - t0

runner = pyperf.Runner()
runner.bench_time_func('key_error', bench_key_error, inner_loops=INNER_LOOPS)
runner.bench_time_func('value_error', bench_value_error, inner_loops=10)

vstinner · 2024-05-10T10:03:17Z

@serhiy-storchaka:

> The half of the gain is due to the change in _PyErr_SetKeyError() which is not equivalent to the current code, so it perhaps should be reverted.

I fixed the code.

> Addition of BaseException_vectorcall is still beneficial right now,

Creating exceptions 10% to 12% faster sounds appealing, since raising KeyError is a common operation.

I ran a benchmark on functions using METH_VARARGS: KeyError is the clear winner in number of calls when running ./python -m test test_builtin.

> but optimizing _PyObject_MakeTpCall and type_call may reduce the gain even more. And it does not beneficial for exceptions with __new__ or __init__. In the end, it may make the benefit/cost ratio pretty low.

I don't see any simple optimization opportunity, do you?

> I do not actively oppose this change, I only suggest that there may be place for more general optimization.

The optimization applies to 36 built-in exceptions:

ArithmeticError
AssertionError
BaseException
BufferError
BytesWarning
DeprecationWarning
EOFError
EncodingWarning
Exception
FloatingPointError
FutureWarning
GeneratorExit
ImportWarning
IndexError
KeyError
KeyboardInterrupt
LookupError
NotImplementedError
OverflowError
PendingDeprecationWarning
PythonFinalizationError
RecursionError
ReferenceError
ResourceWarning
RuntimeError
RuntimeWarning
StopAsyncIteration
SyntaxWarning
SystemError
TypeError
UnicodeError
UnicodeWarning
UserWarning
ValueError
Warning
ZeroDivisionError

vstinner · 2024-05-10T19:09:17Z

Merged, thanks for the reviews.


bedevere-app bot mentioned this pull request May 7, 2024

Implement vectorcall for BaseException to optimize creation of exception instances #118702

Closed

vstinner added the skip news label May 7, 2024

eendebakpt mentioned this pull request May 7, 2024

Add Py_TuplePack2 and Py_TuplePack1 #118222

Closed

erlend-aasland reviewed May 7, 2024


Objects/exceptions.c (outdated)

serhiy-storchaka reviewed May 7, 2024


serhiy-storchaka reviewed May 8, 2024


vstinner force-pushed the baseexception_vectorcall branch from fcc7a2f to e689491 May 10, 2024 09:16

vstinner marked this pull request as ready for review May 10, 2024 09:39

vstinner requested a review from iritkatriel as a code owner May 10, 2024 09:39

bedevere-app bot added the awaiting core review label May 10, 2024

vstinner added 4 commits May 10, 2024 11:45

Fix _PyErr_SetKeyError()

bf157d3

Add unit test

3871012

Fix refleak

eb0b861

vstinner force-pushed the baseexception_vectorcall branch from 4424f98 to eb0b861 May 10, 2024 09:45

erlend-aasland approved these changes May 10, 2024


bedevere-app bot added awaiting merge and removed awaiting core review labels May 10, 2024

serhiy-storchaka approved these changes May 10, 2024


vstinner merged commit aa36f83 into python:main May 10, 2024

vstinner deleted the baseexception_vectorcall branch May 10, 2024 19:08

bedevere-app bot removed the awaiting merge label May 10, 2024

szef1233 approved these changes May 11, 2024
