gh-118702: Implement vectorcall for BaseException #118703
Benchmark:

import pyperf

EMPTY_DICT = {}
INNER_LOOPS = 1024
KEYS = tuple(range(INNER_LOOPS))

def bench_keyerror():
    d = EMPTY_DICT
    for key in KEYS:
        try:
            d[key]
        except KeyError:
            pass

runner = pyperf.Runner()
runner.bench_func('keyerror', bench_keyerror, inner_loops=INNER_LOOPS)
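For a rough check without pyperf, the same loop can be timed with the standard library. This is only a sketch: `timeit` does not do the calibration or process isolation that pyperf does, so the numbers are noisier and are not comparable to the pyperf results quoted in this PR. The names `lookup_missing_keys`, `NUMBER`, and `REPEAT` are illustrative and do not come from the PR.

```python
import timeit

EMPTY_DICT = {}
KEYS = tuple(range(1024))
NUMBER = 100   # calls per timing run (kept small so the sketch runs quickly)
REPEAT = 5     # independent timing runs

def lookup_missing_keys():
    # Every lookup misses, so each iteration creates and catches a KeyError.
    d = EMPTY_DICT
    for key in KEYS:
        try:
            d[key]
        except KeyError:
            pass

timings = timeit.repeat(lookup_missing_keys, number=NUMBER, repeat=REPEAT)
# Use the best run: it is the one least disturbed by other system activity.
best_ns_per_lookup = min(timings) / (NUMBER * len(KEYS)) * 1e9
print(f"{best_ns_per_lookup:.0f} ns per failed lookup (best of {REPEAT})")
```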
Nice change, but you broke CI :)
Right, it should now be fixed.
I am surprised that switching to vectorcall makes a difference here. How does it compare with the following BaseException_vectorcall implementation?
argstuple = _PyTuple_FromArray(args, PyVectorcall_NARGS(nargsf));
self = type_obj->tp_new(type_obj, argstuple, NULL);
Py_TYPE(self)->tp_init(self, argstuple, NULL);
Py_DECREF(argstuple);
If it is still faster than the current code, then there is something wrong in the non-vectorcall path.
In current main there is a call to
This change avoids calling type_call(), which contains more code.
Benchmark result with CPU isolation (this PR):
I also measured only the Python/errors.c changes; replace
It's faster, but not as fast as this PR.
But what is wrong with
AFAICS, in
Python/errors.c
    PyObject *exc = PyObject_CallOneArg(PyExc_KeyError, arg);
    if (!exc) {
        /* caller will expect error to be set anyway */
        return;
    }
    _PyErr_SetObject(tstate, PyExc_KeyError, tup);
    Py_DECREF(tup);

    _PyErr_SetRaisedException(tstate, exc);
It is not the same. It does not set __context__, and there may be other differences in the case where it fails to create an exception object.
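The behaviour at stake can be observed from Python: when a KeyError is raised while another exception is being handled, the new exception is expected to carry the one being handled in `__context__`. A minimal sketch follows; the helper name `chained_key_error` is illustrative and not part of the PR.

```python
def chained_key_error():
    """Raise a KeyError while a ValueError is being handled and return it."""
    try:
        try:
            raise ValueError("being handled")
        except ValueError:
            {}["missing"]  # failed dict lookup raises KeyError inside the handler
    except KeyError as exc:
        return exc

exc = chained_key_error()
# Implicit chaining: the KeyError remembers the exception that was being handled.
print(type(exc.__context__).__name__)
print(exc.args)
```

A check like this is the kind of regression test that catches a code path which sets the raised exception without chaining it.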
Oh, right. I fixed this regression and added a unit test.
Half of the gain is due to the change in
I do not actively oppose this change; I only suggest that there may be room for a more general optimization. If you still want to add
* BaseException_vectorcall() now creates a tuple from the 'args' array.
* Creating an exception using BaseException_vectorcall() is now a single function call, rather than having to call BaseException_new() and then BaseException_init(). Calling BaseException_init() is inefficient since it overrides the 'args' attribute.
* _PyErr_SetKeyError() now uses PyObject_CallOneArg() to create the KeyError instance to use BaseException_vectorcall().

Micro-benchmark on creating a KeyError on accessing a non-existent dictionary key:

Mean +- std dev: 447 ns +- 31 ns -> 373 ns +- 15 ns: 1.20x faster
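The redundancy described in the commit message can be seen from Python by running the two construction steps by hand. This is a sketch of the observable behaviour on CPython, not of the C implementation: `__new__` already stores the arguments in `args`, and `__init__` then stores them a second time.

```python
# Step 1: allocation. BaseException's __new__ already records the arguments.
exc = KeyError.__new__(KeyError, "from new")
args_after_new = exc.args

# Step 2: initialization. __init__ overwrites 'args' with whatever it receives,
# which in a normal call is the very same tuple.
exc.__init__("from init")
args_after_init = exc.args

# A normal call performs both steps with identical arguments.
direct = KeyError("key")

print(args_after_new)
print(args_after_init)
print(direct.args)
```

Since both steps write the same attribute, doing the work once in a single function leaves the result visible from Python unchanged.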
The PR is now ready for review. The main branch is now Python 3.14. I rebased the PR on the main branch and squashed the commits. I fixed the
Updated benchmark, Python built with
I added a benchmark on
Benchmark:

import pyperf

EMPTY_DICT = {}
INNER_LOOPS = 1024
KEYS = tuple(range(INNER_LOOPS))

def bench_key_error(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    value = "value"
    for _ in range_it:
        d = EMPTY_DICT
        for key in KEYS:
            try:
                d[key]
            except KeyError:
                pass
    return pyperf.perf_counter() - t0

def bench_value_error(loops):
    range_it = range(loops)
    t0 = pyperf.perf_counter()
    value = "value"
    for _ in range_it:
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
        try: raise ValueError(value)
        except: pass
    return pyperf.perf_counter() - t0

runner = pyperf.Runner()
runner.bench_time_func('key_error', bench_key_error, inner_loops=INNER_LOOPS)
runner.bench_time_func('value_error', bench_value_error, inner_loops=10)
I fixed the code.
Creating exceptions 10% to 12% faster sounds appealing, since raising KeyError is a common operation. I ran a benchmark on functions using METH_VARARGS. KeyError is the clear winner in number of calls when running
I don't see any simple optimization opportunity, do you?
The optimization applies to 32 built-in exceptions:
Merged, thanks for the reviews.