[WIP] gh-129813, PEP 782: Add PyBytesWriter C API #131681
Add functions:
* PyBytesWriter_Create()
* PyBytesWriter_Discard()
* PyBytesWriter_Finish()
* PyBytesWriter_FinishWithSize()
* PyBytesWriter_FinishWithEndPointer()
* PyBytesWriter_Data()
* PyBytesWriter_Allocated()
* PyBytesWriter_SetSize()
* PyBytesWriter_Resize()
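To make the create / write / finish lifecycle suggested by these function names easier to picture, here is a toy Python model. It is only a sketch: the class `BytesWriterModel` and its methods are hypothetical, the real API is C, and the exact signatures and semantics are whatever PEP 782 specifies, not what this model does.

```python
class BytesWriterModel:
    """Toy model of a resizable bytes writer (hypothetical, not the C API)."""

    def __init__(self, size=0):
        if size < 0:
            raise ValueError("size must be >= 0")
        # Writable, zero-filled buffer standing in for the writer's storage.
        self._buf = bytearray(size)

    def data(self):
        # Writable view of the buffer, loosely analogous to a char* pointer.
        # The view must be released before the buffer can be resized.
        return memoryview(self._buf)

    def resize(self, size):
        # Shrink in place, or grow with zero bytes.
        if size < len(self._buf):
            del self._buf[size:]
        else:
            self._buf.extend(bytes(size - len(self._buf)))

    def finish_with_size(self, size):
        # Keep only the first `size` bytes and return an immutable object.
        self.resize(size)
        result = bytes(self._buf)
        self._buf = None  # the writer is no longer usable
        return result

    def finish(self):
        # Return everything that was allocated.
        return self.finish_with_size(len(self._buf))

    def discard(self):
        # Drop the buffer without producing a result (error path).
        self._buf = None


# Over-allocate, write fewer bytes than allocated, then truncate on finish.
writer = BytesWriterModel(16)
with writer.data() as view:
    view[:5] = b"hello"
result = writer.finish_with_size(5)
print(result)
```

The model copies the buffer on finish; a C implementation is free to avoid that copy, which is part of what the benchmarks in this conversation measure.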
Convert _PyBytes_FromHex().
Replace PyBytes_FromStringAndSize(NULL, 0) with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES).
This change has no impact on performance, even though the new public API allocates memory on the heap instead of on the stack. It uses a freelist to optimize
Microbenchmark on 3 functions, to compare the private
import pyperf
import binascii
runner = pyperf.Runner()
runner.bench_func('from list 100', bytes, list(b'x' * 100))
runner.bench_func('from list 1,000', bytes, list(b'x' * 1_000))
runner.bench_func('from hex 100', bytes.fromhex, bytes(range(100)).hex())
runner.bench_func('from hex 1,000', bytes.fromhex, (b'x' * 1_000).hex())
runner.bench_func('b2a_uu', binascii.b2a_uu, b'x' * 45)
Result:
Benchmark hidden because not significant (1): from list 1,000
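The comment above mentions a freelist as the reason heap allocation does not cost more than stack allocation. As a rough illustration of the general technique only (this is not CPython's implementation; `BufferFreelist` and its `max_items` parameter are invented for the example), a freelist keeps released objects around so that the next request can reuse one instead of allocating:

```python
class BufferFreelist:
    """Minimal freelist sketch: recycle released buffers instead of reallocating."""

    def __init__(self, max_items=1):
        self._free = []              # released buffers waiting for reuse
        self._max_items = max_items  # cap so the cache cannot grow without bound
        self.allocations = 0         # number of real allocations performed

    def acquire(self):
        if self._free:
            # Fast path: reuse a cached buffer.
            return self._free.pop()
        # Slow path: allocate a new buffer.
        self.allocations += 1
        return bytearray()

    def release(self, buf):
        if len(self._free) < self._max_items:
            buf.clear()              # reset contents before caching
            self._free.append(buf)
        # Otherwise the buffer is simply dropped and garbage collected.


freelist = BufferFreelist()
for _ in range(1000):
    buf = freelist.acquire()
    buf += b"payload"
    freelist.release(buf)

print(freelist.allocations)
```

With a create/finish pattern that strictly alternates, as in the loop above, every request after the first is served from the cache.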
Benchmark comparing
Benchmark:
import pyperf
SIZES = (10, 100, 500)
runner = pyperf.Runner()
for size in SIZES:
    large_int = (2 ** (size * 8) - 1)
    runner.bench_func(f'to_bytes({size})', large_int.to_bytes, size)
for size in SIZES:
    mem = memoryview(b'x' * size)
    runner.bench_func(f'memoryview({size}).tobytes()', mem.tobytes)
Result:
It's hard to beat
There is an overhead around 10 ns when using
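For readers who want to get a feel for the order of magnitude without installing pyperf, the same two operations can be timed with the standard library. This is only a rough sketch: `timeit` does not do pyperf's process isolation or statistical filtering, the helper `time_call` is made up for this example, and the lambda wrapper adds its own call overhead, so a difference of around 10 ns may well be lost in the noise.

```python
import timeit

SIZES = (10, 100, 500)


def time_call(func, *args, number=20_000, repeat=5):
    """Return the best observed time per call, in nanoseconds."""
    timer = timeit.Timer(lambda: func(*args))
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


results = {}
for size in SIZES:
    large_int = 2 ** (size * 8) - 1
    # byteorder is passed explicitly so this also runs on Python < 3.11.
    results[f"to_bytes({size})"] = time_call(large_int.to_bytes, size, "big")
    mem = memoryview(b"x" * size)
    results[f"memoryview({size}).tobytes()"] = time_call(mem.tobytes)

for name, ns in results.items():
    print(f"{name}: {ns:.1f} ns per call")
```

Taking the minimum over several repeats is a common way to reduce the influence of system noise on a quick measurement.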
Could you please benchmark the following?
I wrote a big PR to show what PEP 782 would look like and how it would be used. But if PEP 782 is accepted, I will start by adding only the API, without using it. Then I will write separate changes to use the new API and run benchmarks on each change.
I didn't modify these encoders; they still use the private
Same. If I modify these encoders and error handlers later, I will run benchmarks to decide whether it's acceptable to use the public API or not.
Microbenchmark on
import pyperf
runner = pyperf.Runner()
import ctypes
from ctypes import pythonapi, py_object
from ctypes import (
c_int, c_uint,
c_long, c_ulong,
c_size_t, c_ssize_t,
c_char_p)
PyBytes_FromFormat = pythonapi.PyBytes_FromFormat
PyBytes_FromFormat.argtypes = (c_char_p,)
PyBytes_FromFormat.restype = py_object
PyBytes_DecodeEscape = pythonapi.PyBytes_DecodeEscape
PyBytes_DecodeEscape.argtypes = (c_char_p, c_size_t, c_char_p, c_size_t, c_char_p)
PyBytes_DecodeEscape.restype = py_object
runner.bench_func('Format hello world', PyBytes_FromFormat, b'Hello %s !', b'world')
fmt = (b'Hell%c' + b' ' * 1024 + b' %s')
runner.bench_func('Format long format', PyBytes_FromFormat, fmt, c_int(ord('o')), b'world')
s = b'abc\\ndef\\x40.'
runner.bench_func('Decode simple', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')
s = b'x' * 1024
runner.bench_func('Decode long copy', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')
s = b'\\x40' * 1024
runner.bench_func('Decode long \\x40', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')
Results:
Benchmark hidden because not significant (1): Format hello world
I'm not sure why PEP 782 is faster, but at least it's not slower :-)
I build Python with