[WIP] gh-129813, PEP 782: Add PyBytesWriter C API #131681

Draft: wants to merge 28 commits into main

vstinner (Member) commented Mar 24, 2025

Add functions:

  • PyBytesWriter_Create()
  • PyBytesWriter_Discard()
  • PyBytesWriter_Finish()
  • PyBytesWriter_FinishWithSize()
  • PyBytesWriter_FinishWithEndPointer()
  • PyBytesWriter_Data()
  • PyBytesWriter_Allocated()
  • PyBytesWriter_SetSize()
  • PyBytesWriter_Resize()
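To make the intended lifecycle of these functions concrete (create with a size, write through the buffer, optionally resize, then finish or discard), here is a toy pure-Python model. It is only an illustration: `ToyBytesWriter` and its methods are hypothetical names, and the semantics are assumptions based on PEP 782, not CPython's C implementation. `PyBytesWriter_FinishWithEndPointer()` presumably corresponds to `finish_with_size()` with the size computed as "end pointer minus start". `PyBytesWriter_Allocated()` and `PyBytesWriter_SetSize()` are not modeled.

```python
class ToyBytesWriter:
    """Hypothetical pure-Python model of the PyBytesWriter lifecycle.

    Method names mirror the C functions listed above, but the semantics are
    assumptions based on PEP 782, not CPython's implementation.
    """

    def __init__(self, size: int) -> None:
        # Like PyBytesWriter_Create(size): start with `size` writable bytes.
        if size < 0:
            raise ValueError("size must be >= 0")
        self._buf = bytearray(size)

    def _live(self) -> bytearray:
        # A finished or discarded writer must not be used again.
        if self._buf is None:
            raise RuntimeError("writer already finished or discarded")
        return self._buf

    def data(self) -> bytearray:
        # Like PyBytesWriter_Data(): the writable buffer. In C the pointer
        # can move after a resize, so callers should fetch it again.
        return self._live()

    def resize(self, size: int) -> None:
        # Like PyBytesWriter_Resize(): grow (zero-filled in this toy) or
        # shrink, keeping the existing prefix.
        buf = self._live()
        if size < 0:
            raise ValueError("size must be >= 0")
        if size > len(buf):
            buf.extend(bytes(size - len(buf)))
        else:
            del buf[size:]

    def finish_with_size(self, size: int) -> bytes:
        # Like PyBytesWriter_FinishWithSize(): keep only the first `size` bytes.
        buf = self._live()
        if not 0 <= size <= len(buf):
            raise ValueError("size out of range")
        self._buf = None
        return bytes(buf[:size])

    def finish(self) -> bytes:
        # Like PyBytesWriter_Finish(): the whole buffer becomes the result.
        return self.finish_with_size(len(self._live()))

    def discard(self) -> None:
        # Like PyBytesWriter_Discard(): drop the buffer, produce no object.
        self._buf = None


w = ToyBytesWriter(10)        # reserve more than needed
w.data()[0:5] = b"hello"      # write through the buffer
print(w.finish_with_size(5))  # b'hello'
```

Over-allocating and then finishing with the real size is the typical pattern when the final length is only known after writing.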


vstinner force-pushed the bytes_writer_size branch from 9097e5f to e24d40e (March 24, 2025 16:56)

vstinner changed the title "[WIP] gh-129813: Add PyBytesWriter C API (with size flavor)" to "[WIP] gh-129813: Add PyBytesWriter C API (flavor with size)" (Mar 24, 2025)

vstinner changed the title "[WIP] gh-129813: Add PyBytesWriter C API (flavor with size)" to "[WIP] gh-129813, PEP 782: Add PyBytesWriter C API" (Apr 2, 2025)

vstinner (Member, Author) commented Apr 22, 2025

This change has no significant impact on performance, even though the new public API allocates the writer on the heap instead of on the stack. It uses a freelist to optimize PyBytesWriter_Create().
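A freelist keeps recently released objects around so that the next create call can reuse one instead of paying for a fresh allocation. The following is a minimal Python sketch of that idea, not CPython's C code: `Writer` and `WriterFreelist` are hypothetical names, and the one-slot capacity is an assumption.

```python
class Writer:
    """Hypothetical stand-in for the heap-allocated writer structure."""

    __slots__ = ("buf",)

    def __init__(self) -> None:
        self.buf = None


class WriterFreelist:
    """Cache up to `maxsize` released Writer objects and reuse them."""

    def __init__(self, maxsize: int = 1) -> None:
        self.maxsize = maxsize
        self._free = []
        self.allocations = 0  # number of brand-new Writer objects built

    def create(self, size: int) -> Writer:
        if self._free:
            w = self._free.pop()  # fast path: recycle a released writer
        else:
            w = Writer()          # slow path: a fresh "heap allocation"
            self.allocations += 1
        w.buf = bytearray(size)   # this sketch never caches the payload
        return w

    def release(self, w: Writer) -> None:
        w.buf = None              # drop the payload, keep the shell
        if len(self._free) < self.maxsize:
            self._free.append(w)
        # Otherwise the object is simply dropped (the "free()" path).


fl = WriterFreelist(maxsize=1)
for _ in range(1_000):
    fl.release(fl.create(16))
print(fl.allocations)  # 1
```

With a create-then-release pattern, only the first call allocates; every later call takes the fast path, which is how a heap-allocated API can stay close to the cost of a stack-allocated one.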

Microbenchmark on 3 functions, comparing the private _PyBytesWriter API (ref) to the new public PyBytesWriter API (change):

  • bytes(list)
  • bytes.fromhex(str)
  • binascii.b2a_uu(bytes)

```python
import pyperf
import binascii

runner = pyperf.Runner()
runner.bench_func('from list 100', bytes, list(b'x' * 100))
runner.bench_func('from list 1,000', bytes, list(b'x' * 1_000))

runner.bench_func('from hex 100', bytes.fromhex, bytes(range(100)).hex())
runner.bench_func('from hex 1,000', bytes.fromhex, (b'x' * 1_000).hex())

runner.bench_func('b2a_uu', binascii.b2a_uu, b'x' * 45)
```

Result:

| Benchmark | ref | change |
|---|---|---|
| from list 100 | 631 ns | 623 ns: 1.01x faster |
| from hex 100 | 141 ns | 145 ns: 1.03x slower |
| from hex 1,000 | 1.03 us | 1.04 us: 1.00x slower |
| b2a_uu | 112 ns | 111 ns: 1.01x faster |
| Geometric mean | (ref) | 1.00x slower |

Benchmark hidden because not significant (1): from list 1,000

vstinner (Member, Author) commented:

Benchmark comparing PyBytes_FromStringAndSize(NULL, length) (ref) to PyBytesWriter_Create() (change).

Benchmark:

```python
import pyperf

SIZES = (10, 100, 500)

runner = pyperf.Runner()
for size in SIZES:
    large_int = (2 ** (size * 8) - 1)
    runner.bench_func(f'to_bytes({size})', large_int.to_bytes, size)
for size in SIZES:
    mem = memoryview(b'x' * size)
    runner.bench_func(f'memoryview({size}).tobytes()', mem.tobytes)
```

Result:

| Benchmark | ref | change |
|---|---|---|
| to_bytes(10) | 56.3 ns | 66.4 ns: 1.18x slower (+10.1 ns) |
| to_bytes(100) | 152 ns | 162 ns: 1.06x slower (+10 ns) |
| to_bytes(500) | 563 ns | 559 ns: 1.01x faster (-4 ns) |
| memoryview(10).tobytes() | 37.5 ns | 47.0 ns: 1.25x slower (+9.5 ns) |
| memoryview(100).tobytes() | 35.3 ns | 46.6 ns: 1.32x slower (+11.3 ns) |
| memoryview(500).tobytes() | 45.5 ns | 55.3 ns: 1.21x slower (+9.8 ns) |
| Geometric mean | (ref) | 1.16x slower |

It's hard to beat PyBytes_FromStringAndSize(NULL, length) performance, since PyBytesWriter_Create() is a wrapper built on top of PyBytes_FromStringAndSize(NULL, length).

There is an overhead of around 10 ns per call when using PyBytesWriter.
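The "around 10 ns" and "1.16x slower" figures can be re-derived from the table. A small sketch, assuming only the rounded mean timings shown in the table (pyperf itself works from unrounded means, so the last digit may differ):

```python
from statistics import geometric_mean

# (ref, change) mean timings in ns, copied from the to_bytes/memoryview table.
RESULTS = {
    "to_bytes(10)": (56.3, 66.4),
    "to_bytes(100)": (152.0, 162.0),
    "to_bytes(500)": (563.0, 559.0),
    "memoryview(10).tobytes()": (37.5, 47.0),
    "memoryview(100).tobytes()": (35.3, 46.6),
    "memoryview(500).tobytes()": (45.5, 55.3),
}

# Absolute cost per call (positive means PyBytesWriter is slower).
deltas = {name: change - ref for name, (ref, change) in RESULTS.items()}

# Geometric mean of the change/ref time ratios (> 1 means slower overall).
geo_mean = geometric_mean([change / ref for ref, change in RESULTS.values()])

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} ns")
print(f"geometric mean: {geo_mean:.3f}x")
```

Five of the six deltas fall between +9.5 and +11.3 ns while to_bytes(500) is 4 ns faster, and the geometric mean of the rounded timings comes out near 1.17, in line with the 1.16x reported by pyperf. A roughly constant per-call cost matters most when the operation itself only takes a few tens of nanoseconds.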

serhiy-storchaka (Member) commented:

Could you please benchmark the following?

  • ASCII, Latin1 and UTF-8 encoders. For ASCII-only and non-ASCII data.
  • The backslashreplace and xmlcharrefreplace error handlers (encoding).
  • PyBytes_FromFormat(). Especially with few % formats and large raw data between them.
  • PyBytes_DecodeEscape().

vstinner (Member, Author) commented May 6, 2025

I wrote a big PR to show what PEP 782 would look like and how it would be used. But if PEP 782 is accepted, I will start by only adding the API, without using it. Then I will write separate changes to use the new API and run benchmarks on each change.

> ASCII, Latin1 and UTF-8 encoders. For ASCII-only and non-ASCII data.

I didn't modify these encoders; they still use the private _PyBytesWriter API.

> The backslashreplace and xmlcharrefreplace error handlers (encoding).

Same.

If I modify these encoders and error handlers later, I will run benchmarks to decide if it's acceptable to use the public API or not.
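If the encoders and error handlers are ported later, the requested cases could be registered with pyperf in the same style as the other scripts in this thread. A sketch, assuming 1,000-character inputs (an arbitrary size) and plain `str.encode()` calls; `CASES` and `register()` are hypothetical names, and actually running the benchmarks requires pyperf:

```python
# Inputs covering the requested ASCII-only and non-ASCII cases.
ASCII_TEXT = "x" * 1_000       # ASCII-only
LATIN1_TEXT = "\xe9" * 1_000   # non-ASCII, still encodable to Latin-1
UTF8_TEXT = "\u20ac" * 1_000   # EURO SIGN: outside Latin-1, 3 bytes in UTF-8

# (benchmark name, callable, arguments)
CASES = [
    ("ascii encode (ASCII)", ASCII_TEXT.encode, ("ascii",)),
    ("latin1 encode (ASCII)", ASCII_TEXT.encode, ("latin-1",)),
    ("latin1 encode (non-ASCII)", LATIN1_TEXT.encode, ("latin-1",)),
    ("utf8 encode (ASCII)", ASCII_TEXT.encode, ("utf-8",)),
    ("utf8 encode (non-ASCII)", UTF8_TEXT.encode, ("utf-8",)),
    # Non-ASCII input encoded to ASCII forces the error handler to run.
    ("ascii backslashreplace", LATIN1_TEXT.encode, ("ascii", "backslashreplace")),
    ("ascii xmlcharrefreplace", LATIN1_TEXT.encode, ("ascii", "xmlcharrefreplace")),
]


def register(runner):
    """Register every case on a pyperf.Runner (or any object with bench_func)."""
    for name, func, args in CASES:
        runner.bench_func(name, func, *args)


# In a benchmark script:
#     import pyperf
#     register(pyperf.Runner())
```

Running such a script once on a build without the change and once with it (writing each result with `-o`), then comparing the two JSON files with `python -m pyperf compare_to ... --table`, would produce tables like the ones in this thread.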

vstinner (Member, Author) commented May 6, 2025

Microbenchmark on PyBytes_FromFormat() and PyBytes_DecodeEscape() functions.

```python
import pyperf
from ctypes import pythonapi, py_object, c_char_p, c_int, c_ssize_t

runner = pyperf.Runner()

# PyObject* PyBytes_FromFormat(const char *format, ...)
# Only the format argument is declared; the extra arguments are passed
# using ctypes' default conversions (bytes -> char*, c_int -> int).
PyBytes_FromFormat = pythonapi.PyBytes_FromFormat
PyBytes_FromFormat.argtypes = (c_char_p,)
PyBytes_FromFormat.restype = py_object

# PyObject* PyBytes_DecodeEscape(const char *s, Py_ssize_t len,
#     const char *errors, Py_ssize_t unicode, const char *recode_encoding)
PyBytes_DecodeEscape = pythonapi.PyBytes_DecodeEscape
PyBytes_DecodeEscape.argtypes = (c_char_p, c_ssize_t, c_char_p, c_ssize_t, c_char_p)
PyBytes_DecodeEscape.restype = py_object

runner.bench_func('Format hello world', PyBytes_FromFormat, b'Hello %s !', b'world')
# Few % formats separated by a large chunk (1 KiB) of raw format text.
fmt = (b'Hell%c' + b' ' * 1024 + b' %s')
runner.bench_func('Format long format', PyBytes_FromFormat, fmt, c_int(ord('o')), b'world')

# errors=NULL; the last two (legacy) arguments get dummy values.
s = b'abc\\ndef\\x40.'
runner.bench_func('Decode simple', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')
s = b'x' * 1024
runner.bench_func('Decode long copy', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')
s = b'\\x40' * 1024
runner.bench_func('Decode long \\x40', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')
```

Results:

| Benchmark | ref | pep782 |
|---|---|---|
| Format long format | 1.06 us | 1.04 us: 1.02x faster |
| Decode simple | 776 ns | 743 ns: 1.04x faster |
| Decode long copy | 1.38 us | 1.34 us: 1.03x faster |
| Decode long \x40 | 2.70 us | 2.67 us: 1.01x faster |
| Geometric mean | (ref) | 1.02x faster |

Benchmark hidden because not significant (1): Format hello world

I'm not sure why PEP 782 is faster, but at least it's not slower :-)

I built Python with gcc -O3 (without PGO, LTO, or CPU isolation).
