Description
Over on python/cpython#111140 and PR python/cpython#114886 I've proposed an implementation for converting between PyLongObject and arbitrary-sized native integers. This functionality is currently exposed (inefficiently) as the private _PyLong_AsByteArray with a more complex signature, but I believe this design is more appropriate for long-term stability, retains the ability for us to optimise, leads users into correct usage, supports existing needs (in particular, PyO3-style language interop), and is readable in source code.
Full docs on the PR, but here's the prototype and summary:
int PyLong_CopyBits(PyObject *pylong, void* buffer, size_t n_bytes, int endianness)
PyObject* PyLong_FromBits(const void* buffer, size_t n_bytes, int endianness)
PyObject* PyLong_FromUnsignedBits(const void* buffer, size_t n_bytes, int endianness)
Endianness is -1 for native, or 0/1 for big/little. The expectation is that most callers will just pass -1, particularly when calling from another language, but if someone passes PY_LITTLE_ENDIAN from config.h then it will do the correct thing.
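To illustrate the endianness argument, here is a minimal sketch of how -1 could be resolved to a concrete byte order. The helper name resolve_endianness is hypothetical and is not part of the proposal or of CPython; it only shows the -1/0/1 convention described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the proposal: maps the endianness
   argument (-1 native, 0 big, 1 little) to a concrete 0 or 1. */
static int resolve_endianness(int endianness)
{
    if (endianness == 0 || endianness == 1) {
        return endianness;
    }
    /* -1: probe the host by checking which byte of the value 1 is stored first. */
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1 ? 1 : 0;
}
```

A caller that passes 0 or 1 gets exactly what it asked for, while -1 follows whatever the host uses.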
PyLong_CopyBits returns -1 with an exception set, or the required size of the buffer. If the result is less than or equal to n_bytes, no overflow occurred. No exception is raised for overflow, and buffer is always filled up to n_bytes, even if higher bytes get dropped (like a C-style downcast).
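The return contract can be sketched with a stand-in that takes an int64_t instead of a PyObject. This is not the implementation from the PR: sketch_copy_bits is a hypothetical name, it only writes little-endian output, and it assumes the required size is the smallest signed two's-complement width.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed PyLong_CopyBits, using an int64_t source
   instead of a PyObject. Hypothetical; little-endian output only. */
static int sketch_copy_bits(int64_t value, void *buffer, size_t n_bytes)
{
    unsigned char *out = buffer;
    uint64_t bits = (uint64_t)value;
    unsigned char fill = value < 0 ? 0xFF : 0x00;

    /* Always fill the whole buffer: low bytes first, then sign extension.
       If n_bytes is too small, the high bytes are silently dropped. */
    for (size_t i = 0; i < n_bytes; i++) {
        out[i] = i < 8 ? (unsigned char)(bits >> (8 * i)) : fill;
    }

    /* Required size: smallest two's-complement width that holds value. */
    int required = 8;
    while (required > 1) {
        unsigned char top = (unsigned char)(bits >> (8 * (required - 1)));
        unsigned char next = (unsigned char)(bits >> (8 * (required - 2)));
        /* The top byte is redundant only if it is pure sign extension. */
        if (top != fill || (next & 0x80) != (fill & 0x80)) {
            break;
        }
        required--;
    }
    return required;
}
```

Copying 0x12345 into a 2-byte buffer with this stand-in writes the low two bytes and returns 3, so the caller sees that the result is greater than n_bytes and knows the value was truncated.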
The FromBits functions are basically just consistent wrappers around the existing private function. They'll do an __index__ call if necessary, but otherwise there was no gain in changing the implementation there.
About the only thing I don't love is using "bits" when the argument is the number of bytes. Taking bytes is nice for what I consider the "normal" case of PyLong_CopyBits(v, &int32_value, sizeof(int32_value), -1), and CopyBytes didn't seem to imply as low-level an operation as copying bits. There's some discussion of As/FromByteArray on the issue.
The return value of CopyBits is also unusual, but I believe justified. Callers are very unlikely to simply return an OverflowError back into Python. They will more likely clear it and do something different (perhaps allocating dynamic memory to copy the bits to), or they don't care and are happy to truncate (because they've already range checked).
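The "try a fixed buffer, then fall back to dynamic memory" pattern might look roughly like this. Everything named sketch_* is hypothetical: sketch_bigint stands in for a Python int, and sketch_copy_bits only mimics the return contract described above (unsigned, little-endian) rather than the code in the PR.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a big integer: little-endian magnitude bytes. */
typedef struct {
    const unsigned char *digits;
    size_t len;
} sketch_bigint;

/* Stand-in with the same return contract as the proposed PyLong_CopyBits:
   fill the buffer (truncating if needed) and return the required size. */
static int sketch_copy_bits(const sketch_bigint *v, void *buffer, size_t n_bytes)
{
    size_t n = v->len < n_bytes ? v->len : n_bytes;
    memset(buffer, 0, n_bytes);
    memcpy(buffer, v->digits, n);
    return (int)v->len;
}

/* Caller pattern: try a fixed buffer first, fall back to the heap.
   Returns the bytes to read and sets *heap to a pointer the caller must
   free() (NULL when the fixed buffer was large enough). NULL on failure. */
static unsigned char *sketch_export(const sketch_bigint *v,
                                    unsigned char *fixed, size_t fixed_size,
                                    size_t *out_size, unsigned char **heap)
{
    *heap = NULL;
    int required = sketch_copy_bits(v, fixed, fixed_size);
    if (required < 0) {
        return NULL;            /* error case; the real API sets an exception */
    }
    if ((size_t)required <= fixed_size) {
        *out_size = fixed_size; /* no overflow: the fixed buffer holds it all */
        return fixed;
    }
    /* Overflow: nothing was raised, so just retry with a larger buffer. */
    *heap = malloc((size_t)required);
    if (*heap == NULL) {
        return NULL;
    }
    sketch_copy_bits(v, *heap, (size_t)required);
    *out_size = (size_t)required;
    return *heap;
}
```

Because overflow is reported through the return value, the fallback path has no exception to clear before retrying. A caller that has already range checked can ignore the result and keep the truncated bytes.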
I believe these are good candidates for the limited API, but I'm happy to ship them first and gather feedback before pushing for that. The current limited API alternative is to use PyObject_CallMethod(v, "to_bytes" ...) and copy out of the bytes object.
Any thoughts/suggestions/concerns about these APIs?