Description
Hi all, I noticed a performance issue when extracting a PyBytes or a PyByteArray object into a Vec<u8>.
This is an issue one can easily run into without realizing it. As an example, say we'd like to expose a simple checksum function:
    use pyo3::prelude::*;

    // XOR all bytes of the input together.
    #[pyfunction]
    fn checksum(data: &[u8]) -> PyResult<u8> {
        let mut result = 0;
        for x in data {
            result ^= x;
        }
        Ok(result)
    }
See how it performs against the equivalent Python implementation, processing 1 MB a hundred times:
2.65s call test_checksum.py::test_perf[py-bytes]
0.00s call test_checksum.py::test_perf[rs-bytes]
Looks really fast! However, it won't accept a bytearray as an argument:
TypeError: argument 'data': 'bytearray' object cannot be converted to 'PyBytes'
So we update our implementation to take a Vec<u8> instead:
    // Same checksum, but the argument is now extracted into an owned Vec<u8>.
    #[pyfunction]
    fn checksum(data: Vec<u8>) -> PyResult<u8> {
        let mut result = 0;
        for x in data {
            result ^= x;
        }
        Ok(result)
    }
And now the results:
2.61s call test_checksum.py::test_perf[py-bytearray]
2.55s call test_checksum.py::test_perf[py-bytes]
1.92s call test_checksum.py::test_perf[rs-bytearray]
1.87s call test_checksum.py::test_perf[rs-bytes]
It performs roughly the same as Python, which makes sense if we look at the FromPyObject implementation for Vec<T> (lines 314 to 318 in bed4f9d).
The bytes/bytearray object is iterated and each item (i.e. a Python integer) is separately extracted into a u8.
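To illustrate why the per-item path is costly, here is a minimal sketch in plain Rust with no PyO3 dependency. The `PyInt` type and both helper functions are hypothetical stand-ins rather than PyO3 APIs: `PyInt` models a Python integer yielded while iterating a bytes-like object, and the two functions contrast item-by-item extraction with a single bulk copy.

```rust
/// Hypothetical stand-in for a Python integer produced by iteration.
/// Not a PyO3 type; boxed to mimic one heap object per item.
struct PyInt(Box<i64>);

/// Models the generic path: every item is range-checked and converted on its own.
fn extract_per_item(items: &[PyInt]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(items.len());
    for item in items {
        let value = *item.0;
        if !(0..=255).contains(&value) {
            return Err(format!("{} is out of range for u8", value));
        }
        out.push(value as u8);
    }
    Ok(out)
}

/// Models the specialized path: the underlying buffer is copied in one step.
fn extract_bulk(buffer: &[u8]) -> Vec<u8> {
    buffer.to_vec()
}

fn main() {
    let buffer: Vec<u8> = (0..=255).collect();
    let items: Vec<PyInt> = buffer
        .iter()
        .map(|&b| PyInt(Box::new(i64::from(b))))
        .collect();

    // Both paths yield the same bytes; only the per-item work differs.
    assert_eq!(extract_per_item(&items).unwrap(), extract_bulk(&buffer));
    // A value outside 0..=255 is rejected by the per-item path.
    assert!(extract_per_item(&[PyInt(Box::new(256))]).is_err());
}
```

In this model the per-item path performs one dereference, one range check, and one push per byte, whereas the bulk path is a single contiguous copy. The real overhead in PyO3 is likely larger, since each item there is a Python object that has to be created and converted.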
This could be fixed by specializing the extract logic for Vec<u8> and using dedicated methods such as PyBytes::as_bytes().to_vec() and PyByteArray::to_vec(). Here's a possible patch:
https://gist.github.com/vxgmichel/367e01e8504cb9c9e700a22525e8b68d
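The idea behind such a specialization can be sketched without PyO3 as follows. The `PyObj` enum and the `extract_vec_u8` function are hypothetical and only model the dispatch; the actual patch lives in the gist above and may be structured differently. The fast-path arm stands in for the PyBytes::as_bytes().to_vec() and PyByteArray::to_vec() calls mentioned earlier.

```rust
/// Hypothetical model of the Python objects a Vec<u8> argument may receive.
/// These are not PyO3 types.
enum PyObj {
    Bytes(Vec<u8>),
    ByteArray(Vec<u8>),
    /// Any other sequence, modeled as a list of Python integers.
    List(Vec<i64>),
}

/// Hypothetical extraction routine with a fast path for bytes-like objects.
fn extract_vec_u8(obj: &PyObj) -> Result<Vec<u8>, String> {
    match obj {
        // Fast path: copy the contiguous buffer in a single step.
        PyObj::Bytes(buf) | PyObj::ByteArray(buf) => Ok(buf.clone()),
        // Generic fallback: range-check and convert each item separately.
        PyObj::List(items) => items
            .iter()
            .map(|&v| {
                if (0..=255).contains(&v) {
                    Ok(v as u8)
                } else {
                    Err(format!("{} is out of range for u8", v))
                }
            })
            .collect(),
    }
}

fn main() {
    let from_bytes = extract_vec_u8(&PyObj::Bytes(vec![1, 2, 3])).unwrap();
    let from_bytearray = extract_vec_u8(&PyObj::ByteArray(vec![1, 2, 3])).unwrap();
    let from_list = extract_vec_u8(&PyObj::List(vec![1, 2, 3])).unwrap();

    // All three inputs produce the same Vec<u8>.
    assert_eq!(from_bytes, from_bytearray);
    assert_eq!(from_bytes, from_list);
    // The fallback still rejects values that do not fit in a u8.
    assert!(extract_vec_u8(&PyObj::List(vec![300])).is_err());
}
```

Keeping the generic fallback means other sequences, such as lists of integers, continue to be accepted; only bytes-like inputs take the shortcut.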
With this patch applied, the performance is now similar to what we had with the &[u8] slice:
2.70s call test_checksum.py::test_perf[py-bytearray]
2.65s call test_checksum.py::test_perf[py-bytes]
0.00s call test_checksum.py::test_perf[rs-bytes]
0.00s call test_checksum.py::test_perf[rs-bytearray]