
Improve performance when converting python bytes/bytearray to Vec<u8> #2888

Open
@vxgmichel


Hi all, I noticed a performance issue when extracting a PyBytes or a PyByteArray object into a Vec<u8>.

This is an issue one can easily run into without realizing it. Here's a scenario: let's say we'd like to expose a simple XOR checksum function:

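```rust
use pyo3::prelude::*;

#[pyfunction]
fn checksum(data: &[u8]) -> PyResult<u8> {
    // XOR every byte together; `x` is a `&u8` borrowed from the slice
    let mut result = 0;
    for x in data {
        result ^= x;
    }
    Ok(result)
}
```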
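The XOR fold at the heart of this example does not depend on PyO3. As a minimal sketch (the name `xor_checksum` is hypothetical and not part of the original code), the same logic can be written as a plain Rust function and unit-tested without a Python interpreter:

```rust
/// Hypothetical helper mirroring the body of `checksum`:
/// XOR all bytes together, returning 0 for empty input.
fn xor_checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, &byte| acc ^ byte)
}
```

For example, `xor_checksum(&[0xFF, 0x0F])` returns `0xF0`, and `xor_checksum(&[1, 2, 3])` returns `0` since `1 ^ 2 ^ 3 == 0`.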

Here is how it performs against the equivalent Python implementation, processing 1 MB a hundred times:

2.65s call     test_checksum.py::test_perf[py-bytes]
0.00s call     test_checksum.py::test_perf[rs-bytes]

Looks really fast! However, it won't accept a bytearray as an argument:

TypeError: argument 'data': 'bytearray' object cannot be converted to 'PyBytes'

So we update our implementation to take a Vec<u8> instead:

```rust
use pyo3::prelude::*;

#[pyfunction]
fn checksum(data: Vec<u8>) -> PyResult<u8> {
    // Iterating the Vec by value yields `u8` items
    let mut result = 0;
    for x in data {
        result ^= x;
    }
    Ok(result)
}
```

And now the results:

2.61s call     test_checksum.py::test_perf[py-bytearray]
2.55s call     test_checksum.py::test_perf[py-bytes]
1.92s call     test_checksum.py::test_perf[rs-bytearray]
1.87s call     test_checksum.py::test_perf[rs-bytes]

It now performs roughly the same as Python, which makes sense if we look at the FromPyObject implementation for Vec<T>:

From pyo3/src/types/sequence.rs, lines 314 to 318 in bed4f9d:

```rust
let mut v = Vec::with_capacity(seq.len().unwrap_or(0));
// Each item is a separate Python object, extracted one at a time
for item in seq.iter()? {
    v.push(item?.extract::<T>()?);
}
Ok(v)
```

The bytes/bytearray object is iterated, and each item (i.e. a Python integer) is separately extracted into a u8.
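To make the contrast concrete, here is a simplified, pure-Rust model of the two strategies. It is an illustration under stated assumptions, not PyO3 code: Python integers are modeled as `i64` values, and the names `OutOfRange`, `extract_per_item` and `extract_bulk` are hypothetical. The per-item path range-checks and converts every element, mirroring the loop above, while the bulk path is a single contiguous copy.

```rust
/// Hypothetical error for a modeled Python int that does not fit in a `u8`
/// (a stand-in for the exception an out-of-range int would cause).
#[derive(Debug, PartialEq)]
struct OutOfRange(i64);

/// Generic path (model): convert each element separately, as the
/// `Vec<T>` extraction loop does for every sequence item.
fn extract_per_item(items: &[i64]) -> Result<Vec<u8>, OutOfRange> {
    let mut v = Vec::with_capacity(items.len());
    for &item in items {
        // One fallible, range-checked conversion per element.
        if (0..=255).contains(&item) {
            v.push(item as u8);
        } else {
            return Err(OutOfRange(item));
        }
    }
    Ok(v)
}

/// Specialized path (model): the data already lives in a contiguous
/// `u8` buffer, so extraction is one bulk copy.
fn extract_bulk(buffer: &[u8]) -> Vec<u8> {
    buffer.to_vec()
}
```

Both paths produce the same Vec<u8> for valid input. The model understates the real gap: in PyO3 each item on the generic path is also a Python object obtained through the iterator protocol, which this sketch does not capture.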

This could be fixed by specializing the extract logic for Vec<u8> and using type-specific methods such as PyBytes::as_bytes().to_vec() and PyByteArray::to_vec(). Here's a possible patch:
https://gist.github.com/vxgmichel/367e01e8504cb9c9e700a22525e8b68d

With this patch applied, the performance is now similar to what we had with the &[u8] slice:

2.70s call     test_checksum.py::test_perf[py-bytearray]
2.65s call     test_checksum.py::test_perf[py-bytes]
0.00s call     test_checksum.py::test_perf[rs-bytes]
0.00s call     test_checksum.py::test_perf[rs-bytearray]
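The linked patch is not reproduced here, but the general shape of such a specialization can be sketched in a self-contained way. In the model below, all type and function names (`PyValueModel`, `extract_vec_u8`) are hypothetical and are not PyO3's API: an enum stands in for the concrete type of the Python object, bytes and bytearray take a fast path that copies their buffer directly, and any other sequence falls back to per-item conversion.

```rust
/// Hypothetical stand-in for the concrete type of a Python object.
enum PyValueModel {
    Bytes(Vec<u8>),     // immutable buffer, like PyBytes
    ByteArray(Vec<u8>), // mutable buffer, like PyByteArray
    List(Vec<i64>),     // generic sequence of ints, like a list
}

/// Extract a Vec<u8>, specializing on buffer-backed types.
fn extract_vec_u8(obj: &PyValueModel) -> Result<Vec<u8>, String> {
    match obj {
        // Fast paths: one contiguous copy, analogous to
        // PyBytes::as_bytes().to_vec() and PyByteArray::to_vec().
        PyValueModel::Bytes(buf) | PyValueModel::ByteArray(buf) => Ok(buf.to_vec()),
        // Fallback: convert each item separately, as the generic
        // Vec<T> extraction does.
        PyValueModel::List(items) => items
            .iter()
            .map(|&item| {
                if (0..=255).contains(&item) {
                    Ok(item as u8)
                } else {
                    Err(format!("{} does not fit in a u8", item))
                }
            })
            .collect(),
    }
}
```

The key property is that the fast paths never visit elements individually, which is consistent with the patched timings matching the &[u8] case.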
