Skip to content

Switch Buffers to memoryviews & remove extra copies/allocations #656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Mar 30, 2025

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Nov 22, 2024

When this was written in the code, Python's Buffer Protocol support was inconsistent across Python versions (specifically on Python 2.7). Since Python 2.7 reached EOL and it was dropped from Numcodecs, the Python Buffer Protocol support has become more consistent.

At this stage the memoryview object, which Cython also supports, does all the same things that Buffer would do for us. Plus it is builtin to the Python standard library. It behaves similarly in a lot of ways.

Given this, switch the code over to memoryviews internally and drop Buffer.


Additionally have pushed changes to this PR to improve overall memory usage. This eliminates some unneeded copies that occurred at the ended of some codecs. Also have eliminated some temporary allocations used in some codec pipelines by allocating output buffers earlier and changing operations to act in-place. This should eliminate some spiky behavior seen recently with codecs.


TODO:

  • Unit tests and/or doctests in docstrings
  • Tests pass locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • Docs build locally
  • GitHub Actions CI passes
  • Test coverage to 100% (Codecov passes)

@jakirkham jakirkham force-pushed the use_mv branch 2 times, most recently from 923df20 to e7afaa2 Compare November 22, 2024 03:56
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (3c933cf) to head (c83136a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #656   +/-   ##
=======================================
  Coverage   99.96%   99.96%           
=======================================
  Files          63       63           
  Lines        2771     2771           
=======================================
  Hits         2770     2770           
  Misses          1        1           
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jakirkham jakirkham force-pushed the use_mv branch 2 times, most recently from 7255767 to 2a942ad Compare November 22, 2024 04:07
When this was written in the code, Python's Buffer Protocol support was
inconsistent across Python versions (specifically on Python 2.7). Since
Python 2.7 reached EOL and it was dropped from Numcodecs, the Python
Buffer Protocol support has become more consistent.

At this stage the `memoryview` object, which Cython also supports, does
all the same things that `Buffer` would do for us. Plus it is builtin to
the Python standard library. It behaves similarly in a lot of ways.

Given this, switch the code over to `memoryview`s internally and drop
`Buffer`.
@jakirkham
Copy link
Member Author

Planning to merge end of week if no comments

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great improvement. Thanks so much @jakirkham!

Unfortunately my Cython isn't really good enough to provide a useful review. But the tests pass! 😆 And we are reducing LoC and simplifying many functions. So seems like a great direction.

Someone who actually groks Cython should review this for real.

@d-v-b
Copy link
Contributor

d-v-b commented Feb 12, 2025

@jakirkham any interest in pushing this forward?

@jakirkham
Copy link
Member Author

Have resolved the conflicts

Also to avoid making the diff as messy have added in try...finally... stubs. This makes the diff easier to read

We can clean these try...finally...s out in a subsequent PR and exclude the changes from git blame to avoid confusion

Lastly added a new entry

Please let me know if anything else is needed

During encoding preallocate a `bytes` object for the final result and
write everything directly into it. This avoids unnecessary staging and
copying of intermediate results.

Make use of Cython typed-`memoryview`s throughout encode and decode for
efficient access of the underlying data.

Further leverage the `store_le32` and `load_le32` functions to quickly
pack and unpack little-endian 32-bit unsigned integers from buffers
when encoding and decoding.
Comment on lines 153 to 155
# resize after compression
dest = dest[:compressed_size]
dest_objptr = <PyObject*>dest
_PyBytes_Resize(&dest_objptr, compressed_size)
Copy link
Member Author

@jakirkham jakirkham Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per issue ( #717 ), think this should fix the issue in Zstd (have pushed similar changes for the other codecs in this PR)

In particular this code had created a copy of the bytes object (trimmed to compressed_size)

dest = dest[:compressed_size]

To fix this we now call _PyBytes_Resize, which will resize the existing bytes object. This uses PyObject_Realloc under-the-hood. A good realloc implementation can detect when the memory requested is less than the size of the original allocation (as is the case here). When that happens it will shrink the allocation in-place (meaning no new memory allocation and no-copy), this is a fast operation

@jakirkham jakirkham changed the title Switch Buffers to memoryviews Switch Buffers to memoryviews & remove extra copies/allocations Mar 26, 2025
The `_PyBytes_Resize` function is helpful for resizing a `bytes` object
after it is allocated. When the underlying `bytes` object only has one
reference to it, the function can potentially use realloc to shrink or
grow the allocation in-place.

While the function signature of `_PyBytes_Resize` makes sense, it is a
little unwieldly when used directly in Cython. To smooth this out a
bit, use a macro to wrap calls to `_PyBytes_Resize`. This allows us to
work with `PyObject*`s, which Cython handles well, instead of
`PyObject**`s, which Cython handles awkwardly.

The end result is a function from Cython's perspective, which is easy to
use, and one under-the-hood that simply massages our input arguments
into something `_PyBytes_Resize` expects.
Include a function to ensure an object is converted into a contiguous
`memoryview` object.
Provide this macro in one place and `cimport` it everywhere else.
* Global `cimport`'s first
* Start with core modules `cython`, `libc`, etc.
* Add extensions `cpython` & `numpy`
* Internal extensions
* Then `import`s similarly grouped
Copy link
Member Author

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few notes on recent changes to reduce memory usage by doing in-place size reductions of objects (instead of copies as was the case before)

Should help with spiky memory consumption that @tomwhite spotted recently: #717

Comment on lines +4 to +8
cdef extern from *:
"""
#define PyBytes_RESIZE(b, n) _PyBytes_Resize(&b, n)
"""
int PyBytes_RESIZE(object b, Py_ssize_t n) except -1
Copy link
Member Author

@jakirkham jakirkham Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added this function (macro) to help with resizing bytes objects in Cython

It is a thin wrapper around _PyBytes_Resize, which makes it easier to work with in Cython

This allows us to in-place truncate bytes allocations that are larger than we end up needing without needing to copy to a new bytes object


# check compression was successful
if compressed_size <= 0:
raise RuntimeError('LZ4 compression error: %s' % compressed_size)

# resize after compression
compressed_size += sizeof(uint32_t)
dest = dest[:compressed_size]
PyBytes_RESIZE(dest, compressed_size)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thus when we need to truncate a bytes object, we can avoid [:<n>], which copies the bytes object

Instead we just call PyBytes_RESIZE, which can do an in-place size reduction of the bytes object

Any excess memory goes back to the memory pool, which Python can use as it sees fit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just tested this using the Mac 14 wheels from https://github.com/zarr-developers/numcodecs/actions/runs/14105743382 using the memray code in https://github.com/tomwhite/memray-array - and can confirm that it works. I can see the memory saving (peak memory goes form 300MB to 200MB). This was using zstd with Zarr v3.

@d-v-b
Copy link
Contributor

d-v-b commented Mar 27, 2025

@jakirkham thanks so much for this, it looks like a great improvement. I am not qualified to review this because I don't know cython, is there anyone you can recommend?

@d-v-b
Copy link
Contributor

d-v-b commented Mar 27, 2025

i'm also happy merging after self-review

@jakirkham
Copy link
Member Author

Thanks Davis! 🙏

@dstansby would you have time to look within the next week? 🙂

@dstansby
Copy link
Contributor

Sorry, I don't think I have enough C/Cython experience to review this.

@jakirkham
Copy link
Member Author

jakirkham commented Mar 30, 2025

No worries. In that case will go ahead and merge

Ryan had reviewed it earlier. Also the fact that Davis took a look and Tom tested it successfully provides some additional confidence. Cython knowledge isn't as common around here. So that is likely as good as we can do

Am AFK until the following week. Though happy to follow up on anything at that point. Please do feel free to ping me (otherwise it will likely get buried in the avalanche of notifications in the interim 😅)

@jakirkham jakirkham merged commit 27aeda2 into zarr-developers:main Mar 30, 2025
30 checks passed
@jakirkham jakirkham deleted the use_mv branch March 30, 2025 01:21
Copy link
Member Author

@jakirkham jakirkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that two of the pointer definitions below should have been marked const to match the associated typed-memoryviews, from which they are taken. However they were not

As a result, the compiler generates warnings about these ( #725 ). That said, we don't actually change the content the pointer is related to. So this is only a compile time warning, there is no actual issue at runtime

To fix the compiler warnings, added these consts (as shown below) in PR ( #728 ), which fixes this issue

Comment on lines +71 to +72
cdef const uint8_t[::1] b_mv = buf
cdef uint8_t* b_ptr = &b_mv[0]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cdef const uint8_t[::1] b_mv = buf
cdef uint8_t* b_ptr = &b_mv[0]
cdef const uint8_t[::1] b_mv = buf
cdef const uint8_t* b_ptr = &b_mv[0]

Comment on lines +87 to +88
cdef const uint8_t[::1] b_mv = b
cdef uint8_t* b_ptr = &b_mv[0]
Copy link
Member Author

@jakirkham jakirkham Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cdef const uint8_t[::1] b_mv = b
cdef uint8_t* b_ptr = &b_mv[0]
cdef const uint8_t[::1] b_mv = b
cdef const uint8_t* b_ptr = &b_mv[0]

Comment on lines +4 to +7
cdef extern from *:
"""
#define PyBytes_RESIZE(b, n) _PyBytes_Resize(&b, n)
"""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have an #ifndef guard to protect against redefinition error

So far it seems Cython doesn't encounter this error as it defines this once per module. Still it is a good idea to add the guard

Handling in PR: #732

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants