Switch `Buffer`s to `memoryview`s & remove extra copies/allocations #656
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Coverage Diff:

| | main | #656 | +/- |
| --- | --- | --- | --- |
| Coverage | 99.96% | 99.96% | |
| Files | 63 | 63 | |
| Lines | 2771 | 2771 | |
| Hits | 2770 | 2770 | |
| Misses | 1 | 1 | |
When this was written in the code, Python's Buffer Protocol support was inconsistent across Python versions (specifically on Python 2.7). Since Python 2.7 reached EOL and was dropped from Numcodecs, Python Buffer Protocol support has become more consistent. At this stage the `memoryview` object, which Cython also supports, does all the same things that `Buffer` did for us, and it is a Python builtin that behaves similarly in most ways. Given this, switch the code over to `memoryview`s internally and drop `Buffer`.
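As a rough illustration (not code from this PR), a Cython typed memoryview provides the kind of direct, zero-copy byte access that `Buffer` used to wrap:

```cython
# Illustrative sketch only: a Cython typed memoryview gives direct,
# zero-copy access to the bytes of any object supporting the buffer
# protocol (bytes, bytearray, contiguous uint8 arrays, ...).
from libc.stdint cimport uint8_t

def buffer_length(obj):
    # request a read-only, C-contiguous view of the object's bytes
    cdef const uint8_t[::1] mv = obj
    return mv.shape[0]
```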
Planning to merge end of week if no comments.
This seems like a great improvement. Thanks so much @jakirkham!
Unfortunately my Cython isn't really good enough to provide a useful review. But the tests pass! 😆 And we are reducing LoC and simplifying many functions. So seems like a great direction.
Someone who actually groks Cython should review this for real.
@jakirkham any interest in pushing this forward?
Have resolved the conflicts.

Also to avoid making the diff as messy have added in …

We can clean these …

Lastly added a new entry …

Please let me know if anything else is needed.
During encoding, preallocate a `bytes` object for the final result and write everything directly into it. This avoids unnecessary staging and copying of intermediate results. Make use of Cython typed `memoryview`s throughout encode and decode for efficient access to the underlying data. Further leverage the `store_le32` and `load_le32` functions to quickly pack and unpack little-endian 32-bit unsigned integers from buffers when encoding and decoding.
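As a point of reference, here is a minimal sketch of what such little-endian 32-bit pack/unpack helpers do. The names `store_le32_sketch`/`load_le32_sketch` and their exact signatures are illustrative; the helpers in the codebase may be defined differently.

```cython
from libc.stdint cimport uint8_t, uint32_t

cdef inline void store_le32_sketch(uint8_t *buf, uint32_t value) nogil:
    # write the 4 bytes of `value` in little-endian order
    buf[0] = <uint8_t>(value & 0xff)
    buf[1] = <uint8_t>((value >> 8) & 0xff)
    buf[2] = <uint8_t>((value >> 16) & 0xff)
    buf[3] = <uint8_t>((value >> 24) & 0xff)

cdef inline uint32_t load_le32_sketch(const uint8_t *buf) nogil:
    # read 4 little-endian bytes back into a uint32
    return (<uint32_t>buf[0]
            | (<uint32_t>buf[1] << 8)
            | (<uint32_t>buf[2] << 16)
            | (<uint32_t>buf[3] << 24))
```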
numcodecs/zstd.pyx (outdated)
      # resize after compression
    - dest = dest[:compressed_size]
    + dest_objptr = <PyObject*>dest
    + _PyBytes_Resize(&dest_objptr, compressed_size)
Per issue #717, think this should fix the issue in Zstd (have pushed similar changes for the other codecs in this PR).

In particular this code had created a copy of the `bytes` object (trimmed to `compressed_size`):

    dest = dest[:compressed_size]

To fix this we now call `_PyBytes_Resize`, which will resize the existing `bytes` object. This uses `PyObject_Realloc` under the hood. A good `realloc` implementation can detect when the memory requested is less than the size of the original allocation (as is the case here). When that happens, it will shrink the allocation in place (meaning no new memory allocation and no copy), which is a fast operation.
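The `realloc` behaviour being relied on here can be illustrated with the following sketch (purely illustrative, not PR code): shrinking an allocation typically lets the allocator reuse the same block.

```cython
from libc.stdlib cimport malloc, realloc, free

def realloc_shrink_demo():
    # allocate a block, then ask realloc for a smaller size; a good
    # allocator can shrink the block in place, so the pointer often
    # stays the same (though this is not guaranteed by the C standard)
    cdef void* buf = malloc(1024)
    cdef void* smaller
    if buf == NULL:
        raise MemoryError()
    smaller = realloc(buf, 128)
    if smaller == NULL:
        free(buf)
        raise MemoryError()
    shrunk_in_place = (smaller == buf)
    free(smaller)
    return shrunk_in_place
```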
(Title changed from "`Buffer`s to `memoryview`s" to "`Buffer`s to `memoryview`s & remove extra copies/allocations".)
The `_PyBytes_Resize` function is helpful for resizing a `bytes` object after it is allocated. When the underlying `bytes` object only has one reference to it, the function can potentially use `realloc` to shrink or grow the allocation in place. While the function signature of `_PyBytes_Resize` makes sense, it is a little unwieldy when used directly in Cython. To smooth this out a bit, use a macro to wrap calls to `_PyBytes_Resize`. This allows us to work with `PyObject*`s, which Cython handles well, instead of `PyObject**`s, which Cython handles awkwardly. The end result looks like a function from Cython's perspective, which is easy to use, while under the hood the macro simply massages our input arguments into something `_PyBytes_Resize` expects.
Include a function to ensure an object is converted into a contiguous `memoryview` object.
Provide this macro in one place and `cimport` it everywhere else.
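For illustration, a contiguity-checking helper like the one mentioned above might look roughly like this; the real helper's name and exact behaviour in the PR may differ.

```cython
def ensure_contiguous_memoryview_sketch(obj):
    """Return a contiguous memoryview over ``obj`` (sketch only)."""
    # wrap in a memoryview if needed, then reject non-contiguous buffers
    if not isinstance(obj, memoryview):
        obj = memoryview(obj)
    if not obj.contiguous:
        raise BufferError('an object exposing a contiguous buffer is required')
    return obj
```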
* Global `cimport`s first
* Start with core modules `cython`, `libc`, etc.
* Add extensions `cpython` & `numpy`
* Internal extensions
* Then `import`s similarly grouped
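Purely as an illustration of that ordering (module names below are examples, not the PR's actual imports), the top of a `.pyx` file might be laid out like this:

```cython
# core cimports
cimport cython
from libc.stdint cimport uint8_t, uint32_t

# extension cimports
from cpython.bytes cimport PyBytes_FromStringAndSize
cimport numpy as cnp

# internal extension cimports would go here, e.g.
# from numcodecs.compat_ext cimport some_helper   # hypothetical name

# plain imports, grouped the same way
import numpy as np
```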
cdef extern from *:
    """
    #define PyBytes_RESIZE(b, n) _PyBytes_Resize(&b, n)
    """
    int PyBytes_RESIZE(object b, Py_ssize_t n) except -1
Have added this function (macro) to help with resizing `bytes` objects in Cython.

It is a thin wrapper around `_PyBytes_Resize`, which makes it easier to work with in Cython.

This allows us to truncate in place `bytes` allocations that are larger than we end up needing, without copying to a new `bytes` object.
      # check compression was successful
      if compressed_size <= 0:
          raise RuntimeError('LZ4 compression error: %s' % compressed_size)

      # resize after compression
      compressed_size += sizeof(uint32_t)
    - dest = dest[:compressed_size]
    + PyBytes_RESIZE(dest, compressed_size)
Thus when we need to truncate a `bytes` object, we can avoid `[:<n>]`, which copies the `bytes` object.

Instead we just call `PyBytes_RESIZE`, which can do an in-place size reduction of the `bytes` object.

Any excess memory goes back to the memory pool, which Python can use as it sees fit.
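Putting the pieces together, the encode pattern described above might look roughly like the sketch below. It reuses the `PyBytes_RESIZE` macro from this PR; the size bound and the "compression" step are placeholders rather than real codec code.

```cython
from libc.stdint cimport uint8_t
from cpython.bytes cimport PyBytes_FromStringAndSize, PyBytes_AS_STRING

cdef extern from *:
    """
    #define PyBytes_RESIZE(b, n) _PyBytes_Resize(&b, n)
    """
    int PyBytes_RESIZE(object b, Py_ssize_t n) except -1

def encode_sketch(const uint8_t[::1] src):
    # preallocate the worst-case output size up front (placeholder bound)
    cdef Py_ssize_t max_size = 2 * src.shape[0] + 16
    cdef Py_ssize_t used
    cdef Py_ssize_t i
    cdef object dest = PyBytes_FromStringAndSize(NULL, max_size)
    cdef char* dest_ptr = PyBytes_AS_STRING(dest)

    # placeholder "compression": just copy the input into the output
    for i in range(src.shape[0]):
        dest_ptr[i] = <char>src[i]
    used = src.shape[0]

    # shrink the existing allocation in place; unlike dest[:used], no copy
    PyBytes_RESIZE(dest, used)
    return dest
```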
I have just tested this using the Mac 14 wheels from https://github.com/zarr-developers/numcodecs/actions/runs/14105743382 using the memray code in https://github.com/tomwhite/memray-array, and can confirm that it works. I can see the memory saving (peak memory goes from 300 MB to 200 MB). This was using zstd with Zarr v3.
@jakirkham thanks so much for this, it looks like a great improvement. I am not qualified to review this because I don't know Cython; is there anyone you can recommend?
I'm also happy merging after self-review.
Thanks Davis! 🙏 @dstansby would you have time to look within the next week? 🙂
Sorry, I don't think I have enough C/Cython experience to review this.
No worries. In that case will go ahead and merge. Ryan had reviewed it earlier. Also the fact that Davis took a look and Tom tested it successfully provides some additional confidence. Cython knowledge isn't as common around here, so that is likely as good as we can do.

Am AFK until the following week, though happy to follow up on anything at that point. Please do feel free to ping me (otherwise it will likely get buried in the avalanche of notifications in the interim 😅).
Noticed that two of the pointer definitions below should have been marked `const` to match the associated typed memoryviews from which they are taken. However they were not.

As a result, the compiler generates warnings about these (#725). That said, we don't actually change the content the pointers refer to, so this is only a compile-time warning; there is no actual issue at runtime.

To fix the compiler warnings, added these `const`s (as shown below) in PR #728, which fixes this issue.
    cdef const uint8_t[::1] b_mv = buf
    cdef uint8_t* b_ptr = &b_mv[0]
Suggested change:

      cdef const uint8_t[::1] b_mv = buf
    - cdef uint8_t* b_ptr = &b_mv[0]
    + cdef const uint8_t* b_ptr = &b_mv[0]
    cdef const uint8_t[::1] b_mv = b
    cdef uint8_t* b_ptr = &b_mv[0]
Suggested change:

      cdef const uint8_t[::1] b_mv = b
    - cdef uint8_t* b_ptr = &b_mv[0]
    + cdef const uint8_t* b_ptr = &b_mv[0]
cdef extern from *:
    """
    #define PyBytes_RESIZE(b, n) _PyBytes_Resize(&b, n)
    """
This should have an `#ifndef` guard to protect against a redefinition error.

So far it seems Cython doesn't encounter this error, as it defines this once per module. Still, it is a good idea to add the guard.

Handling in PR #732.
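A guarded version might look like the sketch below; the actual change is handled in #732 and may differ in detail.

```cython
cdef extern from *:
    """
    #ifndef PyBytes_RESIZE
    #define PyBytes_RESIZE(b, n) _PyBytes_Resize(&b, n)
    #endif
    """
    int PyBytes_RESIZE(object b, Py_ssize_t n) except -1
```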
Additionally have pushed changes to this PR to improve overall memory usage. This eliminates some unneeded copies that occurred at the end of some codecs. Also have eliminated some temporary allocations used in some codec pipelines by allocating output buffers earlier and changing operations to act in place. This should eliminate some spiky behavior seen recently with codecs.
TODO: