Use more buffers (redux) #128

alimanfoo · 2018-11-22T22:44:19Z

Based on work in #121, trying to make a clean split between PY2 and PY3 code paths, and make function naming more intuitive.

TODO:

Unit tests and/or doctests in docstrings
tox -e py37 passes locally
tox -e py27 passes locally
Docstrings and API docs for any new/modified user-facing classes and functions
Changes documented in docs/release.rst
tox -e docs passes locally
AppVeyor and Travis CI pass
Test coverage to 100% (Coveralls passes)

jakirkham

Thanks for playing around with this problem as well @alimanfoo. Seems you are encountering the same subtleties I ran into. Left some comments below, but wouldn't take them too seriously. They are more intended as advice. Completely understand this is still a work in progress at this stage.

Would just highlight one point that comes up below and that I have generally learned by playing with this problem. Namely, NumPy ndarrays end up being a great solution to use for all data in codecs. They support both buffer protocols, they support more types out-of-the-box, and they are easier for us and for new users of the codebase to work with. Using them saves on compat code; they work the same on Python 2/3, are easily handled by builtin compressors, and make things like casting, reshaping, and getting bytes trivial. In short, a lot of mileage can be gotten out of ndarrays.
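As a rough illustration of the point about ndarrays (a sketch written for this discussion, not code from the PR), the snippet below wraps a few different buffer-exporting objects with `np.frombuffer` and then shows casting, reshaping, and getting bytes. The variable names are made up for the example.

```python
import array

import numpy as np

# Three different buffer-exporting objects holding the same four bytes.
sources = [
    b"\x01\x02\x03\x04",
    bytearray(b"\x01\x02\x03\x04"),
    array.array("B", [1, 2, 3, 4]),
]

for src in sources:
    # np.frombuffer wraps the existing memory without copying it.
    a = np.frombuffer(src, dtype="u1")
    assert a.tolist() == [1, 2, 3, 4]

# Casting, reshaping and getting bytes back are each a single call.
data = np.arange(6, dtype="<i4")
as_bytes = data.view("u1")   # reinterpret the same memory as unsigned bytes
grid = data.reshape(2, 3)    # reshape without copying
raw = data.tobytes()         # copy the contents out as a bytes object
```

The same calls work regardless of which object originally owned the memory, which is the compat-code saving mentioned above.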

numcodecs/compat.py

alimanfoo · 2018-11-23T00:44:22Z

Thanks @jakirkham for the comments, this is a tricky one to get right.

FWIW I think the key issue here is that many codecs require an input that, under Python 3, exposes a new-style buffer interface onto a C contiguous block of memory. E.g., things like ZLib, BZ2 and LZMA.
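A minimal sketch of that requirement under Python 3, assuming only NumPy and the standard-library `zlib` module (this is not the codec code from the PR): a strided view is not C contiguous, so it is copied into a contiguous block before being handed to the compressor.

```python
import zlib

import numpy as np

a = np.arange(12, dtype="<i4").reshape(3, 4)

# A view that skips every other column is not one contiguous block of memory.
strided = a[:, ::2]
assert not strided.flags.c_contiguous

# np.ascontiguousarray copies the data into a single C contiguous block.
contig = np.ascontiguousarray(strided)
assert contig.flags.c_contiguous

# zlib reads the array's memory through the buffer protocol.
compressed = zlib.compress(contig)

# Round trip: decompress and rebuild an array with the same dtype and shape.
restored = np.frombuffer(zlib.decompress(compressed), dtype="<i4").reshape(contig.shape)
```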

A blocker to just passing a numpy array to these functions is the datetime & timedelta data types, which you can't take a memoryview of.

If I included a conversion from datetime or timedelta to int64 within the ensure_contiguous_ndarray() function, then it would be possible to use that function everywhere instead of either ensure_memoryview() or ensure_buffer(), and so delete those two functions. Maybe that would be a good idea.
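The idea can be sketched roughly as follows. The helper name `to_contiguous_ndarray_sketch` is hypothetical and this is not the `ensure_contiguous_ndarray()` implementation from the PR; it only shows the datetime/timedelta to int64 reinterpretation followed by a contiguity guarantee, assuming the input is already an ndarray.

```python
import numpy as np


def to_contiguous_ndarray_sketch(arr):
    """Hypothetical helper for illustration; not the numcodecs implementation."""
    if arr.dtype.kind in "Mm":
        # datetime64 ('M') and timedelta64 ('m') items are 8 bytes wide,
        # so the same memory can be reinterpreted as int64 without a copy.
        arr = arr.view(np.int64)
    # Copies only if the array is not already C contiguous.
    return np.ascontiguousarray(arr)


dates = np.array(["2018-11-22", "2018-11-27"], dtype="datetime64[D]")

# Taking a memoryview of a datetime64 array is expected to fail,
# which is the blocker described above.
try:
    memoryview(dates)
    exported_directly = True
except (ValueError, TypeError):
    exported_directly = False

ints = to_contiguous_ndarray_sketch(dates)
mv = memoryview(ints)  # works: int64 has a buffer-protocol format
```

Viewing the integers back as `datetime64[D]` recovers the original values, so the conversion loses no information as long as the dtype is recorded elsewhere.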

jakirkham · 2018-11-23T00:47:39Z

> If I included a conversion from datetime or timedelta to int64 within the ensure_contiguous_ndarray() function, then it would be possible to use that function everywhere instead of either ensure_memoryview() or ensure_buffer(), and so delete those two functions. Maybe that would be a good idea.

This is my temptation as well. Let's go for it. 😄

alimanfoo · 2018-11-23T01:24:36Z

OK, 'tis done! I'm using ensure_contiguous_ndarray() everywhere now. ensure_memoryview() and ensure_buffer() are gone.

There were still some quirks in the BZ2 and GZip codecs that required an explicit conversion to memoryview, but I figure that's OK.
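For illustration only, an explicit memoryview conversion ahead of the standard-library `bz2` and `gzip` modules might look like the sketch below. It runs on Python 3 and does not reproduce the codec code or the specific quirks referred to here; the cast to unsigned bytes is an assumption made for the example.

```python
import bz2
import gzip

import numpy as np

arr = np.arange(100, dtype="<i8")

# Explicit conversion: take a memoryview of the array and flatten it
# to plain unsigned bytes before handing it to the compressors.
mv = memoryview(arr).cast("B")

bz2_bytes = bz2.compress(mv)
gz_bytes = gzip.compress(mv)

# Round trip both to check that the data survives unchanged.
from_bz2 = np.frombuffer(bz2.decompress(bz2_bytes), dtype="<i8")
from_gzip = np.frombuffer(gzip.decompress(gz_bytes), dtype="<i8")
```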

Let me know what you think.

numcodecs/categorize.py

numcodecs/checksum32.py

jakirkham · 2018-11-23T21:00:02Z

> OK, 'tis done! I'm using ensure_contiguous_ndarray() everywhere now. ensure_memoryview() and ensure_buffer() are gone.

Beautiful! Thanks for doing that. 😄

> There were still some quirks in the BZ2 and GZip codecs that required an explicit conversion to memoryview, but I figure that's OK.

Yeah, I ran into these two as well. Agree this seems fine.

> Let me know what you think.

Added a few more comments above in our existing threads to keep continuity with the existing discussion. A couple other comments above as well. For the most part these are moving to minor points. IOW this is looking pretty good.

numcodecs/compat.py


alimanfoo · 2018-11-27T02:03:43Z

Hi @jakirkham, after some thrashing around I think I have basically converged on something very similar to what you have in #121. There are some small differences which I'm more than happy to discuss. I've also pushed on and refactored the buffer compatibility code in the cython modules, to re-use the new compat functions. I still have to address the point you raised about dealing with unicode arrays, but apart from that, I'd be very grateful if you could take a look and see if there's anything you think I've missed, or anything you think should be done differently.

alimanfoo · 2018-11-27T09:27:28Z

The latest commit disallows unicode array.array; it did not seem worth the effort to support it.
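One way such a restriction could be enforced is sketched below. The function name `reject_unicode_array_sketch`, the error type, and the message are all assumptions for illustration and are not taken from the commit.

```python
import array
import warnings


def reject_unicode_array_sketch(buf):
    """Hypothetical check for illustration; not the numcodecs code."""
    # array.array uses typecode 'u' for unicode character arrays.
    if isinstance(buf, array.array) and buf.typecode == "u":
        raise TypeError("unicode array.array is not supported")
    return buf


# Numeric arrays pass through untouched.
ints = array.array("i", [1, 2, 3])
assert reject_unicode_array_sketch(ints) is ints

# The 'u' typecode is deprecated in recent Python versions, so silence
# any deprecation warning while building the example input.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    text = array.array("u", "abc")

try:
    reject_unicode_array_sketch(text)
    rejected = False
except TypeError:
    rejected = True
```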

alimanfoo · 2018-11-27T11:44:20Z

In the interests of getting to a 0.6 release asap, I'd like to suggest we move forward with this PR instead of #121, mainly for the reason that I've had the chance to add in a bit more documentation and comments, so hopefully when we or someone else needs to revisit this in future, they should be able to comprehend what is being done and why. I believe that the approaches taken here and in #121 are essentially the same, with the main differences being in function naming and some minor implementation details. Please let me know if there are any objections. Whatever PR we take forward, primary credit for this work should go to @jakirkham.

numcodecs/compat.py

numcodecs/gzip.py

jakirkham · 2018-11-27T18:40:23Z

Had some pretty minor comments above. Think this is basically done at this stage.

> In the interests of getting to a 0.6 release asap, I'd like to suggest we move forward with this PR instead of #121...

Sounds good to me.

> ...mainly for the reason that I've had the chance to add in a bit more documentation and comments, so hopefully when we or someone else needs to revisit this in future, they should be able to comprehend what is being done and why.

I think the test coverage and function naming were also better here. The API has also improved generally.

> Whatever PR we take forward, primary credit for this work should go to @jakirkham.

Thanks for working on this as well. It has come a long way since that PR.

alimanfoo · 2018-11-27T22:25:47Z

Thanks again @jakirkham for the review. Your suggestions are all good and I've gone with them in latest commits. Thanks also for bottoming out some really gnarly Python quirks, it is amazing how deep the rabbit hole goes!

If CI passes, I propose to merge if there are no objections.

jakirkham · 2018-11-27T23:21:37Z

Thanks @alimanfoo! 🎉

Agree this looks great. Thanks for your hard work here. 😄

alimanfoo force-pushed the use-buffers-redux-alimanfoo-20181122 branch from ceb7ccc to 0153531 Compare November 22, 2018 23:39

jakirkham reviewed Nov 22, 2018


alimanfoo changed the title ~~WIP Use more buffers (redux)~~ Use more buffers (redux) Nov 23, 2018

alimanfoo mentioned this pull request Nov 23, 2018

Cast datetime and timedelta to signed 64-bit int #127

Merged

8 tasks

alimanfoo force-pushed the use-buffers-redux-alimanfoo-20181122 branch from 047ac0a to f4f6f5d Compare November 23, 2018 12:17

alimanfoo mentioned this pull request Nov 23, 2018

Fixing blosc encode error handling #81

Merged

8 tasks

jakirkham reviewed Nov 23, 2018

numcodecs/categorize.py (outdated)

jakirkham reviewed Nov 23, 2018

numcodecs/checksum32.py (outdated)

jakirkham reviewed Nov 27, 2018

numcodecs/compat.py (outdated)

alimanfoo added 17 commits November 27, 2018 01:53

rework memory access and buffer compatibility

3bcf37b

refactor utility functions

bd257a3

tidy up

2557463

fix tests

4dd5db8

fix coverage pragma

a3cce43

tidy naming

0428d0f

simplify to push everything through ndarray

ace1992

fix tests and coverage

0d22c64

simplify

fb9f487

Revert "simplify" - need to allow for object arrays in backwards compatibility tests. This reverts commit 047ac0a.

cf386f9

improve comments

f39cae9

rework compat functions

51b17fc

back off some complexity

ea7a119

fix ndarray type and size on PY2

dda92c3

further reworking, including pyx buffer compat

8f07f28

test and coverage fixes

143382f

syntax error

9c760df

syntax error

5084c56

alimanfoo force-pushed the use-buffers-redux-alimanfoo-20181122 branch from 1acb560 to 5084c56 Compare November 27, 2018 01:53

alimanfoo mentioned this pull request Nov 27, 2018

Refactor and use more buffer view coercion #121

Closed

8 tasks

docstrings and comments; unicode array.array disallowed

492c543

jakirkham reviewed Nov 27, 2018

numcodecs/compat.py (outdated)

jakirkham reviewed Nov 27, 2018

numcodecs/compat.py (outdated)

jakirkham reviewed Nov 27, 2018

numcodecs/compat.py

jakirkham reviewed Nov 27, 2018

numcodecs/gzip.py (outdated)

jakirkham added this to the 0.6.0 milestone Nov 27, 2018

alimanfoo added 3 commits November 27, 2018 22:14

address @jakirkham comments

d9f21fe

recythonize with py37 this time (oops)

dca5173

fix pragma

abeccb2

release notes [ci skip]

5938937

alimanfoo merged commit df60e2f into zarr-developers:master Nov 27, 2018

alimanfoo deleted the use-buffers-redux-alimanfoo-20181122 branch November 27, 2018 23:16

jakirkham mentioned this pull request Nov 27, 2018

Add link about Python 2 GZip workaround #133

Merged

8 tasks

jakirkham mentioned this pull request Nov 28, 2018

New codec contribution guide #125

Closed

alimanfoo mentioned this pull request Nov 29, 2018

Error during generation of fixture data #138

Closed

jakirkham mentioned this pull request Jan 4, 2019

RFC: Optionally support memory-mapping DirectoryStore values zarr-developers/zarr-python#377

Closed

7 tasks

This was referenced Nov 12, 2019

Use ensure_ndarray in a few more places zarr-developers/zarr-python#506

Merged

WIP: Allow CuPy #212

Closed

jakirkham mentioned this pull request Jun 16, 2020

Add blosc getitem #235

Merged

7 tasks
