Handle (new) buffer protocol conforming types in Pickle.decode #143

Merged
merged 6 commits into from
Nov 30, 2018

@jakirkham jakirkham commented Nov 30, 2018

Python 3's `pickle.loads` can handle anything that conforms to the (new) buffer protocol, as long as it is C-contiguous. We enforce this on all inputs, raising if that is not possible. On Python 2, `cPickle.loads` is less flexible: it requires a `bytes` object explicitly. This PR enforces that case as well.
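To illustrate the Python 3 behavior described above, here is a small standard-library sketch (not code from this PR). It feeds the same pickled payload to `pickle.loads` through several buffer-protocol wrappers, then tries a strided (non-contiguous) `memoryview`. The names `candidates`, `results`, and `strided_ok` are made up for the example.

```python
import pickle
from array import array

obj = {"a": 1, "b": [1, 2, 3]}
data = pickle.dumps(obj)

# The same payload exposed through different buffer-protocol objects.
candidates = {
    "bytes": data,
    "bytearray": bytearray(data),
    "memoryview": memoryview(data),
    "array": array("B", data),
}
results = {name: pickle.loads(buf) for name, buf in candidates.items()}

# Taking every other byte yields a view that is not C-contiguous.
strided = memoryview(data)[::2]
try:
    pickle.loads(strided)
    strided_ok = True
except Exception:
    # The exact exception type depends on the exporting object,
    # so the sketch only records that loading failed.
    strided_ok = False
```

On Python 3 every entry in `results` should equal `obj`, while the strided view is rejected before any unpickling happens.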

TODO:

  • Unit tests and/or doctests in docstrings
  • tox -e py37 passes locally
  • tox -e py27 passes locally
  • Docstrings and API docs for any new/modified user-facing classes and functions
  • Changes documented in docs/release.rst
  • tox -e docs passes locally
  • AppVeyor and Travis CI passes
  • Test coverage to 100% (Coveralls passes)

On Python 3, `pickle.loads` is able to work with anything that supports
the (new) buffer protocol. Unfortunately, Python 2's `cPickle.loads` is
not so flexible and requires a `bytes` object specifically. So this
ensures that other objects are coerced to `bytes` objects on Python 2.

On Python 3, `pickle.loads` expects an object that implements the buffer
protocol and is contiguous. While it would already raise an error for
non-contiguous input, it is helpful for us to check this first. Plus we
should be able to handle Fortran order buffers should that be needed. On
Python 2, the stricter constraint of a `bytes` object is required, which
is still enforced.

This ensures that Pickle can decode anything that supports the (new)
buffer protocol.

    if PY2:
        buf = ensure_bytes(buf)
    else:
        buf = ensure_contiguous_ndarray(buf)
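The branch above normalizes the input before handing it to the unpickler. As a rough standard-library sketch of that idea (not the actual `ensure_bytes` / `ensure_contiguous_ndarray` helpers, whose implementations are not shown here), the hypothetical functions `normalize_for_pickle` and `decode_pickled` below check C-contiguity up front and coerce to `bytes` only on Python 2. Fortran-order handling is left out of this sketch.

```python
import pickle
import sys

PY2 = sys.version_info[0] == 2


def normalize_for_pickle(buf):
    """Hypothetical helper: return something pickle.loads will accept."""
    # memoryview() raises TypeError for objects without the buffer protocol.
    view = memoryview(buf)
    if not view.c_contiguous:
        raise ValueError("expected a C-contiguous buffer")
    if PY2:
        # Python 2's cPickle.loads needs an actual bytes object, so copy.
        return view.tobytes()
    # Python 3's pickle.loads can read the buffer directly, without a copy.
    return view


def decode_pickled(buf):
    """Hypothetical decode step built on the helper above."""
    return pickle.loads(normalize_for_pickle(buf))


value = decode_pickled(bytearray(pickle.dumps([1, 2, 3])))
```

Checking contiguity before calling `pickle.loads` gives one consistent error for bad input instead of whatever the underlying exporter happens to raise.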

@@ -147,6 +147,10 @@ def check_encode_decode_array(arr, codec):
    codec.decode(enc, out=out)
    assert_array_items_equal(arr, out)

    enc = codec.encode(arr)
    dec = codec.decode(ensure_ndarray(enc))
    assert_array_items_equal(arr, dec)

It seems this worked for everything except Pickle on Python 2, as demonstrated by this CI failure. That is likely a consequence of the improved buffer handling under the hood, which MsgPack, JSON, etc. all use already. Pickle was the only codec not using these functions, which this PR fixes. So testing this generally seems to be fine. All the other codecs already test decoding with ndarray.
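The testing pattern discussed here, encoding once and then decoding from a differently wrapped buffer, can be sketched in a codec-agnostic way. The example below is only an illustration using the standard library: `ToyZlibCodec` and `check_decode_accepts_buffers` are hypothetical names, not part of the project's test suite, and plain buffer wrappers stand in for the ndarray used in the real test.

```python
import zlib
from array import array


class ToyZlibCodec:
    """Hypothetical stand-in codec used only for this illustration."""

    def encode(self, buf):
        return zlib.compress(bytes(buf))

    def decode(self, buf):
        # zlib.decompress accepts any bytes-like object on Python 3.
        return zlib.decompress(memoryview(buf))


def check_decode_accepts_buffers(codec, payload):
    """Round-trip payload, passing the encoded data in several wrappers."""
    enc = codec.encode(payload)
    wrappers = (bytes, bytearray, memoryview, lambda b: array("B", b))
    for wrap in wrappers:
        dec = codec.decode(wrap(enc))
        assert bytes(dec) == payload, wrap
    return len(wrappers)


checked = check_decode_accepts_buffers(ToyZlibCodec(), b"hello" * 10)
```

Running one generic check against every codec is what surfaces a codec that bypasses the shared buffer handling, as happened with Pickle on Python 2.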

@alimanfoo alimanfoo left a comment


LGTM, thanks.

@jakirkham

Thanks for taking a look.

@jakirkham jakirkham merged commit ac2d822 into zarr-developers:master Nov 30, 2018
@jakirkham jakirkham deleted the fix_pickle_py2 branch November 30, 2018 22:43
2 participants