Handle (new) buffer protocol conforming types in Pickle.decode
#143
On Python 3, `pickle.loads` is able to work with anything that supports the (new) buffer protocol. Unfortunately, Python 2's `cPickle.loads` is not so flexible and requires a `bytes` object specifically. So this ensures that other objects are coerced to `bytes` objects on Python 2.
On Python 3, `pickle.loads` expects an object that implements the buffer protocol and is contiguous. While it would already raise an error for a non-contiguous input, it is helpful for us to check this first. Plus, we should be able to handle Fortran-order buffers should that be needed. On Python 2, the stricter constraint of a `bytes` object is required, which is still enforced.
This ensures that Pickle can decode anything that supports the (new) buffer protocol.
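As a small illustration of the Python 3 behaviour described above, the following sketch passes the same pickled payload to `pickle.loads` through several standard-library objects that expose the buffer protocol. It uses only the standard library and makes no claim about how numcodecs itself calls `pickle.loads`; the variable names are purely illustrative.

```python
import pickle

original = {"a": [1, 2, 3]}
payload = pickle.dumps(original)

# Three views of the same bytes, all exposing the buffer protocol.
candidates = [payload, bytearray(payload), memoryview(payload)]

# On Python 3, pickle.loads accepts each of these bytes-like objects.
decoded = [pickle.loads(c) for c in candidates]
```

On Python 2, `cPickle.loads` would reject the non-`bytes` inputs, which is the gap this PR closes.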
if PY2:
    buf = ensure_bytes(buf)
else:
    buf = ensure_contiguous_ndarray(buf)
Just to provide context from the Python codebase, Python 2 requires bytes
here and Python 3 uses the (new) buffer protocol to get a C-contiguous array.
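To make the two code paths concrete, here is a rough standard-library sketch of the same idea: check that the input is a C-contiguous buffer, then either hand it over as a view or copy it into `bytes`. The helper name `coerce_for_pickle` and its `need_bytes` parameter are hypothetical and are not part of numcodecs; the real change relies on `ensure_bytes` and `ensure_contiguous_ndarray`, whose implementations are not shown here. The sketch targets Python 3 only.

```python
import pickle


def coerce_for_pickle(buf, need_bytes=False):
    """Hypothetical helper, not the numcodecs implementation.

    Returns a C-contiguous view of `buf`, or a `bytes` copy when
    `need_bytes` is true (the stricter Python 2 style requirement).
    """
    view = memoryview(buf)  # raises TypeError if buf has no buffer protocol
    if not view.c_contiguous:
        raise ValueError("a C-contiguous buffer is required")
    return view.tobytes() if need_bytes else view


data = pickle.dumps([1, 2, 3])

# The view path: no copy is made before decoding.
from_view = pickle.loads(coerce_for_pickle(bytearray(data)))

# The bytes path: an explicit copy into a bytes object.
as_bytes = coerce_for_pickle(bytearray(data), need_bytes=True)
from_bytes = pickle.loads(as_bytes)
```

Checking contiguity up front means a strided input such as `memoryview(data)[::2]` fails with a clear `ValueError` rather than an error raised from inside the unpickler.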
@@ -147,6 +147,10 @@ def check_encode_decode_array(arr, codec):
    codec.decode(enc, out=out)
    assert_array_items_equal(arr, out)

    enc = codec.encode(arr)
    dec = codec.decode(ensure_ndarray(enc))
    assert_array_items_equal(arr, dec)
Seems this worked for everything except `Pickle` on Python 2, as demonstrated by this CI failure. This is likely a consequence of the improved buffer handling under the hood, which `MsgPack`, `JSON`, etc. all use already. `Pickle` was the only one not using these functions, which this PR fixes. So testing this generally seems to be fine. All the other codecs already test decoding with `ndarray`.
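The testing pattern discussed here (encode once, then decode from a different buffer type and compare) can be sketched without numcodecs or NumPy. `ToyPickleCodec` and `check_decode_from_buffers` below are hypothetical stand-ins written for this illustration; they are not the project's `Pickle` codec or its `check_encode_decode_array` helper.

```python
import array
import pickle


class ToyPickleCodec:
    """Hypothetical stand-in for a codec with encode/decode methods."""

    def encode(self, obj):
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

    def decode(self, buf):
        # Accept any C-contiguous buffer, not only bytes (Python 3).
        return pickle.loads(memoryview(buf))


def check_decode_from_buffers(codec, obj):
    """Decode the same encoded payload through several buffer wrappers."""
    enc = codec.encode(obj)
    wrappers = [bytes, bytearray, memoryview, lambda b: array.array("B", b)]
    return [codec.decode(wrap(enc)) for wrap in wrappers]


expected = {"x": (1, 2.5, "three")}
results = check_decode_from_buffers(ToyPickleCodec(), expected)
```

Running the same check against every codec is what surfaces a codec that only accepts `bytes`.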
LGTM, thanks.
Thanks for taking a look.
Python 3's `pickle.loads` is able to handle anything that conforms to the (new) buffer protocol as long as it is C-contiguous. We go ahead and enforce this on all inputs, raising if that is not possible. On Python 2, `cPickle.loads` is not as flexible, as it requires a `bytes` object explicitly. This enforces that case as well.

TODO:
- `tox -e py37` passes locally
- `tox -e py27` passes locally
- `tox -e docs` passes locally