
ENH: add Pickle/MsgPack codec with support for object ndarrays #5


Merged: 1 commit merged on Oct 17, 2016


jreback
Contributor

@jreback jreback commented Oct 16, 2016

So I chose to raise for non-object ndarrays rather than pickle them. Generally, other filters will be much friendlier to other dtypes, I think.
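A minimal sketch of the behaviour described above, assuming NumPy is available. The class name `ObjectPickleSketch` is hypothetical. It mirrors the dtype check visible in the PR's `encode` (refuse any ndarray whose dtype is not `object`, pickle everything else), but it is an illustration rather than the numcodecs implementation.

```python
import pickle

import numpy as np


class ObjectPickleSketch:
    """Hypothetical codec: pickles object arrays, rejects other ndarrays."""

    def encode(self, buf):
        # Numeric dtypes are better served by other filters, so refuse them.
        if hasattr(buf, "dtype") and buf.dtype != object:
            raise ValueError("cannot encode non-object ndarrays, %s "
                             "dtype was passed" % buf.dtype)
        return pickle.dumps(buf, protocol=pickle.HIGHEST_PROTOCOL)

    def decode(self, buf):
        # pickle reads the protocol from the byte stream itself.
        return pickle.loads(buf)


codec = ObjectPickleSketch()
x = np.array(["foo", "bar", "baz"], dtype=object)
y = codec.decode(codec.encode(x))

rejected = None
try:
    codec.encode(np.arange(3))  # integer dtype: should be refused
except ValueError as err:
    rejected = str(err)
```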

Added support for msgpack-python as another filter, similar to Pickle. This is an optional import.
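Making msgpack an optional import usually means guarding the import so the module still loads when the package is absent. A minimal sketch of that pattern follows; the helper names `available_object_codecs` and `msgpack_roundtrip` are hypothetical, and only `msgpack.packb` / `msgpack.unpackb` come from msgpack-python itself.

```python
try:
    import msgpack  # optional dependency (msgpack-python)
except ImportError:
    msgpack = None


def available_object_codecs():
    """Return names of object serializers usable in this environment."""
    names = ["pickle"]  # pickle ships with the standard library
    if msgpack is not None:
        names.append("msgpack")
    return names


def msgpack_roundtrip(items):
    """Round-trip a list through msgpack; fail clearly if it is missing."""
    if msgpack is None:
        raise ImportError("msgpack is not installed")
    # use_bin_type/raw=False keep str values as str rather than bytes.
    packed = msgpack.packb(items, use_bin_type=True)
    return msgpack.unpackb(packed, raw=False)
```

Callers can check `available_object_codecs()` before choosing a filter instead of catching the `ImportError` themselves.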

In [16]: x = np.array(['foo', 'bar', 'baz', np.nan ]*1000000, dtype='object')

In [17]: x.shape
Out[17]: (4000000,)

In [19]: p = codecs.Pickle()

In [20]: m = codecs.MsgPack()

In [21]: %timeit p.decode(p.encode(x))
1 loop, best of 3: 657 ms per loop

In [22]: %timeit m.decode(m.encode(x))
1 loop, best of 3: 577 ms per loop
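The comparison above can be reproduced outside IPython with the standard `timeit` module. This sketch times only a pickle round trip, calls `pickle` directly rather than the codec classes, and uses a smaller array so it runs quickly; the helper names are hypothetical and timings will vary by machine and library version.

```python
import pickle
import timeit

import numpy as np

# Same pattern as above, but 40,000 elements instead of 4,000,000.
x = np.array(["foo", "bar", "baz", np.nan] * 10000, dtype=object)


def roundtrip(arr):
    # Encode to bytes and decode back, like p.decode(p.encode(x)) above.
    return pickle.loads(pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL))


def same_item(a, b):
    # NaN != NaN, so treat two NaN floats as equal when checking the result.
    return a == b or (a != a and b != b)


y = roundtrip(x)
best = min(timeit.repeat(lambda: roundtrip(x), number=1, repeat=3))
print("best of 3: %.1f ms per loop" % (best * 1e3))
```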

@jreback
Contributor Author

jreback commented Oct 16, 2016

@jreback jreback changed the title ENH: add Pickle codec with support for object ndarrays ENH: add Pickle/MsgPack codec with support for object ndarrays Oct 16, 2016
Member

@alimanfoo alimanfoo left a comment


Thanks @jreback, looks good, just a couple of minor comments.



class MsgPack(Codec):
"""Codec to encode data as as msgpacked bytes. Useful for encoding python
Member


"as" repeated; missing "." at end of sentence.


""" # flake8: noqa

codec_id = 'x-msgpack'
Member


This can be just 'msgpack'. I was using 'x-' as a prefix for codecs not provided by numcodecs itself, but now that this codec is inside numcodecs we can drop the prefix.


class Pickle(Codec):
"""Codec to encode data as as pickled bytes. Useful for encoding python
strings
Member


Missing '.' at end of sentence.


""" # flake8: noqa

codec_id = 'x-pickle'
Member


This can be 'pickle'.

if hasattr(buf, 'dtype') and buf.dtype != 'object':
raise ValueError("cannot encode non-object ndarrays, %s "
"dtype was passed" % buf.dtype)
return pickle.dumps(buf, protocol=pickle.HIGHEST_PROTOCOL)
Member


Could you add protocol as a configuration parameter of the codec? This would mean adding it as a keyword argument to __init__ and also including it in the output from get_config().
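The request can be sketched as follows, assuming a codec interface with `encode`, `decode`, `get_config` and a `from_config` classmethod. `PickleSketch` is a hypothetical name; defaulting `protocol` to `pickle.HIGHEST_PROTOCOL` keeps the behaviour the PR already had when no protocol is given.

```python
import pickle


class PickleSketch:
    """Hypothetical codec with `protocol` as a configuration parameter."""

    codec_id = "pickle"

    def __init__(self, protocol=pickle.HIGHEST_PROTOCOL):
        self.protocol = protocol

    def encode(self, buf):
        return pickle.dumps(buf, protocol=self.protocol)

    def decode(self, buf):
        # The protocol is recorded in the stream, so decode needs no argument.
        return pickle.loads(buf)

    def get_config(self):
        # Include protocol so an equivalent codec can be rebuilt later.
        return {"id": self.codec_id, "protocol": self.protocol}

    @classmethod
    def from_config(cls, config):
        config = dict(config)
        config.pop("id")
        return cls(**config)


codec = PickleSketch(protocol=2)
clone = PickleSketch.from_config(codec.get_config())
```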

Contributor Author


Not sure that's a good idea. These must always be binary serialized. I don't think there is actually a reason to use anything other than the highest protocol.
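One relevant detail for this trade-off: `pickle.loads` detects the protocol from the stream, so the setting only affects encoding, while `pickle.HIGHEST_PROTOCOL` itself varies with the Python version (for example 2 on Python 2.7, 4 on 3.4 to 3.7, 5 on 3.8 and later), so bytes written with a newer protocol cannot be loaded by an interpreter that lacks it. A small check of the first point, using only the standard library:

```python
import pickle

data = ["foo", "bar", None]

# Every protocol this interpreter supports round-trips the same object;
# pickle.loads needs no protocol argument because it reads it from the bytes.
decoded = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    decoded[protocol] = pickle.loads(pickle.dumps(data, protocol=protocol))

# The highest protocol available here depends on the running Python version.
print("HIGHEST_PROTOCOL:", pickle.HIGHEST_PROTOCOL)
```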

@jreback jreback force-pushed the pickle branch 2 times, most recently from 63a25ea to 3671aea on October 17, 2016 10:46
@jreback
Contributor Author

jreback commented Oct 17, 2016

OK, updated, and I added the protocol parameter.

@alimanfoo alimanfoo merged commit 8b6a33a into zarr-developers:master Oct 17, 2016
@alimanfoo
Member

Thanks!

@jreback
Contributor Author

jreback commented Oct 17, 2016

awesome!

I assume you are going to rip the codecs out of Zarr and replace them with this. Soonish?

@alimanfoo
Member

Yes, that's the plan, for the next Zarr point release.


Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health http://cggh.org
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721

@jreback
Contributor Author

jreback commented Oct 17, 2016

Didn't realize you had a doc folder. Do you need updates?

@alimanfoo
Member

No worries, that would be nice if you have time.


@alimanfoo alimanfoo modified the milestone: 0.1 Feb 23, 2017
jakirkham pushed a commit that referenced this pull request Nov 6, 2018
Fix conflicts with upstream `master`