Add GZip codec #87
Merged
23 commits
410af66 Add dedicated GZip Codec (funkey)
160f7b5 Add tests for GZip codec (funkey)
d640cc4 Fix Python 2 compat issue w/`zlib.compressobj` (jakirkham)
3a80e40 Use `zlib` as the `id` in the Zlib alias test (jakirkham)
0e237d3 Merge pull request #1 from jakirkham/tst_fixes_pr_87 (funkey)
66a1d3e Drop gzip and zlib alias tests (jakirkham)
32b8e30 Include test generated gzip fixture (jakirkham)
262352d Fix flake8 W391 error (jakirkham)
666bdfb Merge pull request #2 from jakirkham/tst_fixes_2_pr_87 (funkey)
00a0a93 Drop unused imports to fix flake8 errors (jakirkham)
f950047 Merge pull request #3 from jakirkham/fix_flake8_errs (funkey)
5281fc8 Add documentation for GZip codec (funkey)
3a80a7d Add GZip release note (funkey)
81a8eba Merge 'zarr-developers/master' into 'funkey/master' (jakirkham)
dfbdf93 Rename `dec` to `decompressed` in `GZip.decode` (jakirkham)
0269fa6 Use `GzipFile` instance with `BytesIO` (jakirkham)
1c9de0f Update GZip fixtures (jakirkham)
a62fa59 Merge pull request #4 from jakirkham/use_gzipfile_obj (funkey)
ee64853 Merge 'zarr-developers/master' into 'funkey/master' (jakirkham)
559cd6e Merge pull request #5 from jakirkham/fix_pr_87_conflicts (funkey)
eb154f0 Incorporate @alimanfoo's suggestion (jakirkham)
d658c8e Merge 'zarr-developers/master' into 'funkey/master' (jakirkham)
8706104 Merge 'zarr-developers/master' into 'funkey/master' (jakirkham)
@@ -0,0 +1,11 @@
+GZip
+====
+.. automodule:: numcodecs.gzip
+
+.. autoclass:: GZip
+
+    .. autoattribute:: codec_id
+    .. automethod:: encode
+    .. automethod:: decode
+    .. automethod:: get_config
+    .. automethod:: from_config
@@ -62,6 +62,7 @@ Contents
     lz4
     zstd
     zlib
+    gzip
     bz2
     lzma
     delta
9 binary files not shown.
@@ -0,0 +1,4 @@
+{
+    "id": "gzip",
+    "level": 1
+}
9 binary files not shown.
@@ -0,0 +1,4 @@
+{
+    "id": "gzip",
+    "level": -1
+}
9 binary files not shown.
@@ -0,0 +1,4 @@
+{
+    "id": "gzip",
+    "level": 0
+}
9 binary files not shown.
@@ -0,0 +1,4 @@
+{
+    "id": "gzip",
+    "level": 1
+}
9 binary files not shown.
@@ -0,0 +1,4 @@
+{
+    "id": "gzip",
+    "level": 5
+}
9 binary files not shown.
@@ -0,0 +1,4 @@
+{
+    "id": "gzip",
+    "level": 9
+}
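The fixture configurations above differ only in their `level` entry. As a rough illustration of what that value controls, the sketch below feeds a config of the same shape to the standard-library `gzip` module. The helper `compress_with_config` is hypothetical and is not part of numcodecs; it only assumes that `level` is passed through as `compresslevel`, as the codec source in this pull request does.

```python
import gzip
import io
import json

# A config of the same shape as the fixture files above.
config_text = '{"id": "gzip", "level": 9}'


def compress_with_config(data, config):
    """Hypothetical helper: compress `data` using the "level" entry of a config."""
    if config["id"] != "gzip":
        raise ValueError("unsupported codec id: %r" % config["id"])
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb",
                       compresslevel=config["level"]) as compressor:
        compressor.write(data)
    return out.getvalue()


config = json.loads(config_text)
payload = b"abc" * 1000

# Level 9 asks for the strongest compression; level 0 stores the data
# uncompressed inside the gzip container.
compressed = compress_with_config(payload, config)
stored = compress_with_config(payload, {"id": "gzip", "level": 0})
```

With a highly repetitive payload like this one, the level 9 output should be much smaller than the level 0 output, which is slightly larger than the input because of the gzip header and trailer.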
9 binary files not shown.
@@ -0,0 +1,73 @@
+# -*- coding: utf-8 -*-
+from __future__ import absolute_import, print_function, division
+import gzip as _gzip
+import io
+
+
+import numpy as np
+
+
+from .abc import Codec
+from .compat import buffer_copy, handle_datetime, buffer_tobytes, PY2
+
+
+class GZip(Codec):
+    """Codec providing gzip compression using zlib via the Python standard library.
+
+    Parameters
+    ----------
+    level : int
+        Compression level.
+
+    """
+
+    codec_id = 'gzip'
+
+    def __init__(self, level=1):
+        self.level = level
+
+    def encode(self, buf):
+
+        # deal with lack of buffer support for datetime64 and timedelta64
+        buf = handle_datetime(buf)
+
+        if isinstance(buf, np.ndarray):
+
+            # cannot compress object array
+            if buf.dtype == object:
+                raise ValueError('cannot encode object array')
+
+            # if numpy array, can only handle C contiguous directly
+            if not buf.flags.c_contiguous:
+                buf = buf.tobytes(order='A')
+
+        if PY2:  # pragma: py3 no cover
+            # ensure bytes, PY2 cannot handle things like bytearray
+            buf = buffer_tobytes(buf)
+
+        # do compression
+        compressed = io.BytesIO()
+        with _gzip.GzipFile(fileobj=compressed,
+                            mode='wb',
+                            compresslevel=self.level) as compressor:
+            compressor.write(buf)
+        compressed = compressed.getvalue()
+
+        return compressed
+
+    # noinspection PyMethodMayBeStatic
+    def decode(self, buf, out=None):
+
+        if PY2:  # pragma: py3 no cover
+            # ensure bytes, PY2 cannot handle things like bytearray
+            buf = buffer_tobytes(buf)
+
+        # do decompression
+        buf = io.BytesIO(buf)
+        with _gzip.GzipFile(fileobj=buf, mode='rb') as decompressor:
+            decompressed = decompressor.read()
+
+        # handle destination - Python standard library zlib module does not
+        # support direct decompression into buffer, so we have to copy into
+        # out if given
+        return buffer_copy(decompressed, out)
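For readers without numcodecs at hand, the encode/decode flow above can be approximated with the standard library alone. This is a minimal sketch, not the codec itself: `gzip_encode` and `gzip_decode` are hypothetical names, and the `memoryview` copy is only a stand-in for what `buffer_copy` is assumed to do when `out` is given.

```python
import gzip
import io


def gzip_encode(buf, level=1):
    """Compress a bytes-like object into an in-memory gzip stream."""
    compressed = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode="wb",
                       compresslevel=level) as compressor:
        compressor.write(buf)
    return compressed.getvalue()


def gzip_decode(buf, out=None):
    """Decompress a gzip stream, optionally copying the result into `out`."""
    with gzip.GzipFile(fileobj=io.BytesIO(buf), mode="rb") as decompressor:
        decompressed = decompressor.read()
    if out is None:
        return decompressed
    # GzipFile cannot decompress directly into a caller-supplied buffer,
    # so the result is copied afterwards (stand-in for buffer_copy).
    view = memoryview(out).cast("B")
    if view.nbytes != len(decompressed):
        raise ValueError("destination buffer has the wrong size")
    view[:] = decompressed
    return out


data = bytes(range(256)) * 4
encoded = gzip_encode(data, level=5)

# Decode once into a fresh bytes object and once into a preallocated buffer.
decoded = gzip_decode(encoded)
dest = bytearray(len(data))
gzip_decode(encoded, out=dest)
```

The extra copy into `dest` mirrors the trade-off noted in the source comment: the result is first materialised in full and only then written to the destination.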
@@ -0,0 +1,75 @@
+# -*- coding: utf-8 -*-
+from __future__ import absolute_import, print_function, division
+import itertools
+
+
+import numpy as np
+
+
+from numcodecs.gzip import GZip
+from numcodecs.tests.common import (check_encode_decode, check_config, check_repr,
+                                    check_backwards_compatibility,
+                                    check_err_decode_object_buffer,
+                                    check_err_encode_object_buffer)
+
+
+codecs = [
+    GZip(),
+    GZip(level=-1),
+    GZip(level=0),
+    GZip(level=1),
+    GZip(level=5),
+    GZip(level=9),
+]
+
+
+# mix of dtypes: integer, float, bool, string
+# mix of shapes: 1D, 2D, 3D
+# mix of orders: C, F
+arrays = [
+    np.arange(1000, dtype='i4'),
+    np.linspace(1000, 1001, 1000, dtype='f8'),
+    np.random.normal(loc=1000, scale=1, size=(100, 10)),
+    np.random.randint(0, 2, size=1000, dtype=bool).reshape(100, 10, order='F'),
+    np.random.choice([b'a', b'bb', b'ccc'], size=1000).reshape(10, 10, 10),
+    np.random.randint(0, 2**60, size=1000, dtype='u8').view('M8[ns]'),
+    np.random.randint(0, 2**60, size=1000, dtype='u8').view('m8[ns]'),
+    np.random.randint(0, 2**25, size=1000, dtype='u8').view('M8[m]'),
+    np.random.randint(0, 2**25, size=1000, dtype='u8').view('m8[m]'),
+]
+
+
+def test_encode_decode():
+    for arr, codec in itertools.product(arrays, codecs):
+        check_encode_decode(arr, codec)
+
+
+def test_config():
+    codec = GZip(level=3)
+    check_config(codec)
+
+
+def test_repr():
+    check_repr("GZip(level=3)")
+
+
+def test_eq():
+    assert GZip() == GZip()
+    assert not GZip() != GZip()
+    assert GZip(1) == GZip(1)
+    assert GZip(1) != GZip(9)
+    assert GZip() != 'foo'
+    assert 'foo' != GZip()
+    assert not GZip() == 'foo'
+
+
+def test_backwards_compatibility():
+    check_backwards_compatibility(GZip.codec_id, arrays, codecs)
+
+
+def test_err_decode_object_buffer():
+    check_err_decode_object_buffer(GZip())
+
+
+def test_err_encode_object_buffer():
+    check_err_encode_object_buffer(GZip())
No need to change. Just noting for reference. This can be done with `pytest.mark.parametrize`.
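As a sketch of what that suggestion could look like, the example below parametrizes a round-trip test over compression levels and payloads instead of looping with `itertools.product` inside one test. It deliberately uses the standard-library `gzip` module rather than the `GZip` codec so that it runs on its own; `roundtrip` and `test_roundtrip` are illustrative names, not code from this pull request.

```python
import gzip
import io

import pytest


def roundtrip(data, level):
    """Compress and then decompress `data` with the standard-library gzip module."""
    compressed = io.BytesIO()
    with gzip.GzipFile(fileobj=compressed, mode="wb",
                       compresslevel=level) as compressor:
        compressor.write(data)
    with gzip.GzipFile(fileobj=io.BytesIO(compressed.getvalue()),
                       mode="rb") as decompressor:
        return decompressor.read()


# Stacked decorators make pytest generate one test case per (level, data)
# pair, so a failure is reported against the specific combination.
@pytest.mark.parametrize("level", [-1, 0, 1, 5, 9])
@pytest.mark.parametrize("data", [b"", b"abc" * 100, bytes(range(256))])
def test_roundtrip(data, level):
    assert roundtrip(data, level) == data
```

The main practical difference from the loop-based test is reporting: each combination shows up as its own test case, and one failing combination does not hide the others.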