Add GZip codec #87

Merged on Nov 6, 2018 (23 commits)

Commits
410af66  Add dedicated GZip Codec (funkey, Oct 16, 2018)
160f7b5  Add tests for GZip codec (funkey, Oct 18, 2018)
d640cc4  Fix Python 2 compat issue w/`zlib.compressobj` (jakirkham, Oct 19, 2018)
3a80e40  Use `zlib` as the `id` in the Zlib alias test (jakirkham, Oct 19, 2018)
0e237d3  Merge pull request #1 from jakirkham/tst_fixes_pr_87 (funkey, Oct 19, 2018)
66a1d3e  Drop gzip and zlib alias tests (jakirkham, Oct 19, 2018)
32b8e30  Include test generated gzip fixture (jakirkham, Oct 19, 2018)
262352d  Fix flake8 W391 error (jakirkham, Oct 19, 2018)
666bdfb  Merge pull request #2 from jakirkham/tst_fixes_2_pr_87 (funkey, Oct 20, 2018)
00a0a93  Drop unused imports to fix flake8 errors (jakirkham, Oct 20, 2018)
f950047  Merge pull request #3 from jakirkham/fix_flake8_errs (funkey, Oct 20, 2018)
5281fc8  Add documentation for GZip codec (funkey, Oct 20, 2018)
3a80a7d  Add GZip release note (funkey, Oct 20, 2018)
81a8eba  Merge 'zarr-developers/master' into 'funkey/master' (jakirkham, Oct 30, 2018)
dfbdf93  Rename `dec` to `decompressed` in `GZip.decode` (jakirkham, Oct 30, 2018)
0269fa6  Use `GzipFile` instance with `BytesIO` (jakirkham, Oct 30, 2018)
1c9de0f  Update GZip fixtures (jakirkham, Oct 30, 2018)
a62fa59  Merge pull request #4 from jakirkham/use_gzipfile_obj (funkey, Oct 31, 2018)
ee64853  Merge 'zarr-developers/master' into 'funkey/master' (jakirkham, Nov 2, 2018)
559cd6e  Merge pull request #5 from jakirkham/fix_pr_87_conflicts (funkey, Nov 2, 2018)
eb154f0  Incorporate @alimanfoo's suggestion (jakirkham, Nov 6, 2018)
d658c8e  Merge 'zarr-developers/master' into 'funkey/master' (jakirkham, Nov 6, 2018)
8706104  Merge 'zarr-developers/master' into 'funkey/master' (jakirkham, Nov 6, 2018)
11 changes: 11 additions & 0 deletions docs/gzip.rst
@@ -0,0 +1,11 @@
GZip
====
.. automodule:: numcodecs.gzip

.. autoclass:: GZip

    .. autoattribute:: codec_id
    .. automethod:: encode
    .. automethod:: decode
    .. automethod:: get_config
    .. automethod:: from_config
1 change: 1 addition & 0 deletions docs/index.rst
@@ -62,6 +62,7 @@ Contents
lz4
zstd
zlib
gzip
bz2
lzma
delta
2 changes: 2 additions & 0 deletions docs/release.rst
@@ -26,6 +26,8 @@ Release notes

* Add Python 3.7 (by :user:`John Kirkham <jakirkham>`; :issue:`92`).

* Add codec :class:`numcodecs.gzip.GZip` to replace ``gzip`` alias for ``zlib``,
which was incorrect (by :user:`Jan Funke <funkey>`; :issue:`87`).

.. _release_0.5.5:

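The release note calls the old ``gzip`` alias for ``zlib`` incorrect. A minimal sketch of why, using only the Python standard library: zlib (RFC 1950) and gzip (RFC 1952) both wrap a deflate stream, but with different headers and trailers, so a gzip reader rejects zlib output. The helper `gzip_reader_accepts` and the sample `data` are illustrative names, not part of numcodecs.

```python
import gzip
import zlib

data = b"numcodecs " * 100

zlib_bytes = zlib.compress(data, 1)
gzip_bytes = gzip.compress(data, compresslevel=1)

# A gzip stream starts with the magic bytes 1f 8b (RFC 1952).
gzip_magic = gzip_bytes[:2]

# A zlib stream starts with a CMF byte whose low nibble is 8,
# the identifier for the deflate method (RFC 1950).
zlib_method = zlib_bytes[0] & 0x0F


def gzip_reader_accepts(payload):
    """Return True if the stdlib gzip reader can decode the payload."""
    try:
        gzip.decompress(payload)
        return True
    except OSError:  # gzip.BadGzipFile subclasses OSError on Python 3.8+
        return False


# zlib itself can read the gzip container when asked to expect it:
# adding 16 to wbits selects gzip header and trailer handling.
via_zlib = zlib.decompress(gzip_bytes, 16 + zlib.MAX_WBITS)
```

Under these assumptions, data written by the old alias is a zlib stream labelled ``gzip``, which a real gzip reader would not accept.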
Binary file added fixture/gzip/array.00.npy
Binary file not shown.
Binary file added fixture/gzip/array.01.npy
Binary file not shown.
Binary file added fixture/gzip/array.02.npy
Binary file not shown.
Binary file added fixture/gzip/array.03.npy
Binary file not shown.
Binary file added fixture/gzip/array.04.npy
Binary file not shown.
Binary file added fixture/gzip/array.05.npy
Binary file not shown.
Binary file added fixture/gzip/array.06.npy
Binary file not shown.
Binary file added fixture/gzip/array.07.npy
Binary file not shown.
Binary file added fixture/gzip/array.08.npy
Binary file not shown.
4 changes: 4 additions & 0 deletions fixture/gzip/codec.00/config.json
@@ -0,0 +1,4 @@
{
    "id": "gzip",
    "level": 1
}
Binary file added fixture/gzip/codec.00/encoded.00.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.01.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.02.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.03.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.04.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.05.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.06.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.07.dat
Binary file not shown.
Binary file added fixture/gzip/codec.00/encoded.08.dat
Binary file not shown.
4 changes: 4 additions & 0 deletions fixture/gzip/codec.01/config.json
@@ -0,0 +1,4 @@
{
    "id": "gzip",
    "level": -1
}
Binary file added fixture/gzip/codec.01/encoded.00.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.01.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.02.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.03.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.04.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.05.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.06.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.07.dat
Binary file not shown.
Binary file added fixture/gzip/codec.01/encoded.08.dat
Binary file not shown.
4 changes: 4 additions & 0 deletions fixture/gzip/codec.02/config.json
@@ -0,0 +1,4 @@
{
    "id": "gzip",
    "level": 0
}
Binary file added fixture/gzip/codec.02/encoded.00.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.01.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.02.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.03.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.04.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.05.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.06.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.07.dat
Binary file not shown.
Binary file added fixture/gzip/codec.02/encoded.08.dat
Binary file not shown.
4 changes: 4 additions & 0 deletions fixture/gzip/codec.03/config.json
@@ -0,0 +1,4 @@
{
    "id": "gzip",
    "level": 1
}
Binary file added fixture/gzip/codec.03/encoded.00.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.01.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.02.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.03.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.04.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.05.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.06.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.07.dat
Binary file not shown.
Binary file added fixture/gzip/codec.03/encoded.08.dat
Binary file not shown.
4 changes: 4 additions & 0 deletions fixture/gzip/codec.04/config.json
@@ -0,0 +1,4 @@
{
    "id": "gzip",
    "level": 5
}
Binary file added fixture/gzip/codec.04/encoded.00.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.01.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.02.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.03.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.04.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.05.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.06.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.07.dat
Binary file not shown.
Binary file added fixture/gzip/codec.04/encoded.08.dat
Binary file not shown.
4 changes: 4 additions & 0 deletions fixture/gzip/codec.05/config.json
@@ -0,0 +1,4 @@
{
    "id": "gzip",
    "level": 9
}
Binary file added fixture/gzip/codec.05/encoded.00.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.01.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.02.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.03.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.04.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.05.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.06.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.07.dat
Binary file not shown.
Binary file added fixture/gzip/codec.05/encoded.08.dat
Binary file not shown.
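The six fixture configs above use levels 1, -1, 0, 1, 5 and 9, matching the codecs listed in the test module (`GZip()` defaults to level 1, so level 1 appears twice). A rough sketch of what those levels do to output size, using only the standard library; `gzip_size` and the sample `data` are hypothetical names chosen for illustration:

```python
import gzip
import io

# Highly repetitive sample input, so that compression has something to find.
data = b"0123456789" * 1000


def gzip_size(payload, level):
    """Return the size of the gzip stream produced at the given level."""
    sink = io.BytesIO()
    with gzip.GzipFile(fileobj=sink, mode="wb", compresslevel=level) as f:
        f.write(payload)
    return len(sink.getvalue())


# -1 asks zlib for its default level; 0 stores the data uncompressed.
levels = [-1, 0, 1, 5, 9]
sizes = {level: gzip_size(data, level) for level in levels}

for level in levels:
    print(level, sizes[level])
```

Level 0 still emits a valid gzip stream, but the output is slightly larger than the input because the header, trailer and stored-block framing are added on top of the raw bytes.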
4 changes: 3 additions & 1 deletion numcodecs/__init__.py
@@ -28,7 +28,9 @@

 from numcodecs.zlib import Zlib
 register_codec(Zlib)
-register_codec(Zlib, 'gzip')  # alias
+
+from numcodecs.gzip import GZip
+register_codec(GZip)

 from numcodecs.bz2 import BZ2
 register_codec(BZ2)
73 changes: 73 additions & 0 deletions numcodecs/gzip.py
@@ -0,0 +1,73 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division
import gzip as _gzip
import io


import numpy as np


from .abc import Codec
from .compat import buffer_copy, handle_datetime, buffer_tobytes, PY2


class GZip(Codec):
    """Codec providing gzip compression using zlib via the Python standard library.

    Parameters
    ----------
    level : int
        Compression level.

    """

    codec_id = 'gzip'

    def __init__(self, level=1):
        self.level = level

    def encode(self, buf):

        # deal with lack of buffer support for datetime64 and timedelta64
        buf = handle_datetime(buf)

        if isinstance(buf, np.ndarray):

            # cannot compress object array
            if buf.dtype == object:
                raise ValueError('cannot encode object array')

            # if numpy array, can only handle C contiguous directly
            if not buf.flags.c_contiguous:
                buf = buf.tobytes(order='A')

        if PY2:  # pragma: py3 no cover
            # ensure bytes, PY2 cannot handle things like bytearray
            buf = buffer_tobytes(buf)

        # do compression
        compressed = io.BytesIO()
        with _gzip.GzipFile(fileobj=compressed,
                            mode='wb',
                            compresslevel=self.level) as compressor:
            compressor.write(buf)
        compressed = compressed.getvalue()

        return compressed

    # noinspection PyMethodMayBeStatic
    def decode(self, buf, out=None):

        if PY2:  # pragma: py3 no cover
            # ensure bytes, PY2 cannot handle things like bytearray
            buf = buffer_tobytes(buf)

        # do decompression
        buf = io.BytesIO(buf)
        with _gzip.GzipFile(fileobj=buf, mode='rb') as decompressor:
            decompressed = decompressor.read()

        # handle destination - Python standard library zlib module does not
        # support direct decompression into buffer, so we have to copy into
        # out if given
        return buffer_copy(decompressed, out)
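The `encode`/`decode` pair above can be approximated without numcodecs or NumPy. The following is a minimal standalone sketch, not the library's implementation: `gzip_encode` and `gzip_decode` are hypothetical names, and the `memoryview` copy is only an assumed stand-in for what `buffer_copy` does with `out`.

```python
import gzip
import io


def gzip_encode(buf, level=1):
    """Compress a bytes-like object into an in-memory gzip stream."""
    sink = io.BytesIO()
    with gzip.GzipFile(fileobj=sink, mode="wb", compresslevel=level) as f:
        f.write(buf)
    # The gzip trailer (CRC32 and length) is written when the GzipFile
    # closes, so the buffer is read only after leaving the with-block.
    return sink.getvalue()


def gzip_decode(buf, out=None):
    """Decompress a gzip stream, optionally copying into a writable buffer."""
    with gzip.GzipFile(fileobj=io.BytesIO(buf), mode="rb") as f:
        decompressed = f.read()
    if out is None:
        return decompressed
    # Assumed stand-in for numcodecs' buffer_copy: view the destination
    # as raw bytes and overwrite it in place.
    view = memoryview(out).cast("B")
    if view.nbytes != len(decompressed):
        raise ValueError("destination buffer has the wrong size")
    view[:] = decompressed
    return out


payload = bytes(range(256)) * 4
encoded = gzip_encode(payload, level=5)
target = bytearray(len(payload))
gzip_decode(encoded, out=target)
```

Closing a `GzipFile` does not close the underlying `fileobj`, which is why `sink.getvalue()` remains usable after the `with` block.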
75 changes: 75 additions & 0 deletions numcodecs/tests/test_gzip.py
@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function, division
import itertools


import numpy as np


from numcodecs.gzip import GZip
from numcodecs.tests.common import (check_encode_decode, check_config, check_repr,
                                    check_backwards_compatibility,
                                    check_err_decode_object_buffer,
                                    check_err_encode_object_buffer)


codecs = [
    GZip(),
    GZip(level=-1),
    GZip(level=0),
    GZip(level=1),
    GZip(level=5),
    GZip(level=9),
]


# mix of dtypes: integer, float, bool, string
# mix of shapes: 1D, 2D, 3D
# mix of orders: C, F
arrays = [
    np.arange(1000, dtype='i4'),
    np.linspace(1000, 1001, 1000, dtype='f8'),
    np.random.normal(loc=1000, scale=1, size=(100, 10)),
    np.random.randint(0, 2, size=1000, dtype=bool).reshape(100, 10, order='F'),
    np.random.choice([b'a', b'bb', b'ccc'], size=1000).reshape(10, 10, 10),
    np.random.randint(0, 2**60, size=1000, dtype='u8').view('M8[ns]'),
    np.random.randint(0, 2**60, size=1000, dtype='u8').view('m8[ns]'),
    np.random.randint(0, 2**25, size=1000, dtype='u8').view('M8[m]'),
    np.random.randint(0, 2**25, size=1000, dtype='u8').view('m8[m]'),
]


def test_encode_decode():
    for arr, codec in itertools.product(arrays, codecs):
        check_encode_decode(arr, codec)

Review comment (Member) on the loop in test_encode_decode: No need to change. Just noting for reference. This can be done with pytest.mark.parametrize.


def test_config():
    codec = GZip(level=3)
    check_config(codec)


def test_repr():
    check_repr("GZip(level=3)")


def test_eq():
    assert GZip() == GZip()
    assert not GZip() != GZip()
    assert GZip(1) == GZip(1)
    assert GZip(1) != GZip(9)
    assert GZip() != 'foo'
    assert 'foo' != GZip()
    assert not GZip() == 'foo'


def test_backwards_compatibility():
    check_backwards_compatibility(GZip.codec_id, arrays, codecs)


def test_err_decode_object_buffer():
    check_err_decode_object_buffer(GZip())


def test_err_encode_object_buffer():
    check_err_encode_object_buffer(GZip())
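The review comment on `test_encode_decode` mentions `pytest.mark.parametrize` as an alternative to looping over `itertools.product`. One way that could look is sketched below, assuming pytest is installed. To stay self-contained it round-trips plain bytes through the standard library `gzip` module rather than calling numcodecs' `check_encode_decode`; `payloads`, `levels` and `test_round_trip` are illustrative names.

```python
import gzip

import pytest

payloads = [b"", b"abc", bytes(range(256)) * 8]
levels = [-1, 0, 1, 5, 9]


# Stacked parametrize decorators generate one test per (payload, level)
# combination, which replaces the explicit itertools.product loop.
@pytest.mark.parametrize("level", levels)
@pytest.mark.parametrize("payload", payloads)
def test_round_trip(payload, level):
    encoded = gzip.compress(payload, compresslevel=level)
    assert gzip.decompress(encoded) == payload
```

Compared with a single looping test, each combination is reported separately, so a failure names the exact payload and level involved.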
7 changes: 0 additions & 7 deletions numcodecs/tests/test_zlib.py
@@ -7,7 +7,6 @@


 from numcodecs.zlib import Zlib
-from numcodecs.registry import get_codec
 from numcodecs.tests.common import (check_encode_decode, check_config, check_repr,
                                     check_backwards_compatibility,
                                     check_err_decode_object_buffer,
@@ -54,12 +53,6 @@ def test_repr():
     check_repr("Zlib(level=3)")


-def test_alias():
-    config = dict(id='gzip', level=1)
-    codec = get_codec(config)
-    assert Zlib(1) == codec
-
-
 def test_eq():
     assert Zlib() == Zlib()
     assert not Zlib() != Zlib()