Commit 701b893: Merge pull request #36 from alimanfoo/blosc_upgrade_20160721

upgrade c-blosc to 1.10.0; change default c-blosc compressor to lz4

2 parents: d103d5b + 0aeb1b8

18 files changed: 2069 additions, 643 deletions

c-blosc

Submodule c-blosc updated 67 files

docs/release.rst

Lines changed: 15 additions & 0 deletions

@@ -1,6 +1,21 @@
 Release notes
 =============
 
+.. _release_1.1.0:
+
+1.1.0
+-----
+
+* The bundled Blosc library has been upgraded to version 1.10.0. The 'zstd'
+  internal compression library is now available within Blosc. See the tutorial
+  section on :ref:`tutorial_compression` for an example.
+* When using the Blosc compressor, the default internal compression library
+  is now 'lz4'.
+* The default number of internal threads for the Blosc compressor has been
+  increased to a maximum of 8 (previously 4).
+* Added convenience functions :func:`zarr.blosc.list_compressors` and
+  :func:`zarr.blosc.get_nthreads`.
+
 .. _release_1.0.0:
 
 1.0.0

docs/tutorial.rst

Lines changed: 37 additions & 30 deletions
@@ -21,8 +21,8 @@ example::
     >>> z = zarr.zeros((10000, 10000), chunks=(1000, 1000), dtype='i4')
     >>> z
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
       store: builtins.dict
 
 The code above creates a 2-dimensional array of 32-bit integers with
@@ -44,7 +44,7 @@ scalar value::
     >>> z[:] = 42
     >>> z
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
       nbytes: 381.5M; nbytes_stored: 2.2M; ratio: 170.4; initialized: 100/100
       store: builtins.dict
 

@@ -92,8 +92,8 @@ enabling persistence of data between sessions. For example::
     ...                chunks=(1000, 1000), dtype='i4', fill_value=0)
     >>> z1
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
       store: zarr.storage.DirectoryStore
 
 The array above will store its configuration metadata and all
@@ -116,8 +116,8 @@ Check that the data have been written and can be read again::
     >>> z2 = zarr.open('example.zarr', mode='r')
     >>> z2
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 2.3M; ratio: 163.8; initialized: 100/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 2.3M; ratio: 163.9; initialized: 100/100
       store: zarr.storage.DirectoryStore
     >>> np.all(z1[:] == z2[:])
     True
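The directory-store round-trip in the hunk above (write via `zarr.open`, read back with `mode='r'`) can be sketched with the Python standard library alone. The following is a hypothetical toy layout, not the zarr API or its on-disk format: fixed-size chunks of a byte buffer are zlib-compressed and written one file per chunk index, with metadata in a JSON sidecar.

```python
import json
import tempfile
import zlib
from pathlib import Path

# Hypothetical sketch, NOT the zarr API or format: one compressed file
# per chunk index, plus a small JSON metadata file alongside the chunks.

def save_chunked(path, data, chunk_size):
    root = Path(path)
    root.mkdir(parents=True, exist_ok=True)
    (root / "meta.json").write_text(
        json.dumps({"length": len(data), "chunk_size": chunk_size}))
    for i in range(0, len(data), chunk_size):
        # chunk files are named by chunk index: "0", "1", ...
        (root / str(i // chunk_size)).write_bytes(
            zlib.compress(data[i:i + chunk_size]))

def load_chunked(path):
    root = Path(path)
    meta = json.loads((root / "meta.json").read_text())
    n_chunks = -(-meta["length"] // meta["chunk_size"])  # ceiling division
    return b"".join(
        zlib.decompress((root / str(i)).read_bytes())
        for i in range(n_chunks))

with tempfile.TemporaryDirectory() as tmp:
    data = bytes(range(256)) * 1000
    save_chunked(tmp + "/example", data, chunk_size=4096)
    assert load_chunked(tmp + "/example") == data
```

The JSON sidecar plays the role that array metadata plays in the real store: without it, a reader could not know how many chunk files to expect.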
@@ -135,8 +135,8 @@ can be increased or decreased in length. For example::
     >>> z.resize(20000, 10000)
     >>> z
     zarr.core.Array((20000, 10000), float64, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 1.5G; nbytes_stored: 5.9M; ratio: 259.9; initialized: 100/200
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 1.5G; nbytes_stored: 5.7M; ratio: 268.5; initialized: 100/200
       store: builtins.dict
 
 Note that when an array is resized, the underlying data are not
@@ -151,20 +151,20 @@ which can be used to append data to any axis. E.g.::
     >>> z = zarr.array(a, chunks=(1000, 100))
     >>> z
     zarr.core.Array((10000, 1000), int32, chunks=(1000, 100), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 38.1M; nbytes_stored: 2.0M; ratio: 19.3; initialized: 100/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 38.1M; nbytes_stored: 1.9M; ratio: 20.0; initialized: 100/100
       store: builtins.dict
     >>> z.append(a)
     >>> z
     zarr.core.Array((20000, 1000), int32, chunks=(1000, 100), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 76.3M; nbytes_stored: 4.0M; ratio: 19.3; initialized: 200/200
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 76.3M; nbytes_stored: 3.8M; ratio: 20.0; initialized: 200/200
       store: builtins.dict
     >>> z.append(np.vstack([a, a]), axis=1)
     >>> z
     zarr.core.Array((20000, 2000), int32, chunks=(1000, 100), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 152.6M; nbytes_stored: 7.9M; ratio: 19.3; initialized: 400/400
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 152.6M; nbytes_stored: 7.6M; ratio: 20.0; initialized: 400/400
       store: builtins.dict
 
 .. _tutorial_compress:
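The compression section touched by the next hunk also notes (in unchanged context) that zlib, BZ2 and LZMA are available via the Python standard library. As a hedged, zarr-independent illustration, the three stdlib modules can be compared directly on repetitive sample data; the input here is made up purely for demonstration.

```python
import bz2
import lzma
import zlib

# Stdlib-only sketch (independent of zarr): compare the compressed size
# of zlib, bz2 and lzma on 1 MB of highly repetitive sample bytes.
data = b"0123456789" * 100_000

for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    compressed = compress(data)
    print(f"{name}: {len(compressed)} bytes, "
          f"ratio {len(data) / len(compressed):.1f}")
```

On data like this, lzma typically achieves the highest ratio at the cost of speed; actual results depend entirely on the input.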
@@ -188,17 +188,24 @@ functions. For example::
 
     >>> z = zarr.array(np.arange(100000000, dtype='i4').reshape(10000, 10000),
     ...                chunks=(1000, 1000), compression='blosc',
-    ...                compression_opts=dict(cname='lz4', clevel=3, shuffle=2))
+    ...                compression_opts=dict(cname='zstd', clevel=3, shuffle=2))
     >>> z
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 3, 'cname': 'lz4', 'shuffle': 2}
-      nbytes: 381.5M; nbytes_stored: 17.6M; ratio: 21.7; initialized: 100/100
+      compression: blosc; compression_opts: {'clevel': 3, 'cname': 'zstd', 'shuffle': 2}
+      nbytes: 381.5M; nbytes_stored: 3.1M; ratio: 121.1; initialized: 100/100
       store: builtins.dict
 
 The array above will use Blosc as the primary compressor, using the
-LZ4 algorithm (compression level 3) internally within Blosc, and with
+Zstandard algorithm (compression level 3) internally within Blosc, and with
 the bitshuffle filter applied.
 
+A list of the internal compression libraries available within Blosc can be
+obtained via::
+
+    >>> from zarr import blosc
+    >>> blosc.list_compressors()
+    ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib', 'zstd']
+
 In addition to Blosc, other compression libraries can also be
 used. Zarr comes with support for zlib, BZ2 and LZMA compression, via
 the Python standard library. For example, here is an array using zlib
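The shuffle settings appearing throughout these hunks ('shuffle': 1 is Blosc's byte-shuffle; shuffle=2 selects the bitshuffle variant, which operates on bits) can be approximated with the standard library. Below is a zarr-independent sketch of byte-level shuffling, showing why it helps a general-purpose codec like zlib on slowly varying integers: after shuffling, the nearly constant high bytes form long runs.

```python
import struct
import zlib

# Stdlib-only sketch (independent of zarr/Blosc) of the byte-shuffle
# idea: group the 1st byte of every value together, then the 2nd, etc.
values = list(range(65536))                    # slowly varying 32-bit ints
raw = struct.pack(f'<{len(values)}i', *values)

# Byte plane k collects byte k of each little-endian 4-byte value; the
# top two planes are all zeros here, so they compress to almost nothing.
shuffled = b''.join(raw[k::4] for k in range(4))

print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```

The shuffled buffer is the same size as the raw one; only the byte order changes, which is why the filter is cheap to apply and fully reversible.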
@@ -270,8 +277,8 @@ array with thread synchronization::
     ...               synchronizer=zarr.ThreadSynchronizer())
     >>> z
     zarr.sync.SynchronizedArray((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
       store: builtins.dict; synchronizer: zarr.sync.ThreadSynchronizer
 
 This array is safe to read or write within a multi-threaded program.
@@ -285,8 +292,8 @@ provided that all processes have access to a shared file system. E.g.::
     ...               synchronizer=synchronizer)
     >>> z
     zarr.sync.SynchronizedArray((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 317; ratio: 1261829.7; initialized: 0/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 313; ratio: 1277955.3; initialized: 0/100
       store: zarr.storage.DirectoryStore; synchronizer: zarr.sync.ProcessSynchronizer
 
 This array is safe to read or write from multiple processes.
@@ -350,13 +357,13 @@ data. E.g.::
     >>> a = np.arange(100000000, dtype='i4').reshape(10000, 10000).T
     >>> zarr.array(a, chunks=(1000, 1000))
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=C)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 26.1M; ratio: 14.6; initialized: 100/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 26.3M; ratio: 14.5; initialized: 100/100
       store: builtins.dict
     >>> zarr.array(a, chunks=(1000, 1000), order='F')
     zarr.core.Array((10000, 10000), int32, chunks=(1000, 1000), order=F)
-      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'blosclz', 'shuffle': 1}
-      nbytes: 381.5M; nbytes_stored: 10.0M; ratio: 38.0; initialized: 100/100
+      compression: blosc; compression_opts: {'clevel': 5, 'cname': 'lz4', 'shuffle': 1}
+      nbytes: 381.5M; nbytes_stored: 9.5M; ratio: 40.1; initialized: 100/100
       store: builtins.dict
 
 In the above example, Fortran order gives a better compression ratio. This
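The layout effect this hunk documents can be reproduced without zarr. In the sketch below (illustrative only, with made-up data), every row of a 2-D table holds the same random sequence, so values are constant down each column: the row-major byte stream repeats at a distance larger than zlib's 32 KB window and compresses poorly, while the column-major stream is long runs of identical bytes.

```python
import random
import struct
import zlib

# Stdlib-only sketch (independent of zarr): identical rows of random
# 32-bit values; 200 rows x 10000 columns.
random.seed(42)
rows, cols = 200, 10_000
row = [random.getrandbits(31) for _ in range(cols)]

# C order (row-major): rows concatenated. Each 40 KB row repeats at a
# distance beyond zlib's 32 KB window, so zlib sees near-random bytes.
c_order = struct.pack(f'<{cols}i', *row) * rows

# F order (column-major): each column is 200 copies of one value,
# i.e. long runs of identical bytes.
f_order = b''.join(struct.pack('<i', v) * rows for v in row)

print(len(zlib.compress(c_order)), len(zlib.compress(f_order)))
```

This mirrors the direction of the tutorial's result (F order compressing better for column-correlated data), though the magnitudes here are exaggerated by construction.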
@@ -460,12 +467,12 @@ Configuring Blosc
 
 The Blosc compressor is able to use multiple threads internally to
 accelerate compression and decompression. By default, Zarr allows
-Blosc to use up to 4 internal threads. The number of Blosc threads can
-be changed, e.g.::
+Blosc to use up to 8 internal threads. The number of Blosc threads can
+be changed to increase or decrease this number, e.g.::
 
     >>> from zarr import blosc
     >>> blosc.set_nthreads(2)
-    4
+    8
 
 When a Zarr array is being used within a multi-threaded program, Zarr
 automatically switches to using Blosc in a single-threaded

notebooks/.ipynb_checkpoints/dask_copy-checkpoint.ipynb

Lines changed: 137 additions & 79 deletions
Large diffs are not rendered by default.
