Add "no compressor" as a compressor #58

Closed
dstansby opened this issue May 13, 2025 · 7 comments · Fixed by #67

No description provided.

K-Meech commented May 15, 2025

I've added reading/writing a zarr array with no compressor on my current branch, but I'm seeing some odd values for the compression ratio.

For example, running the script below with zarr-python v3 gives a compression ratio of less than 1 (about 0.5)! The same happens with zarr-python v2.

I think the issue here is that our dev image has a shape of 100 x 100 x 100, which doesn't fit exactly into chunks of 64 x 64 x 64. nbytes seems to use the 100 x 100 x 100 shape for its calculation, i.e. (100 * 100 * 100 * 64) / 8 = 8000000 bytes, but I guess the real shape of the stored array is 128 x 128 x 128 (as it's two 64-element chunks along each dimension), i.e. (128 * 128 * 128 * 64) / 8 = 16777216 bytes. This is much closer to the value nbytes_stored gives. @dstansby - is this maybe a bug in zarr-python?

import pathlib

import numpy as np
import zarr

image = np.random.rand(100, 100, 100)
store_path = pathlib.Path("tests/tmp/data")

zarr_array = zarr.create_array(
    store=store_path,
    shape=image.shape,
    chunks=(64, 64, 64),
    dtype=image.dtype,
    compressors=None,
    zarr_format=2,
    fill_value=0,
    config={"write_empty_chunks": True},
)
zarr_array[:] = image

nbytes = zarr_array.nbytes 
nbytes_stored = zarr_array.nbytes_stored()
compression_ratio = nbytes / nbytes_stored

print(zarr_array.info_complete())
print("array shape:", zarr_array.shape)
print("array dtype:", zarr_array.dtype)
print("nbytes:", nbytes)
print("nbytes stored:", nbytes_stored)
print("compression ratio:", compression_ratio)

K-Meech commented May 15, 2025

I guess it depends on how we're defining compression ratio - if it's (size of the image stored at its original shape with no compression) / (size of the image stored as a zarr array), then it is possible to have a compression ratio of less than 1. In this case the uncompressed zarr array really is taking up more space than a plain 100 x 100 x 100 array would.
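
(For reference, with float64 data the numbers work out as 100 * 100 * 100 * 8 bytes = 8000000 vs 128 * 128 * 128 * 8 bytes = 16777216, giving 8000000 / 16777216 ≈ 0.48 - consistent with the ~0.5 ratio reported above.)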

@dstansby

It's certainly a known issue for zarr-python 2: zarr-developers/zarr-python#2174. I'm not sure whether it's a known issue for zarr-python 3, so a reproducible example and an issue on zarr-python would be very welcome!

I thought that to get around this we were measuring the size of the array directly, as the size of the folder it's written to, instead of relying on nbytes_stored?
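
As a rough illustration of what I mean (just a sketch reusing image and store_path from the script above, not the actual benchmark code):

import pathlib

def directory_size_bytes(path: pathlib.Path) -> int:
    # Sum the sizes of all files under the store directory -
    # i.e. what the array actually occupies on disk.
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

# image and store_path as defined in the script above
compression_ratio = image.nbytes / directory_size_bytes(store_path)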

K-Meech commented May 15, 2025

We're only using that for tensorstore at the moment - the zarr-python benchmarks use: compression_ratio = zarr_array.nbytes / zarr_array.nbytes_stored

Even so, tensorstore is still reporting a compression ratio below 1 (0.5, the same as zarr-python). It seems that nbytes_stored isn't the issue here, but rather nbytes (as it uses the shape of the image put in, 100 x 100 x 100, rather than the shape of the final zarr array, 128 x 128 x 128)?

K-Meech commented May 15, 2025

Something like nbytes = zarr_array.nchunks * zarr_array.chunks[0] * zarr_array.chunks[1] * zarr_array.chunks[2] * zarr_array.dtype.itemsize seems to work, vs the current zarr_array.size * zarr_array.dtype.itemsize
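
Spelled out (just a sketch using the zarr_array from the script above; math.prod is only there to generalise over dimensions):

import math

# Logical size: what nbytes currently reports, based on the array shape.
logical_nbytes = zarr_array.size * zarr_array.dtype.itemsize

# Chunk-grid size: the bytes needed if every chunk is stored in full.
chunk_grid_nbytes = zarr_array.nchunks * math.prod(zarr_array.chunks) * zarr_array.dtype.itemsize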

@dstansby

Ah right, I think that makes sense. Because every chunk has to be stored in full, you always store the smallest multiple of the chunk size that covers your actual data size. So for 100 x 100 x 100 it makes sense that not compressing ends up with more bytes on disk than the raw data, and hence a compression ratio < 1.

Since OME-Zarr is used for big data, this effect is less relevant in practice (an extra 100 elements of padding on the end of, say, ~2000 elements per dimension is a much smaller fraction). So I would say we just update the dev image to be exactly 128 x 128 x 128, so it doesn't end up with a potentially confusing < 1 compression ratio. Thoughts?
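
For concreteness, keeping the 64 x 64 x 64 chunking from the script above, that would just mean generating the dev image as something like:

image = np.random.rand(128, 128, 128)  # an exact multiple of the 64-element chunks, so no padded chunks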

K-Meech commented May 15, 2025

Sounds reasonable to me! I'll update the size to 128x128x128 as part of my PR.
