Array indexing with Zarr 3 is noticeably slower than with Zarr 2 #3524

@b8raoult

Zarr version

3.1.3

Numcodecs version

0.15.1

Python Version

3.12.9

Operating System

Linux

Installation

uv pip install

Description

Timing comparisons of various indexing operations between Zarr 2 and Zarr 3 show that version 3 is slower than version 2 in every case tested. I have attached the code to run the comparisons, along with some results. All tests run in memory with compressors disabled.

For example, accessing data[::step] is much slower (from 0.5 s to 3.9 s for 1000 calls). It may be due to the overhead of switching between synchronous and asynchronous code, but this is just a guess.
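One way to get a rough feel for that guess, independently of Zarr, is to time a trivial coroutine submitted from synchronous code to an event loop running on another thread. The sketch below uses only the standard library. It assumes (this is not confirmed by the issue) that a synchronous API bridges to async code in roughly this way; the names `noop`, `direct`, and `bridged` are illustrative and are not part of zarr-python.

```python
import asyncio
import threading
import timeit


async def noop():
    """Stand-in for an async read that does no real work."""
    return 0


def direct():
    """Baseline: a plain synchronous call."""
    return 0


# A dedicated event loop on a daemon thread. This setup is an assumption
# made for illustration; it is not taken from zarr-python.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


def bridged():
    """Submit the coroutine to the background loop and block on its result."""
    return asyncio.run_coroutine_threadsafe(noop(), loop).result()


n = 1000
t_direct = timeit.timeit(direct, number=n)
t_bridged = timeit.timeit(bridged, number=n)

print(f"direct : {t_direct / n * 1e6:8.2f} us per call")
print(f"bridged: {t_bridged / n * 1e6:8.2f} us per call")

# Shut the background loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

The difference between the two figures is only the cost of one thread hand-off per call. It says nothing about how many such hand-offs, if any, a single Zarr 3 indexing operation performs, so it can at best bound the guess rather than confirm it.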

Steps to reproduce

import zarr
import numpy as np
import timeit
import inspect

version = int(zarr.__version__.split(".")[0])

values = np.ones(shape=(60632, 4, 1, 10))

root = zarr.group()

if version < 3:
    data = root.create_dataset("data", data=values, shape=values.shape, compressor=None)
else:
    data = root.create_array("data", data=values, compressors=None)


start = 15
end = data.shape[0] - 15
step = data.shape[0] // 10


def set_values():
    data[:] = 2


tests = [
    lambda: data[0:10, :, 0],
    lambda: data[:, 0:3, 0],
    lambda: data[0:10, 0:3, 0],
    lambda: data[:, :, :],
    lambda: data[0],
    lambda: data[0, :],
    lambda: data[0, 0, :],
    lambda: data[0, 0, 0, :],
    lambda: data[start:end:step],
    lambda: data[start:end],
    lambda: data[start:],
    lambda: data[:end],
    lambda: data[::step],
    lambda: set_values(),
]

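for t in tests:
    # Recover the expression text from the lambda's source line for labelling
    src = inspect.getsourcelines(t)[0][0].strip().replace("lambda:", "").strip(",")

    # Total time for 1000 calls of the expression
    elapsed = timeit.timeit(t, number=1000)
    print(f"zarr{version}: {src:22}: {elapsed:10.4f} seconds")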

Additional output

Running the test above with Zarr 2.18.7:

zarr2:  data[0:10, :, 0]     :     0.1546 seconds
zarr2:  data[:, 0:3, 0]      :     4.1608 seconds
zarr2:  data[0:10, 0:3, 0]   :     0.1267 seconds
zarr2:  data[:, :, :]        :     6.2516 seconds
zarr2:  data[0]              :     0.1478 seconds
zarr2:  data[0, :]           :     0.1492 seconds
zarr2:  data[0, 0, :]        :     0.0602 seconds
zarr2:  data[0, 0, 0, :]     :     0.0676 seconds
zarr2:  data[start:end:step] :     0.5197 seconds
zarr2:  data[start:end]      :     6.2125 seconds
zarr2:  data[start:]         :     6.2291 seconds
zarr2:  data[:end]           :     6.2185 seconds
zarr2:  data[::step]         :     0.5151 seconds
zarr2:  set_values()         :     3.2621 seconds

and with Zarr 3.1.3:

zarr3:  data[0:10, :, 0]     :     1.1350 seconds
zarr3:  data[:, 0:3, 0]      :     7.1892 seconds
zarr3:  data[0:10, 0:3, 0]   :     0.9352 seconds
zarr3:  data[:, :, :]        :    10.4327 seconds
zarr3:  data[0]              :     1.1225 seconds
zarr3:  data[0, :]           :     1.1331 seconds
zarr3:  data[0, 0, :]        :     0.5101 seconds
zarr3:  data[0, 0, 0, :]     :     0.5125 seconds
zarr3:  data[start:end:step] :     3.8887 seconds
zarr3:  data[start:end]      :    10.4090 seconds
zarr3:  data[start:]         :    10.4113 seconds
zarr3:  data[:end]           :    10.4148 seconds
zarr3:  data[::step]         :     3.9100 seconds
zarr3:  set_values()         :    17.0404 seconds
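To make the two listings easier to compare, the following sketch computes the slowdown factor and the extra time per call for each expression. The numbers are copied from the output above; `calls = 1000` matches the `number=1000` argument in the script. The variable names are illustrative only.

```python
# Timings in seconds for 1000 calls, copied from the two runs above.
zarr2 = {
    "data[0:10, :, 0]": 0.1546,
    "data[:, 0:3, 0]": 4.1608,
    "data[0:10, 0:3, 0]": 0.1267,
    "data[:, :, :]": 6.2516,
    "data[0]": 0.1478,
    "data[0, :]": 0.1492,
    "data[0, 0, :]": 0.0602,
    "data[0, 0, 0, :]": 0.0676,
    "data[start:end:step]": 0.5197,
    "data[start:end]": 6.2125,
    "data[start:]": 6.2291,
    "data[:end]": 6.2185,
    "data[::step]": 0.5151,
    "set_values()": 3.2621,
}
zarr3 = {
    "data[0:10, :, 0]": 1.1350,
    "data[:, 0:3, 0]": 7.1892,
    "data[0:10, 0:3, 0]": 0.9352,
    "data[:, :, :]": 10.4327,
    "data[0]": 1.1225,
    "data[0, :]": 1.1331,
    "data[0, 0, :]": 0.5101,
    "data[0, 0, 0, :]": 0.5125,
    "data[start:end:step]": 3.8887,
    "data[start:end]": 10.4090,
    "data[start:]": 10.4113,
    "data[:end]": 10.4148,
    "data[::step]": 3.9100,
    "set_values()": 17.0404,
}

calls = 1000  # number= passed to timeit.timeit in the script
ratios = {}
for expr, t2 in zarr2.items():
    t3 = zarr3[expr]
    ratios[expr] = t3 / t2
    extra_ms = (t3 - t2) / calls * 1000  # additional milliseconds per call
    print(f"{expr:22}: {ratios[expr]:5.1f}x slower, {extra_ms:7.3f} ms extra per call")
```

With these figures, the reads that touch little data come out roughly 7 to 8.5 times slower, the reads that return all or nearly all of the array about 1.7 times slower, and set_values() about 5.2 times slower.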
