This issue was moved to a discussion.

Inconsistent reading performance with multiple cpu threads #2084

Closed
FelipeMoser opened this issue Aug 13, 2024 · 0 comments
Labels
bug Potential issues with the zarr-python library V2 Affects the v2 branch

FelipeMoser commented Aug 13, 2024

Zarr version

2.18.2

Numcodecs version

0.13.0

Python Version

3.12.4

Operating System

Linux

Installation

pip install zarr

Description

I've converted some ome.tiff files to .zarr and am seeing inconsistent read times for the zarr files.
For this example I'm using an image of shape (4, 16484, 11620), stored once with chunk shape (1, 1024, 1024) and once unchunked.
All files are stored on a RAID0 NVMe SSD and use the same compression.
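
For reference, the chunked layout described above splits the array into 816 chunks, while the unchunked copy is a single chunk. A minimal sketch of that arithmetic, using only the shape and chunk shape given here (the variable names are illustrative):

```python
import math

shape = (4, 16484, 11620)   # array shape from the description
chunks = (1, 1024, 1024)    # chunk shape of the chunked copy

# Number of chunks along each axis; a partial chunk at the edge still counts as one.
grid = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_chunks = math.prod(grid)

print(f"chunk grid: {grid}, total chunks: {n_chunks}")
```

So a full read of the chunked copy has to fetch and decode 816 separate chunks, compared with one for the unchunked copy.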

I've compared the read times using 1, 10, and 50 logical threads (set with taskset) and noticed that performance varies greatly depending on the settings. For unchunked files, additional threads significantly improve the read time, just as they do when reading ome.tiffs. In fact, reading unchunked files is significantly faster than reading ome.tiffs. Chunked files, however, do not seem to benefit from additional threads, and in some cases are even slower. Reading with the dask library also shows inconsistent performance, although in a different way.
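
Since the thread counts are set with taskset, it can be worth confirming what the Python process actually sees. A minimal sketch, assuming Linux; `usable_cpus` is a hypothetical helper name and not part of zarr or dask:

```python
import os

def usable_cpus() -> int:
    """Return the number of logical CPUs this process is allowed to run on."""
    # os.sched_getaffinity is not available on every platform; on Linux it
    # reflects the affinity mask, including restrictions applied by taskset.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: total logical CPU count, which ignores any affinity mask.
    return os.cpu_count() or 1

print(f"Logical CPUs available to this process: {usable_cpus()}")
```

Printing this at the top of the benchmark script would make each run record the CPU count it was restricted to.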

Additionally, considering the hardware (RAID0 NVMe SSD, dual 56 core CPU Intel Xeon Platinum 8280), I'd expect reading chunked files with multiple workers to be much faster, since the chunks can be processed in parallel. Instead, chunked reads not only fail to benefit from more workers, they are also an order of magnitude slower than reading an unchunked file.

Is there something I could be missing here?
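
To make the "chunks processed in parallel" expectation concrete, here is a rough sketch of what an explicit chunk-parallel read could look like. It is an illustration under assumptions, not how zarr or dask read internally: `chunk_regions`, `read_regions_parallel`, and `count_elements` are hypothetical helpers, and the stand-in reader only counts elements so the block runs without any data.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math

def chunk_regions(shape, chunks):
    """Yield one tuple of slices per chunk, clipped to the array bounds."""
    starts_per_axis = [range(0, s, c) for s, c in zip(shape, chunks)]
    for starts in product(*starts_per_axis):
        yield tuple(
            slice(start, min(start + c, s))
            for start, c, s in zip(starts, chunks, shape)
        )

def read_regions_parallel(read_region, regions, max_workers=8):
    """Call read_region(region) for every region on a thread pool, in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_region, regions))

def count_elements(region):
    """Stand-in reader (hypothetical): returns the element count of a region."""
    return math.prod(sl.stop - sl.start for sl in region)

shape, chunks = (4, 16484, 11620), (1, 1024, 1024)
regions = list(chunk_regions(shape, chunks))
counts = read_regions_parallel(count_elements, regions)

# The regions tile the array exactly, so the counts add up to the full size.
print(len(regions), sum(counts) == math.prod(shape))
```

With a real array `z`, the reader could be something like `lambda region: z[region]`, assuming concurrent reads from threads are safe for the store in use. Whether this actually improves on `z[:]` for this data is untested here.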

Steps to reproduce

This is the code I'm using:

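import time

import dask.array
import tifffile
import zarr

# path_zarr, path_zarr_nochunk, and path_tiff point to the chunked zarr store,
# the unchunked zarr store, and the original ome.tiff file, respectively.

# Full read of the chunked zarr array
start = time.time()
z = zarr.open(path_zarr)[:]
print(f"Zarr read time (chunked): {time.time() - start}")

# Full read of the unchunked zarr array
start = time.time()
z_nochunk = zarr.open(path_zarr_nochunk)[:]
print(f"Zarr read time (no chunks): {time.time() - start}")

# Same two arrays, read through dask
start = time.time()
d = dask.array.from_zarr(path_zarr).compute()
print(f"Dask read time (chunked): {time.time() - start}")

start = time.time()
d_nochunk = dask.array.from_zarr(path_zarr_nochunk).compute()
print(f"Dask read time (no chunks): {time.time() - start}")

# Reference: the original ome.tiff file
start = time.time()
t = tifffile.imread(path_tiff)
print(f"Tiff read time: {time.time() - start}")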
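
Each read above is timed once, so a single measurement may include one-off effects such as a cold file cache (an assumption, not something measured here). A small sketch of a repeat-and-take-the-best timer that could wrap each read; `best_of` is a hypothetical helper and the workload is a stand-in:

```python
import time

def best_of(fn, repeats=3):
    """Run fn() `repeats` times; return (fastest wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result

# Stand-in workload so the sketch runs without any image files.
elapsed, total = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.6f} s")
```

For the benchmark, the call would look like `best_of(lambda: zarr.open(path_zarr)[:])`, and likewise for the other four reads.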

Results:

# 1 thread:
Zarr read time (chunked): 2.8317720890045166
Zarr read time (no chunks): 0.7048866748809814
Dask read time (chunked): 3.4919939041137695
Dask read time (no chunks): 0.7005000114440918
Tiff read time: 1.094351053237915
# 10 threads:
Zarr read time (chunked): 2.8606531620025635
Zarr read time (no chunks): 0.32688140869140625
Dask read time (chunked): 2.7447142601013184
Dask read time (no chunks): 0.712876558303833
Tiff read time: 0.4734377861022949
# 50 threads:
Zarr read time (chunked): 2.8490779399871826
Zarr read time (no chunks): 0.2691495418548584
Dask read time (chunked): 2.9153594970703125
Dask read time (no chunks): 0.716036319732666
Tiff read time: 0.4784407615661621
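
The timings above can be turned into speedups relative to the 1-thread run. The sketch below uses the values from the results, rounded to four decimals; the dictionary layout and names are illustrative only:

```python
# Read times in seconds by thread count, rounded from the results above.
timings = {
    "zarr chunked":   {1: 2.8318, 10: 2.8607, 50: 2.8491},
    "zarr unchunked": {1: 0.7049, 10: 0.3269, 50: 0.2691},
    "dask chunked":   {1: 3.4920, 10: 2.7447, 50: 2.9154},
    "dask unchunked": {1: 0.7005, 10: 0.7129, 50: 0.7160},
    "tiff":           {1: 1.0944, 10: 0.4734, 50: 0.4784},
}

# Speedup = time with 1 thread / time with n threads (values > 1 mean faster).
speedups = {
    name: {n: t[1] / t[n] for n in t}
    for name, t in timings.items()
}

for name, by_threads in speedups.items():
    row = ", ".join(f"{n} threads: {s:.2f}x" for n, s in by_threads.items())
    print(f"{name:15s} {row}")

# Chunked vs. unchunked zarr read at 50 threads.
ratio_50 = timings["zarr chunked"][50] / timings["zarr unchunked"][50]
print(f"zarr chunked / unchunked at 50 threads: {ratio_50:.1f}x")
```

This gives roughly a 2.6x speedup for the unchunked zarr read at 50 threads, essentially no change for the chunked zarr read, and a chunked read that is a little over 10x slower than the unchunked one at 50 threads.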

Additional output

No response

@FelipeMoser FelipeMoser added the bug Potential issues with the zarr-python library label Aug 13, 2024
@jhamman jhamman added the V2 Affects the v2 branch label Sep 13, 2024
@zarr-developers zarr-developers locked and limited conversation to collaborators Sep 13, 2024
@jhamman jhamman converted this issue into discussion #2184 Sep 13, 2024
