docs: zarr config document #72
Changes from all commits
136042b
503eebf
f697649
688c372
4f561b2
ca9af11
ce93e76
b3bd482
a7dc362
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,46 @@ | ||
| # Zarr Configuration | ||
|
|
||
| If you are using a local file system, use {doc}`zarrs-python <zarrs:index>`: | ||
|
|
||
| ```python | ||
| import zarr | ||
| zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"}) | ||
| ``` | ||
|
|
||
| Otherwise, use {mod}`zarr` as usual without {doc}`zarrs-python <zarrs:index>` (which does not support, for example, remote stores). | ||
|
|
||
| ## `zarrs` Performance | ||
|
|
||
| Please see the {doc}`zarrs-python <zarrs:index>` docs for more information, but there are two important settings to consider: | ||
|
|
||
| ```python | ||
| zarr.config.set({ | ||
| "threading.max_workers": None, | ||
| "codec_pipeline": { | ||
| "direct_io": False | ||
| } | ||
| }) | ||
| ``` | ||
|
|
||
| The `threading.max_workers` setting controls how many threads are used by `zarrs` and, by extension, our data loader. | ||
| This parameter is global and controls both the Rust parallelism and the Python parallelism. | ||
| If you notice thrashing or similar thread-oversubscription behavior, please open an issue. | ||
|
|
||
| On some **Linux** file systems, [performance may suffer][] from the high level of parallelism combined with a full page cache in RAM. | ||
| To bypass the page cache, enable `direct_io`; enabling it should not harm performance. | ||
| If this setting is enabled on a system that does not support `direct_io`, file reading will fall back to normal buffered I/O. | ||
|
|
||
| ## `zarr-python` Performance | ||
|
|
||
| In this case, the store of interest is likely in the cloud. | ||
| Please see zarr-python's {doc}`zarr:user-guide/config` for more information; aside from the above-mentioned `threading.max_workers`, the setting likely of most interest is | ||
|
Comment on lines +35 to +36
Collaborator
We should quickly check the performance without zarrs. Last time I checked, you need much bigger chunk sizes without zarrs. This will probably be the case as well if you work with a store in the cloud (much higher latency, so a bigger package size could be beneficial). Just as some guidelines for the user.
Collaborator
Author
How do you want to proceed here? Create new benchmarks? Can we make an issue for this? I don't think this guide is making any recommendations; it's more just so users have the information. |
||
|
|
||
| ```python | ||
| zarr.config.set({"async.concurrency": 64}) | ||
| ``` | ||
|
|
||
| which is 64 by default. | ||
| See the [zarr page on concurrency][] for more information. | ||
|
|
||
| [performance may suffer]: https://gist.github.com/ilan-gold/705bd36329b0e19542286385b09b421b | ||
| [zarr page on concurrency]: https://zarr.readthedocs.io/en/latest/user-guide/consolidated_metadata/#synchronization-and-concurrency | ||
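
The guide's rule of thumb (use `zarrs` for local file systems, plain `zarr` otherwise) could be sketched as a small helper that builds the dictionary passed to `zarr.config.set`. This is only an illustration under stated assumptions: `zarr_config_for` is a hypothetical name that does not exist in the PR or in either library, and guessing "local" from the URL scheme is a simplification.

```python
from urllib.parse import urlparse


def zarr_config_for(store: str, *, direct_io: bool = False) -> dict:
    """Hypothetical helper: build a config dict for ``zarr.config.set``.

    Returns the ``zarrs`` codec pipeline settings for local stores and an
    empty dict (i.e. keep zarr's defaults) for anything that looks remote.
    """
    scheme = urlparse(store).scheme
    # Assumption: plain paths, file:// URLs, and single-letter schemes
    # (Windows drive letters such as "C:") are treated as local.
    is_local = scheme in ("", "file") or len(scheme) == 1
    if not is_local:
        return {}
    return {
        "codec_pipeline": {
            "path": "zarrs.ZarrsCodecPipeline",
            "direct_io": direct_io,
        }
    }


local_cfg = zarr_config_for("/data/cells.zarr", direct_io=True)
remote_cfg = zarr_config_for("s3://bucket/cells.zarr")
```

The result would then be applied with `zarr.config.set(local_cfg)` after `import zarr`; for a remote store the empty dict leaves the default pipeline untouched.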
Can we run a quick benchmark to see to what extent this affects performance?
I added a gist that highlights the issue. I don't think we can really do much other than make people aware of the problem. Like I said, `direct_io` should not harm performance.