Rethinking Zarr's core dependencies #2391

Closed
jhamman opened this issue Oct 16, 2024 · 8 comments · Fixed by #2534

Comments

@jhamman
Member

jhamman commented Oct 16, 2024

I'd like to open the conversation about what Zarr's core dependencies are for 3.0. Currently, this looks like:

dependencies = [
'asciitree',
'numpy>=1.25',
'fasteners',
'numcodecs>=0.10.2',
'fsspec>2024',
'crc32c',
'typing_extensions',
'donfig',
]

Some of these are not used anymore (asciitree and fasteners) so those can safely go.

Then there is fsspec and crc32c. These are only needed for the RemoteStore and ShardingCodec, respectively. What do we think about making these optional?

One proposed diff in our dependencies would look something like:

 dependencies = [
-    'asciitree',
     'numpy>=1.25',
-    'fasteners',
-    'numcodecs>=0.10.2',
-    'fsspec>2024',
-    'crc32c',
+    'numcodecs>=0.12',
     'typing_extensions',
     'donfig',
 ]

 [project.optional-dependencies]
+remote = [
+    "fsspec",
+]
+sharding = [
+    "crc32c",
+]

Notes:

  • fsspec is pure Python with no dependencies, so it is not a particularly heavy dependency.
  • crc32c could potentially move into numcodecs, right?
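
If fsspec and crc32c do become extras, code paths that need them would have to import them lazily and fail with a useful message. A minimal sketch of that pattern, assuming the extras are named `remote` and `sharding` as in the proposed diff; the helper name `require_optional` is hypothetical and not part of Zarr:

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import an optional module, or raise an ImportError naming the extra.

    `extra` is the optional-dependency group the user would install,
    e.g. "remote" or "sharding" from the proposed diff (an assumption).
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install 'zarr[{extra}]'"
        ) from err
```

A store or codec would call `require_optional("fsspec", "remote")` at the point of use rather than at module import time, so a plain `import zarr` keeps working without the extra.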
@d-v-b
Contributor

d-v-b commented Oct 17, 2024

👍 this seems good to me.

@jhamman jhamman mentioned this issue Oct 17, 2024
6 tasks
@dstansby
Contributor

dstansby commented Oct 17, 2024

I think sharding is a big enough part of what zarr v3 promises, that it's worth having crc32c as part of the default dependencies. Looking at their files on PyPI the package is very light (~40kB), and it doesn't have any other requirements.

fsspec is also small (200kB), so I wonder if it's worth keeping as a default too, so users don't have to jump through extra hoops to open remote arrays? Given that a major use case of zarr is as a format for large data, users will be accessing it remotely a lot of the time.

What are the reasons for removing these? Definitely open to considering it, but given they're lightweight deps at the moment I'm thinking we should keep them as default.
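
For context on what the crc32c dependency provides: it computes the CRC-32C (Castagnoli) checksum. The sketch below shows one way an optional dependency like this could be handled, using the C-accelerated `crc32c` package when it is installed and a slow pure-Python fallback otherwise. The function names `crc32c_pure` and `checksum` are hypothetical, and this is not how Zarr or numcodecs implement it:

```python
try:
    # Optional C-accelerated implementation from the crc32c package.
    from crc32c import crc32c as _fast_crc32c
except ImportError:
    _fast_crc32c = None


def _make_table():
    """Build the 256-entry lookup table for the reflected CRC-32C polynomial."""
    poly = 0x82F63B78  # Castagnoli polynomial, bit-reflected
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c_pure(data: bytes) -> int:
    """Table-driven CRC-32C in pure Python (correct but slow)."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def checksum(data: bytes) -> int:
    """Use the fast package if available, else the pure-Python fallback."""
    if _fast_crc32c is not None:
        return _fast_crc32c(data)
    return crc32c_pure(data)


# Standard CRC-32C check value for the ASCII string "123456789".
assert crc32c_pure(b"123456789") == 0xE3069283
```

A pure-Python fallback like this would be far slower on large chunks, which is one reason to prefer a small compiled dependency.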

@d-v-b
Contributor

d-v-b commented Oct 17, 2024

> I think sharding is a big enough part of what zarr v3 promises, that it's worth having crc32c as part of the default dependencies. Looking at their files on PyPI the package is very light (~40kB), and it doesn't have any other requirements.

Is there a reason why we shouldn't put sharding in numcodecs? Then the crc32c dependency would live there.

@dstansby
Contributor

👍 for that

@jhamman
Member Author

jhamman commented Oct 17, 2024

Here's my thought on fsspec. While I agree that the package dependency is not particularly large, it also doesn't come with batteries included: you still need s3fs, gcsfs, adlfs, etc. to use the RemoteStore. I imagine we're all aligned on keeping each of the individual implementations out of the required dependency tree. I guess my perspective is that if all of those are optional, and they all depend on fsspec, then we don't gain much by requiring fsspec.

@d-v-b and/or @dstansby - can one of you open an issue on crc32c in numcodecs?
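
To illustrate the point about batteries: having fsspec installed is not enough for a given remote protocol, because the backend package has to be importable as well. A rough sketch of a check for that, using only the standard library. The protocol-to-package mapping and both function names are illustrative assumptions, not Zarr API:

```python
from importlib.util import find_spec

# Illustrative mapping from URL protocol to the fsspec backend package.
BACKENDS = {"s3": "s3fs", "gs": "gcsfs", "abfs": "adlfs"}


def missing(names):
    """Return the top-level package names that are not importable."""
    return [name for name in names if find_spec(name) is None]


def missing_for_protocol(protocol, backends=BACKENDS):
    """List what still needs installing to open `protocol` URLs remotely."""
    return missing(["fsspec", backends[protocol]])
```

With this, `missing_for_protocol("s3")` returns an empty list only when both fsspec and s3fs are present, which is the situation where installing the backend alone would already have pulled in fsspec.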

@dstansby
Contributor

That makes sense to me on fsspec - would be good to add some docs if it's optional, I'll stick a request on #2395.

I opened an issue for crc32c at zarr-developers/numcodecs#610

@normanrz
Member

I also think that we should only drop crc32c as a core zarr dependency once it is part of numcodecs. It would suck if people had to install additional groups to be able to use sharding.

@oanegros

This also would fix #1370 👍

5 participants