Skip to content

Conversation

@ilan-gold
Copy link
Contributor

@ilan-gold ilan-gold commented Oct 27, 2025

Given that a lot of people are using the private Array._async_array, I think it should be documented and put in a public namespace. This PR specifically does not make zarr.core.sync.sync public because I'm not 100% sure it's necessary in the day-and-age of obstore but also because I can definitely see arguments why a better API might be useful, certain one that does:

some_batching_api(my_sync_array.async_array.getitem(...), my_other_sync_array.async_array.getitem(...))

instead of

zarr.core.sync(asyncio.gather(my_sync_array.async_array.getitem(...), my_other_sync_array.async_array.getitem(...)))

Another reason I think making this public is because you may want to batch together these low level requests just because of the way people's APIs are structured rather than forcing everything to go through something like

some_batching_api(sync_arrays, selections)

Definitely open to opinions on this! Marked as draft for now!

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Oct 27, 2025
@codecov
Copy link

codecov bot commented Oct 27, 2025

Codecov Report

❌ Patch coverage is 91.11111% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.86%. Comparing base (d0c3b7f) to head (bf3b868).

Files with missing lines Patch % Lines
src/zarr/core/array.py 91.11% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3556      +/-   ##
==========================================
- Coverage   61.87%   61.86%   -0.01%     
==========================================
  Files          85       85              
  Lines       10134    10137       +3     
==========================================
+ Hits         6270     6271       +1     
- Misses       3864     3866       +2     
Files with missing lines Coverage Δ
src/zarr/core/array.py 68.51% <91.11%> (-0.12%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@TomAugspurger
Copy link
Contributor

I'm fine with this, as long as we're OK making the implementation detail that Array is backend by an AsyncArray part of the public API.

I would also pretty strongly encourage people using ._async_array to consider whether they can use AsyncArray directly. Maybe they can't (because they in turn are offering a sync API).

@ilan-gold
Copy link
Contributor Author

I'm fine with this, as long as we're OK making the implementation detail that Array is backend by an AsyncArray part of the public API.

Another option would be to create a to_async function that could in theory be future proofed (although the property does the same thing) in the sense that we can remove the "is backed by AsyncArray" aspect of the implementation while maintaining API compat by implementing a "reopen" scheme under the async_array property/to_async function (however it is implemented).

I would also pretty strongly encourage people using ._async_array to consider whether they can use AsyncArray directly. Maybe they can't (because they in turn are offering a sync API).

Agreed! I think the "offering a sync API" thing is the core of it, though.

@ilan-gold ilan-gold marked this pull request as ready for review November 10, 2025 18:06
@ilan-gold ilan-gold marked this pull request as draft November 10, 2025 18:06
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Nov 10, 2025
@ilan-gold ilan-gold marked this pull request as ready for review November 10, 2025 18:21
@ilan-gold
Copy link
Contributor Author

I took @TomAugspurger 's comment quite seriously and updated the doc string for the proposed async_array property. I think the promise should be, independent of the current implementation, a way to get an asynchronous "version" of a synchronous array synchronously + simply i.e., for library devs who want this async-behind-sync API like zarr has. In theory, then, if we ever decouple the two classes (AsyncArray and Array), we would still hold up our promise i.e., you get an async version of the synchronous array, but not the exact one "backing" the synchronous array.

As a counter example, highlighting that something like this is not really possible at the moment, API-wise, the __init__ method on AsyncArray (i.e., a synchronous method) takes three arguments and then sets up its own codec pipeline. But one of those arguments, config, is private on the AsyncArray class (setting aside the fact that _async_array is private) i.,e.,:

AsyncArray(my_sync_array.metadata, my_sync_array._async_array.store_path, my_sync_array._async_array._config)

So maybe an alternative would be a synchronous way to create an AsyncArray easily but I like the idea more of wrapping this in the Array class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants