Structured Arrays #110
Zarr does support storing structured arrays, e.g.:
However, one nice feature that h5py has and zarr currently lacks is the ability to load only a specific field or fields, e.g.:

FWIW, regarding the choice between a single array with a structured dtype and one array with a simple dtype per column, I've generally found the latter (i.e., columnar) storage to be more flexible and more efficient for a variety of uses. However, I still use structured arrays too, so I would be interested in supporting both patterns.
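To make the structured-versus-columnar trade-off concrete, here is a minimal NumPy-only sketch (it uses neither zarr nor h5py). It assumes a packed record made of an `int32` field `x` and a `float64` field `y`; the helper `to_columns` is hypothetical and simply models a group that holds one array per field. Selecting a field from the structured array yields a strided view that still steps over whole records, whereas the columnar copy is contiguous.

```python
import numpy as np

# A small table with heterogeneous columns, stored interleaved (row-wise)
# as a NumPy structured array: each record is [x: int32][y: float64].
records = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("x", "<i4"), ("y", "<f8")],
)

# Selecting one field gives a strided view into the same buffer:
# consecutive x values sit one full record (12 bytes) apart.
x_view = records["x"]


def to_columns(arr):
    """Split a structured array into one contiguous array per field.

    Hypothetical helper: it models a group holding one array per column.
    """
    return {name: np.ascontiguousarray(arr[name]) for name in arr.dtype.names}


columns = to_columns(records)
```

In a chunked, compressed format, interleaved records generally mean a whole chunk has to be decoded even when only one field is wanted, while per-field arrays can be read independently. This sketch only illustrates the memory layout; it is not a benchmark.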
Oh, that's good to know. Then maybe I'm running into a bug or an unsupported edge case (https://github.com/alimanfoo/zarr/issues/111). Will close this out.
Indeed, that would be a very nice feature for Zarr. I've used it with h5py myself. Edit: Opened an issue (https://github.com/alimanfoo/zarr/issues/112) on this point.
By columnar, I assume you mean each array has a single, non-structured dtype. In that case, I agree with you. Still, sometimes a structured array is just the right data structure for the problem.
I was interested in the possibility of storing structured arrays (a.k.a. record arrays or compound arrays) using Zarr. This is somewhat related to PR https://github.com/alimanfoo/zarr/pull/84, but structured arrays are a simpler type. They also correspond directly to a NumPy array type and to an HDF5 dataset with a compound datatype, so it might make sense to add similar support in Zarr. OTOH, in both HDF5 and Zarr it is possible to construct a group that contains the individual arrays, and at least with HDF5 this makes the data easier to view using HDFView. I'm opening this issue to discuss and weigh the different options for storing record arrays using Zarr.
ref: https://docs.scipy.org/doc/numpy/user/basics.rec.html
ref: https://support.hdfgroup.org/HDF5/Tutor/compound.html
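Going the other way, a group of equal-length per-column arrays can be recombined into a single structured array whose dtype plays a role similar to an HDF5 compound type. This is a minimal NumPy sketch under that assumption; `from_columns` and the sample field names `id` and `temp` are hypothetical and not part of zarr or h5py.

```python
import numpy as np


def from_columns(columns):
    """Combine equal-length 1-D arrays (e.g. the members of a group) into a
    single structured array with one field per column.

    Hypothetical helper for illustration only.
    """
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one column, all of the same length")
    # Field order follows the dict's insertion order; fields are packed.
    dtype = np.dtype([(name, col.dtype) for name, col in columns.items()])
    out = np.empty(lengths.pop(), dtype=dtype)
    for name, col in columns.items():
        out[name] = col
    return out


table = from_columns({
    "id": np.array([10, 20], dtype="<i8"),
    "temp": np.array([21.5, 19.0], dtype="<f4"),
})

# dtype.fields maps each field name to (field dtype, byte offset),
# much like the member list of an HDF5 compound datatype.
layout = table.dtype.fields
```

Keeping the columns separate preserves the "group of individual arrays" layout; combining them as above trades that for a single record-oriented array.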