Skip to content

Conversation

@maxrjones
Copy link
Member

@maxrjones maxrjones commented Nov 14, 2025

This PR adds open_virtual_datatree for virtualizing higher-level hierarchical data structures.

Builds on #837

TODO:

  • Unit tests
  • Test with other backends (e.g., HDF)

Checklist:

  • Closes Support for groups #84
  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst
  • New functionality has documentation

commit 113c712
Author: Ilan Gold <[email protected]>
Date:   Tue Nov 11 17:25:49 2025 +0100

    Update virtualizarr/tests/test_parsers/test_hdf/test_hdf.py

    Co-authored-by: Chuck Daniels <[email protected]>

commit 5693649
Author: Ilan Gold <[email protected]>
Date:   Tue Nov 11 17:25:10 2025 +0100

    refactor: `get_deepest_group_or_array` type

commit 849d567
Merge: 916c3e1 7a13261
Author: Ilan Gold <[email protected]>
Date:   Fri Nov 7 14:18:51 2025 +0100

    Merge branch 'main' into ig/nested_h5_group

commit 916c3e1
Merge: d8dc369 a2b65c1
Author: Ilan Gold <[email protected]>
Date:   Wed Sep 10 13:11:01 2025 +0200

    Merge branch 'main' into ig/nested_h5_group

commit d8dc369
Author: Ilan Gold <[email protected]>
Date:   Sun Sep 7 19:53:08 2025 +0200

    fix: update release note

commit a3a2615
Author: ilan-gold <[email protected]>
Date:   Sun Sep 7 16:45:22 2025 +0200

    chore: clean up error handling for non-metadata group requests + tests

commit fc0e999
Merge: 3036de8 1facb10
Author: ilan-gold <[email protected]>
Date:   Sun Sep 7 16:29:59 2025 +0200

    Merge branch 'ig/nested_h5_group' of https://github.com/ilan-gold/VirtualiZarr into ig/nested_h5_group

commit 3036de8
Author: ilan-gold <[email protected]>
Date:   Sun Sep 7 16:29:44 2025 +0200

    fix: remove subgroup check

commit 1facb10
Author: Ilan Gold <[email protected]>
Date:   Sun Sep 7 16:17:43 2025 +0200

    fix: wrong place!

commit 33e517e
Merge: afcaaed c1db925
Author: Ilan Gold <[email protected]>
Date:   Sun Sep 7 16:17:21 2025 +0200

    Merge branch 'main' into ig/nested_h5_group

commit afcaaed
Author: ilan-gold <[email protected]>
Date:   Sun Sep 7 16:15:26 2025 +0200

    chore: consolidate tests

commit 253d184
Author: ilan-gold <[email protected]>
Date:   Sun Sep 7 15:57:18 2025 +0200

    fix: disallow nested xarray handling

commit 57551b8
Author: ilan-gold <[email protected]>
Date:   Fri Aug 29 08:52:28 2025 +0200

    chore: relnote

commit 6fe1dbd
Author: ilan-gold <[email protected]>
Date:   Fri Aug 29 08:45:57 2025 +0200

    fix: typing

commit 85f2930
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Thu Aug 28 16:27:24 2025 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit e92ba8c
Author: ilan-gold <[email protected]>
Date:   Thu Aug 28 18:19:46 2025 +0200

    fix: simplify node logic

commit 323be33
Author: ilan-gold <[email protected]>
Date:   Thu Aug 28 18:02:04 2025 +0200

    feat: nested groups
commit 128138b
Author: Max Jones <[email protected]>
Date:   Thu Nov 13 18:39:24 2025 -0500

    Apply suggestion from code review

commit 9c0e803
Author: Max Jones <[email protected]>
Date:   Thu Nov 13 18:25:39 2025 -0500

    Apply suggestions from code review

    Co-authored-by: Chuck Daniels <[email protected]>

commit d1595b4
Author: Max Jones <[email protected]>
Date:   Thu Nov 13 18:25:10 2025 -0500

    Update virtualizarr/manifests/store.py

    Co-authored-by: Chuck Daniels <[email protected]>

commit 1573992
Author: Max Jones <[email protected]>
Date:   Thu Nov 13 18:17:32 2025 -0500

    Add test for listing array in subgroup

commit dcf80bb
Author: Max Jones <[email protected]>
Date:   Thu Nov 13 15:25:10 2025 -0500

    Fix typing

commit 2448c5d
Author: Max Jones <[email protected]>
Date:   Thu Nov 13 15:08:33 2025 -0500

    Improve ManifestStore.list_dir for arrays and nested groups
@maxrjones maxrjones self-assigned this Nov 14, 2025
@codecov
Copy link

codecov bot commented Nov 14, 2025

Codecov Report

❌ Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.17%. Comparing base (c67dcc1) to head (9b91ad3).

Files with missing lines Patch % Lines
virtualizarr/manifests/group.py 91.66% 1 Missing ⚠️
virtualizarr/xarray.py 91.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #838      +/-   ##
==========================================
- Coverage   89.22%   89.17%   -0.06%     
==========================================
  Files          34       34              
  Lines        1987     2013      +26     
==========================================
+ Hits         1773     1795      +22     
- Misses        214      218       +4     
Files with missing lines Coverage Δ
virtualizarr/__init__.py 75.00% <100.00%> (ø)
virtualizarr/manifests/store.py 89.20% <100.00%> (+0.23%) ⬆️
virtualizarr/manifests/group.py 93.10% <91.66%> (-0.38%) ⬇️
virtualizarr/xarray.py 85.96% <91.66%> (+0.52%) ⬆️

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@maxrjones maxrjones marked this pull request as ready for review January 7, 2026 20:39
@maxrjones maxrjones requested a review from a team January 7, 2026 21:43
Copy link
Collaborator

@chuckwondo chuckwondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxrjones, this looks great. Approved!

Feel free to ignore any or all of my suggested changes, which are only suggestions for some simplification, which are certainly arguable, and thus may not appear simpler to others. I leave them all to your discretion.

Co-authored-by: Chuck Daniels <[email protected]>
Comment on lines 74 to 77
drop_variables
Variables in the data source to drop before returning.
loadable_variables
Variables in the data source to load as Dask/NumPy arrays instead of as virtual arrays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems under-specified. Can I load/drop specific variables from one group? If so how? Does it understand a path-like string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more details and examples in 9b91ad3 (#838)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should ask, do you approve of the behavior that's now described regarding loadable/droppable variables @TomNicholas? we should avoid breaking changes obviously, so this is an important component to get right

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of the current map-over-groups behaviour, because IMO it only makes sense for overviews-style-datatrees. Accepting path-like strings is more effort but should work. Unfortunately this whole API design problem is just a little bit messy 😞

I think maybe this is an argument for implementing something like vdt["var"].load()...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so what if we only implement the default behavior in this PR and punt on adding loadable/drop_variables? It's the end of the project increment for me, so I'd like to get something in before I get pulled onto other work and splitting PRs is good practice anyways.

We'd need to decide on the default behavior between:

  1. Same as xarray (load coordinates)
  2. Don't load anything

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with that, because then generalizing later wouldn't be a breaking change.

We'd need to decide on the default behavior

Maybe you could keep both these options by keeping the kwargs, but raising a NotImplementedError if the user attempts to pass an a non-empty list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants