
open_datatree(group='some_subgroup') returning parent nodes #9665

Closed
eni-awowale opened this issue Oct 23, 2024 · 5 comments · Fixed by #9666
Labels
bug topic-backends topic-DataTree Related to the implementation of a DataTree class

Comments

@eni-awowale
Collaborator

What is your issue?

@aladinor noticed this during a demo a few meetings back, but I don't think we followed up on it.

Suppose you have a DataTree of this shape:

<xarray.DataTree>
Group: /
│   Dimensions:        (lat: 1, lon: 2)
│   Dimensions without coordinates: lat, lon
│   Data variables:
│       root_variable  (lat, lon) float64 16B ...
└── Group: /Group1
    │   Dimensions:      (lat: 1, lon: 2)
    │   Dimensions without coordinates: lat, lon
    │   Data variables:
    │       group_1_var  (lat, lon) float64 16B ...
    └── Group: /Group1/subgroup1
            Dimensions:        (lat: 1, lon: 2)
            Dimensions without coordinates: lat, lon
            Data variables:
                subgroup1_var  (lat, lon) float64 16B ...

If you specify a path with group=, you still get a nested tree, but with empty groups for the ancestors that were not requested.

In [1]: open_datatree('filename.nc', engine='netcdf4', group='/Group1/subgroup1')
Out [1]: 
<xarray.DataTree>
Group: /
└── Group: /Group1
    └── Group: /Group1/subgroup1
            Dimensions:        (lat: 1, lon: 2)
            Dimensions without coordinates: lat, lon
            Data variables:
                subgroup1_var  (lat, lon) float64 16B ...

I expected the result to contain only the specified group and all of its child nodes (if it has any), something like:

<xarray.DataTree>
Group: /Group1/subgroup1
            Dimensions:        (lat: 1, lon: 2)
            Dimensions without coordinates: lat, lon
            Data variables:
                subgroup1_var  (lat, lon) float64 16B ...

CCing the usual squad @shoyer, @keewis, @TomNicholas, @owenlittlejohns, and @flamingbear

eni-awowale added the topic-DataTree label on Oct 23, 2024
eni-awowale changed the title from "open_datatree(group='some_subgroup') not returning parent nodes" to "open_datatree(group='some_subgroup') returning parent nodes" on Oct 23, 2024
@TomNicholas
Member

TomNicholas commented Oct 23, 2024

Yes, good catch, we should fix that. I think the returned result has to be

<xarray.DataTree>
Group: /subgroup1
    Dimensions:        (lat: 1, lon: 2)
    Dimensions without coordinates: lat, lon
    Data variables:
        subgroup1_var  (lat, lon) float64 16B ...

because you can't have a group name containing slashes.

The simplest way to fix this would be to prune the groups_dict returned by open_groups_as_dict before giving it to DataTree.from_dict (or, in open_groups, before returning it, since this issue likely applies there too).
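
A minimal sketch of the "prune the groups_dict" idea, in plain Python. It assumes the dict maps absolute, slash-separated group paths (such as "/Group1/subgroup1") to datasets. Strings stand in for xarray.Dataset objects so the snippet runs without xarray, and prune_groups_dict is a hypothetical helper, not part of xarray's API.

```python
from pathlib import PurePosixPath


def prune_groups_dict(groups_dict, group):
    """Keep only `group` and its descendants (hypothetical helper).

    Keys are assumed to be absolute paths like "/Group1/subgroup1";
    values can be anything (datasets in practice, strings here).
    """
    # Joining onto "/" normalises both "Group1/x" and "/Group1/x"
    target = PurePosixPath("/") / group
    pruned = {}
    for path, value in groups_dict.items():
        p = PurePosixPath(path)
        # Keep the requested group itself and anything nested below it
        if p == target or target in p.parents:
            pruned[path] = value
    return pruned


groups = {"/": "root_ds", "/Group1": "g1_ds", "/Group1/subgroup1": "sub_ds"}
print(prune_groups_dict(groups, "/Group1/subgroup1"))
```

Comparing path components rather than using str.startswith keeps "/Group10" from matching a request for "/Group1". The surviving keys are still absolute, so a tree built from them with DataTree.from_dict would likely recreate the empty ancestor nodes. The keys would also need re-rooting to get the result described in this issue.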

The proper fix would be to change the behaviour of open_groups_as_dict itself.

FYI, the conclusion of the discussion today was that any coordinates defined above subgroup1 should be ignored by default.

Are you up for taking this one on @eni-awowale ?

@aladinor
Contributor

Thanks, @eni-awowale, for bringing this up. I am working on it, and I think it will be resolved soon.

@keewis
Collaborator

keewis commented Oct 23, 2024

I think what we talked about yesterday was to make subgroup1 the root of the returned DataTree object, and then we can attach a source_group encoding (or something similar) in case we want to look up where the tree came from.
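
A rough sketch of re-rooting, assuming absolute slash-separated keys and using strings as stand-ins for datasets. The requested group's path is stripped from each key so that the group lands at "/". The original path is returned separately so it could be attached to the result as a source_group-style encoding. reroot_groups_dict is a hypothetical name, and source_group is only the name floated in this thread, not an existing xarray attribute.

```python
from pathlib import PurePosixPath


def reroot_groups_dict(groups_dict, group):
    """Re-key `group` and its descendants so that `group` becomes "/".

    Returns (rerooted_dict, source_group). Hypothetical helper; the
    returned source_group is what one might attach to the new tree.
    """
    target = PurePosixPath("/") / group
    rerooted = {}
    for path, value in groups_dict.items():
        p = PurePosixPath(path)
        if p == target or target in p.parents:
            # relative_to gives "." for the target itself and e.g.
            # "deeper" for a child; joining onto "/" makes it absolute again
            rel = p.relative_to(target)
            rerooted[str(PurePosixPath("/") / rel)] = value
    return rerooted, str(target)
```

For example, re-rooting at "/Group1/subgroup1" would turn the key "/Group1/subgroup1" into "/", drop "/" and "/Group1" entirely, and return "/Group1/subgroup1" as the source group.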

@TomNicholas
Member

> make subgroup1 the root of the returned DataTree object

Yep that's what I was trying to say above.

> attach a source_group

Oh yes, good point.

> I am working on it

@aladinor, do you want to open your PR, even if it doesn't work yet? Then we can help get it merged ASAP.

@aladinor
Contributor

Sure! I will do it ASAP.

4 participants