NetCDF attributes like long_name and units lost on .mean() #442

Closed
j08lue opened this issue Jun 24, 2015 · 5 comments
Labels
API design topic-CF conventions topic-metadata Relating to the handling of metadata (i.e. attrs and encoding)

Comments

@j08lue
Contributor

j08lue commented Jun 24, 2015

When reading a variable from a netCDF file, the standard attributes like long_name, standard_name, and units are propagated, but they are apparently lost when calling ~~.load()~~ .mean() on the DataArray.

Couldn't these CF-Highly Recommended Variable Attributes be kept during this operation?

(What to do with them afterwards, e.g. upon merge, is a different question, unresolved also in the pandas community.)

EDIT: the problem actually occurs when calling .mean() (not .load(), as originally posted).

@shoyer
Member

shoyer commented Jun 24, 2015

Hmm. This is definitely a bug -- load should preserve all metadata.

@shoyer
Member

shoyer commented Jun 24, 2015

Could you post an example dataset/code for which this occurs? I'm struggling to reproduce this.

j08lue changed the title from "NetCDF attributes like long_name and units lost on load()" to "NetCDF attributes like long_name and units lost on ~~.lost()~~ .mean()" on Jun 25, 2015
j08lue changed the title from "NetCDF attributes like long_name and units lost on ~~.lost()~~ .mean()" to "NetCDF attributes like long_name and units lost on .mean()" on Jun 25, 2015
@j08lue
Contributor Author

j08lue commented Jun 25, 2015

Sorry for the confusion! The loss of attributes actually occurs when applying .mean() (rather than .load()).

See this notebook (same in nbviewer) for an example with some opendap-hosted data.

@shoyer
Member

shoyer commented Jun 25, 2015

Ah. So this is intentional. There is an optional parameter that lets you control this -- try .mean(keep_attrs=True).
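
For readers who want to see the option in action, here is a minimal sketch. The array `sst`, its values, and its attributes are made up for illustration and stand in for a variable read from a netCDF file; `keep_attrs` is passed explicitly in both calls so the example does not depend on the default of any particular xarray version.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for a variable loaded from a netCDF file.
sst = xr.DataArray(
    np.array([[271.0, 272.0], [273.0, 274.0]]),
    dims=("time", "x"),
    attrs={"long_name": "sea surface temperature", "units": "K"},
)

# Explicitly drop the attributes on the reduced result.
dropped = sst.mean(dim="time", keep_attrs=False)

# Explicitly copy all attributes of the input onto the result.
kept = sst.mean(dim="time", keep_attrs=True)

print(dropped.attrs)
print(kept.attrs)
```

Note that `keep_attrs=True` copies every attribute unchanged; it does not check whether `units` or `long_name` still describe the reduced data.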

The basic problem is that it's ambiguous how to handle attributes like units after doing computation. I don't want to inspect attributes and choose some to preserve and others to remove, so we have a choice of either preserving all attributes in an operation or removing all of them.

Obviously, for some aggregations (e.g., sum or var) it doesn't make sense to preserve attributes (which commonly include units). I suppose we could make an exception for aggregations like mean/median/std, but it's also weird to have some aggregations that preserve attributes and others that don't.
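
The all-or-nothing trade-off described above can be illustrated with a toy container. This is only a sketch and not xarray's implementation: the `Labeled` class, its `reduce` method, and the sample `depth` data are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance
from typing import Callable, Dict, List


@dataclass
class Labeled:
    """Toy stand-in for a labeled array: values plus free-form attributes."""

    values: List[float]
    attrs: Dict[str, str] = field(default_factory=dict)

    def reduce(self, func: Callable[[List[float]], float], keep_attrs: bool = False) -> "Labeled":
        # All-or-nothing policy: either copy every attribute or none.
        # No attribute (such as "units") is inspected or special-cased.
        attrs = dict(self.attrs) if keep_attrs else {}
        return Labeled([func(self.values)], attrs)


depth = Labeled([1.0, 2.0, 3.0], {"long_name": "depth", "units": "m"})

avg = depth.reduce(mean, keep_attrs=True)       # units "m" is still right
var = depth.reduce(pvariance, keep_attrs=True)  # units "m" is now misleading (m^2)
bare = depth.reduce(pvariance)                  # attributes dropped

print(avg.attrs, var.attrs, bare.attrs)
```

With blanket preservation the variance keeps `units: m` even though its physical unit is m², which is the kind of ambiguity that motivates dropping attributes unless the caller opts in.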

@j08lue
Contributor Author

j08lue commented Jun 26, 2015

That makes sense. Great that there is an option to keep_attrs. Closing this issue.

3 participants