NetCDF valid_min/_max/_range do not mask datasets and do not get scaled #8359
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template!
@claytharrison Sorry for the massive delay here. This somehow slipped through the cracks. Thanks for the detailed problem description. For xarray I currently see only three solutions/workarounds for handling these types of packed data:

See this section of the CF Conventions for details: Missing data, valid and actual range of data. Solution 1 is the simplest, but less user friendly. Solution 2 is too involved and error-prone. Solution 3 would be less invasive and the most user friendly. There might be other solutions which I do not have on the list right now. I'd favour solution 3, which conforms to the standard, is user friendly, and is relatively easy to handle in the encoding step.
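For context, CF-style packing works by storing data as small integers that are unpacked on read; a minimal sketch (all numbers here are made up for illustration):

```python
import numpy as np

# CF-style packing: data are stored as small integers and unpacked on read as
#   unpacked = packed * scale_factor + add_offset
scale_factor, add_offset = 0.01, 20.0
packed = np.array([-100, 0, 100], dtype="int16")

unpacked = packed * scale_factor + add_offset
print(unpacked)  # [19. 20. 21.]
```

Attributes such as `valid_range` are, per the conventions, expressed in the packed (integer) domain, which is what makes the masking question tied to the scaling step.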
I was just (regrettably) surprised by this behavior myself. My purpose in commenting is to support the OP's proposal.
What is your issue?
When reading a netCDF dataset with `decode_cf` and `mask_and_scale` set to `True`, Xarray uses the `scale_factor` and `_FillValue`/`missing_value` attributes of each variable in the dataset to apply the proper masking and scaling. However, from what I can tell, it does not handle certain other common attributes when masking, in particular: `valid_max`, `valid_min`, and `valid_range`. I can't find any direct statement of this behavior in the Xarray documentation or by searching this repository, but I encountered the behavior myself and found a mention in the documentation for the xcube package (this relates to zarr rather than netCDF but is the only mention I could find).

It is nontrivial to handle this as a user, because you (rightfully) lose the `scale_factor` attribute on read when `mask_and_scale` is true. Since `valid_min`/`_max`/`_range` are stored in the same domain as the packed data if conventions are followed (i.e. unscaled if there is a `scale_factor`), it becomes complicated to use them for masking after the fact.

I can only find one discussion (#822) on whether these attributes should or should not be handled by Xarray. In that thread, it was brought up that 1) netCDF4-python doesn't handle this on their end, 2) this doesn't really matter from a technical standpoint anyway because Xarray uses its own logic for scaling, and 3) apparently, they are not directly part of the CF conventions, but rather the NUG convention.
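The manual workaround implied above — disable decoding, mask in the packed domain, then scale by hand — can be sketched as follows. This uses an in-memory stand-in for a packed netCDF variable (the variable, attribute values, and data are hypothetical); with a real file you would pass `mask_and_scale=False` to `xr.open_dataset` to get the raw values:

```python
import numpy as np
import xarray as xr

# Stand-in for a raw (undecoded) packed variable, as read with
# mask_and_scale=False: int16 data plus packed-domain attributes.
raw = xr.DataArray(
    np.array([-100, 0, 100, 30000], dtype="int16"),
    dims="x",
    attrs={"scale_factor": 0.01, "add_offset": 20.0, "valid_range": [-200, 200]},
)

lo, hi = raw.attrs["valid_range"]              # bounds in the packed domain
masked = raw.where((raw >= lo) & (raw <= hi))  # mask *before* unpacking
unpacked = masked * raw.attrs["scale_factor"] + raw.attrs["add_offset"]
print(unpacked.values)  # [19. 20. 21. nan]
```

The awkward part is exactly what the issue describes: the user has to keep the raw attributes around and reimplement the unpacking arithmetic that `decode_cf` would otherwise do.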
However, netCDF4-python does mask values outside `valid_min`/`_max`/`_range` when opening a dataset (Unidata/netcdf4-python#670), so I feel it would be natural to do the same in Xarray, at least when `decode_cf` and `mask_and_scale` are both `True`. Additionally, according to the netCDF attribute conventions, "generic applications should treat values outside the valid range as missing". I'm not sure any of this was the case back in 2016 when this was last discussed.

I propose that `mask_and_scale` should (optionally?) mask values which are invalid according to these attributes. If there are reasons not to, then perhaps, at least, `valid_min`/`_max`/`_range` could be transformed by `scale_factor` and `add_offset` when scaling is applied to the rest of the dataset, so that users can easily create the relevant masks themselves.
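The fallback proposal — transforming the valid-range bounds with the same `scale_factor`/`add_offset` applied to the data — would let users mask decoded values directly. A minimal sketch of what that user-side masking would look like (all numbers are hypothetical):

```python
import numpy as np

# If valid_range were scaled the same way as the data, masking the
# already-decoded values becomes a one-liner for the user.
scale_factor, add_offset = 0.01, 20.0
valid_range_packed = np.array([-200, 200])
decoded = np.array([19.0, 20.0, 21.0, 320.0])  # already unpacked by decode_cf

lo, hi = valid_range_packed * scale_factor + add_offset  # -> 18.0, 22.0
masked = np.where((decoded >= lo) & (decoded <= hi), decoded, np.nan)
print(masked)  # [19. 20. 21. nan]
```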