Cannot Save NetCDF: Conflicting _FillValue and Missing_Value #7191

WillyChap opened this issue Oct 20, 2022 · 8 comments

Comments

@WillyChap

What is your issue?

This seems to be an issue only with netCDF files that I have first opened, altered, and then saved with xarray. It could be related to #997 but seems to have different bug characteristics. I am unable to save to netCDF due to apparent conflicts in the masking attributes, which don't exist in the file.

xarray package version:

print(xr.__version__) ##2022.10.0
print(dask.__version__) ##2022.03.0

I am unable to save to netcdf due to error:

ValueError: Variable 'uwnd' has conflicting _FillValue (nan) and missing_value (-9.969209968386869e+36). Cannot encode data.

Yet variable uwnd has neither a "_FillValue" nor a "missing_value" attribute.

DS_cera0_full.to_netcdf('/Users/wchapman/Downloads/uvwndNOAA/tester.nc')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [35], line 1
----> 1 DS_cera0_full.to_netcdf('/Users/wchapman/Downloads/uvwndNOAA/tester.nc')

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/core/dataset.py:1899, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1896     encoding = {}
   1897 from ..backends.api import to_netcdf
-> 1899 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   1900     self,
   1901     path,
   1902     mode=mode,
   1903     format=format,
   1904     group=group,
   1905     engine=engine,
   1906     encoding=encoding,
   1907     unlimited_dims=unlimited_dims,
   1908     compute=compute,
   1909     multifile=False,
   1910     invalid_netcdf=invalid_netcdf,
   1911 )

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/backends/api.py:1230, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1225 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1226 # to avoid this mess of conditionals
   1227 try:
   1228     # TODO: allow this work (setting up the file for writing array data)
   1229     # to be parallelized with dask
-> 1230     dump_to_store(
   1231         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1232     )
   1233     if autoclose:
   1234         store.close()

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/backends/api.py:1277, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1274 if encoder:
   1275     variables, attrs = encoder(variables, attrs)
-> 1277 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/backends/common.py:266, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263 if writer is None:
    264     writer = ArrayWriter()
--> 266 variables, attributes = self.encode(variables, attributes)
    268 self.set_attributes(attributes)
    269 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/backends/common.py:355, in WritableCFDataStore.encode(self, variables, attributes)
    352 def encode(self, variables, attributes):
    353     # All NetCDF files get CF encoded by default, without this attempting
    354     # to write times, for example, would fail.
--> 355     variables, attributes = cf_encoder(variables, attributes)
    356     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    357     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/conventions.py:868, in cf_encoder(variables, attributes)
    865 # add encoding for time bounds variables if present.
    866 _update_bounds_encoding(variables)
--> 868 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    870 # Remove attrs from bounds variables (issue #2921)
    871 for var in new_vars.values():

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/conventions.py:868, in <dictcomp>(.0)
    865 # add encoding for time bounds variables if present.
    866 _update_bounds_encoding(variables)
--> 868 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    870 # Remove attrs from bounds variables (issue #2921)
    871 for var in new_vars.values():

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/conventions.py:273, in encode_cf_variable(var, needs_copy, name)
    264 ensure_not_multiindex(var, name=name)
    266 for coder in [
    267     times.CFDatetimeCoder(),
    268     times.CFTimedeltaCoder(),
   (...)
    271     variables.UnsignedIntegerCoder(),
    272 ]:
--> 273     var = coder.encode(var, name=name)
    275 # TODO(shoyer): convert all of these to use coders, too:
    276 var = maybe_encode_nonstring_dtype(var, name=name)

File ~/opt/miniconda3/envs/windypharm/lib/python3.10/site-packages/xarray/coding/variables.py:161, in CFMaskCoder.encode(self, variable, name)
    154 mv = encoding.get("missing_value")
    156 if (
    157     fv is not None
    158     and mv is not None
    159     and not duck_array_ops.allclose_or_equiv(fv, mv)
    160 ):
--> 161     raise ValueError(
    162         f"Variable {name!r} has conflicting _FillValue ({fv}) and missing_value ({mv}). Cannot encode data."
    163     )
    165 if fv is not None:
    166     # Ensure _FillValue is cast to same dtype as data's
    167     encoding["_FillValue"] = dtype.type(fv)

ValueError: Variable 'uwnd' has conflicting _FillValue (nan) and missing_value (-9.969209968386869e+36). Cannot encode data.

Manually setting those attributes does not remove the error.

DS_cera0_full.uwnd.attrs['_FillValue']=np.nan
DS_cera0_full.uwnd.attrs['missing_value']=np.nan
DS_cera0_full.to_netcdf('/Users/wchapman/Downloads/uvwndNOAA/tester.nc')

ValueError: Variable 'uwnd' has conflicting _FillValue (nan) and missing_value (-9.969209968386869e+36). Cannot encode data.

h5netcdf engine shows the same behavior.

DS_cera0_full.to_netcdf('/Users/wchapman/Downloads/uvwndNOAA/tester.nc',engine='h5netcdf')
ValueError: Variable 'uwnd' has conflicting _FillValue (nan) and missing_value (-9.969209968386869e+36). Cannot encode data.
WillyChap added the "needs triage" label on Oct 20, 2022
@kmuehlbauer
Contributor

kmuehlbauer commented Oct 20, 2022

@WillyChap Could you inspect the contents of .encoding? IIRC those values are located there. Otherwise you would need to specify the encoding kwarg in to_netcdf.

Disclaimer: This is off the top of my head, so mistakes are likely.

@dopplershift
Contributor

I'm also not sure why _FillValue and missing_value should be required to have the same value.

@WillyChap
Author

@kmuehlbauer it looks like encoding is empty.

print(DS_cera0.encoding)
# {}

@kmuehlbauer
Contributor

And on the variable? Please also have a look at the docs on to_netcdf regarding the encoding kwarg.
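The distinction matters because a Dataset and each of its variables carry separate encoding dicts. A minimal sketch, assuming a small in-memory dataset (the shape and values here are made up for illustration and are not the data from this issue):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the real file: one float32 variable named "uwnd".
ds = xr.Dataset(
    {"uwnd": (("time", "lat"), np.zeros((2, 3), dtype="float32"))}
)

# Encoding set on the variable, as a backend would do when reading a file.
ds["uwnd"].encoding["missing_value"] = -9.96921e36

# Dataset-level encoding stays empty; the key lives on the variable.
dataset_encoding = dict(ds.encoding)
variable_encoding = dict(ds["uwnd"].encoding)
print(dataset_encoding)
print(variable_encoding)
```

So an empty `ds.encoding` does not rule out a `missing_value` or `_FillValue` entry on `ds["uwnd"].encoding`.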

@kmuehlbauer
Contributor

@dopplershift the equivalence is checked here, which is also where the error originates.

fv = encoding.get("_FillValue")
mv = encoding.get("missing_value")
if (
    fv is not None
    and mv is not None
    and not duck_array_ops.allclose_or_equiv(fv, mv)
):
    raise ValueError(
        f"Variable {name!r} has conflicting _FillValue ({fv}) and missing_value ({mv}). Cannot encode data."
    )
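To see the check in isolation, here is a rough standalone sketch. `values_equivalent` is a simplified scalar stand-in written for this example; it is not xarray's `duck_array_ops.allclose_or_equiv`, and the tolerances are assumptions. Note that the check reads from the encoding dict, not from attrs.

```python
import math


def values_equivalent(a: float, b: float, rel_tol: float = 1e-5) -> bool:
    """Simplified scalar stand-in for an allclose-or-equivalent comparison."""
    # Treat two NaNs as equivalent; math.isclose alone would report False.
    if math.isnan(a) and math.isnan(b):
        return True
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-8)


def check_mask_encoding(name: str, encoding: dict) -> None:
    """Raise if the encoding dict carries two different missing-data markers."""
    fv = encoding.get("_FillValue")
    mv = encoding.get("missing_value")
    if fv is not None and mv is not None and not values_equivalent(fv, mv):
        raise ValueError(
            f"Variable {name!r} has conflicting _FillValue ({fv}) "
            f"and missing_value ({mv}). Cannot encode data."
        )


# Only one marker present: nothing to compare, so no error.
check_mask_encoding("uwnd", {"missing_value": -9.96921e36})

# Both markers present and different, as in the reported error: raises.
try:
    check_mask_encoding(
        "uwnd", {"_FillValue": float("nan"), "missing_value": -9.96921e36}
    )
except ValueError as err:
    conflict_message = str(err)
    print(conflict_message)
```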

@WillyChap
Author

@kmuehlbauer
So it looks like uwnd.encoding does contain _FillValue and missing_value, and the two conflict. Setting them to equal values enables a save.

However, these variables were set to contrasting values after an xarray subset or save. See workflow below. It appears that to_netcdf will assign '_FillValue': nan. I don't know if this is behavior that requires a change, or if it is just a quirk that folks will have to work through.

fn_E20c_VP=sorted(glob.glob('/Users/wchapman/Downloads/uvwndNOAA/uwnd*.nc'))
DSorig = xr.open_dataset(fn_E20c_VP[0])
DSorig.uwnd.encoding
{'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': True,
 'complevel': 2,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 1, 73, 144),
 'least_significant_digit': 1,
 'source': '/Users/wchapman/Downloads/uvwndNOAA/uwnd.1950.nc',
 'original_shape': (365, 17, 73, 144),
 'dtype': dtype('float32'),
 'missing_value': -9.96921e+36}

Subset and Save:

fn_E20c_VP=sorted(glob.glob('/Users/wchapman/Downloads/uvwndNOAA/uwnd*.nc'))
DS_subset = DSorig.sel(level=300)
DS_subset.to_netcdf('/Users/wchapman/Downloads/test_behavior.nc')
DS_subset.uwnd.encoding
{'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': True,
 'complevel': 2,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 1, 73, 144),
 'least_significant_digit': 1,
 'source': '/Users/wchapman/Downloads/uvwndNOAA/uwnd.1950.nc',
 'original_shape': (365, 17, 73, 144),
 'dtype': dtype('float32'),
 'missing_value': -9.96921e+36}

Re-open the newly saved file and inspect:

DS_subset_aftersave = xr.open_dataset('/Users/wchapman/Downloads/test_behavior.nc')
DS_subset_aftersave.uwnd.encoding
{'zlib': True,
 'szip': False,
 'zstd': False,
 'bzip2': False,
 'blosc': False,
 'shuffle': True,
 'complevel': 2,
 'fletcher32': False,
 'contiguous': False,
 'chunksizes': (1, 73, 144),
 'least_significant_digit': 1,
 'source': '/Users/wchapman/Downloads/test_behavior.nc',
 'original_shape': (365, 73, 144),
 'dtype': dtype('float32'),
 'missing_value': -9.96921e+36,
 '_FillValue': nan,
 'coordinates': 'level'}
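The "set them to equal values" workaround can be sketched as a small helper that operates on a plain encoding dict. `align_mask_encoding` is a hypothetical name introduced here, and keeping `missing_value` rather than `_FillValue` is an arbitrary choice for illustration:

```python
import math


def align_mask_encoding(encoding: dict) -> dict:
    """Return a copy whose _FillValue matches missing_value, if both exist."""
    fixed = dict(encoding)  # copy so the caller's dict is left untouched
    if "_FillValue" in fixed and "missing_value" in fixed:
        fixed["_FillValue"] = fixed["missing_value"]
    return fixed


# Abbreviated version of the encoding seen after re-opening the saved file.
reopened = {
    "dtype": "float32",
    "missing_value": -9.96921e36,
    "_FillValue": math.nan,
}
fixed = align_mask_encoding(reopened)
print(fixed)
```

With a real dataset this would presumably be applied as `DS_subset_aftersave.uwnd.encoding = align_mask_encoding(DS_subset_aftersave.uwnd.encoding)` before calling `to_netcdf`. Which of the two markers to keep depends on what downstream readers of the file expect.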

@dcherian
Contributor

dcherian commented Oct 21, 2022

> I'm also not sure why _FillValue and missing_value should be required to have the same value.

Xarray has only one way to represent both concepts: np.nan. So when you decode, you lose information on whether a value was a missing_value or a _FillValue. Then when you encode, you have to pick either _FillValue or missing_value to represent the np.nan.
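A toy illustration of that information loss, using plain NumPy rather than xarray's actual decoding code. `FILL_VALUE` is a hypothetical sentinel chosen for the example; only the `missing_value` number comes from this thread:

```python
import numpy as np

FILL_VALUE = np.float32(9.96921e36)      # hypothetical _FillValue
MISSING_VALUE = np.float32(-9.96921e36)  # missing_value seen in this issue

# Raw on-disk values: one cell holds each sentinel.
raw = np.array([1.5, FILL_VALUE, -2.0, MISSING_VALUE], dtype="float32")

# Decoding: both sentinels collapse to NaN, so their origin is lost.
decoded = np.where(np.isin(raw, [FILL_VALUE, MISSING_VALUE]), np.nan, raw)

# Encoding: every NaN has to be written back as one single sentinel.
encoded = np.where(np.isnan(decoded), MISSING_VALUE, decoded)

print(decoded)
print(encoded)
```

The cell that originally held `FILL_VALUE` comes back as `MISSING_VALUE`, which is why two different markers cannot both be honored on write.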

> to_netcdf will assign '_FillValue': nan.

This seems like a bug. Can you open a new issue with a minimum reproducible example that uses random data please?
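One possible starting point for such an example, as a sketch only: the dimension names, shape, and seed are invented, `make_dataset` and `roundtrip_encoding` are hypothetical helpers, and the round trip needs a netCDF backend (for example netCDF4 or h5netcdf) to be installed.

```python
import numpy as np
import xarray as xr


def make_dataset(seed: int = 0) -> xr.Dataset:
    """Build a small random dataset whose variable carries only missing_value."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((4, 3, 5)).astype("float32")
    ds = xr.Dataset({"uwnd": (("time", "lat", "lon"), data)})
    # Mimic the encoding of the original file: missing_value, no _FillValue.
    ds["uwnd"].encoding["missing_value"] = -9.96921e36
    return ds


def roundtrip_encoding(ds: xr.Dataset, path: str) -> dict:
    """Write ds to path, re-open it, and return the variable's encoding."""
    ds.to_netcdf(path)
    with xr.open_dataset(path) as reopened:
        return dict(reopened["uwnd"].encoding)
```

Calling `roundtrip_encoding(make_dataset(), "mre.nc")` (the filename is arbitrary) and checking whether the returned dict contains both `missing_value` and `_FillValue` would show whether the behavior reproduces without the original data.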

dcherian added the "usage question" label and removed the "needs triage" label on Oct 22, 2022
@jhamman
Member

jhamman commented Sep 12, 2023

Closing in favor of the more current #7722.

jhamman closed this as completed on Sep 12, 2023