Conflicting _FillValue and missing_value on write #7722

kmuehlbauer · 2023-04-05T11:56:46Z

What happened?

What did you expect to happen?

The file should be written on the second roundtrip.

There are at least two solutions to this:

Mask missing_value on read and purge missing_value completely in favor of _FillValue.
Do not handle missing_value at all, but let the user take action.
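
The two options can be sketched on plain Python data, without xarray. Both helpers below are hypothetical illustrations and not xarray API: `mask_missing_on_read` stands in for the first option and `drop_missing_value` for the second, and the `attrs` and `encoding` arguments are ordinary dicts that mimic a variable's attributes and encoding. Falling back to NaN as the fill value is an assumption made for this sketch.

```python
import math


def mask_missing_on_read(values, attrs):
    """First option (hypothetical helper): mask on read, keep only _FillValue.

    Values equal to missing_value or _FillValue become NaN. The returned
    encoding carries _FillValue only, so a later write sees no conflict.
    """
    sentinels = [attrs[k] for k in ("missing_value", "_FillValue") if k in attrs]
    decoded = [math.nan if any(v == s for s in sentinels) else v for v in values]
    # Assumption: when the file has no _FillValue, NaN is used from now on.
    encoding = {"_FillValue": attrs.get("_FillValue", math.nan)}
    return decoded, encoding


def drop_missing_value(encoding):
    """Second option (hypothetical helper): the user resolves the clash before writing."""
    cleaned = dict(encoding)
    if "_FillValue" in cleaned:
        # Only drop missing_value when _FillValue is there to replace it.
        cleaned.pop("missing_value", None)
    return cleaned


decoded, enc = mask_missing_on_read([0.0, math.nan, 1.0, 8.0], {"missing_value": 1.0})
print(decoded)
print(enc)
print(drop_missing_value({"missing_value": 1.0, "_FillValue": math.nan}))
```

With a real dataset, the second option would presumably amount to something like `roundtrip["test"].encoding.pop("missing_value", None)` before calling `to_netcdf`. Whether that is acceptable depends on whether downstream readers rely on missing_value.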

Minimal Complete Verifiable Example

import numpy as np
import netCDF4 as nc
import xarray as xr

# Write a file whose variable has missing_value but no explicit _FillValue.
with nc.Dataset("test-no-fillval-01.nc", mode="w") as ds:
    x = ds.createDimension("x", 4)
    test = ds.createVariable("test", "f4", ("x",), fill_value=None)
    test.missing_value = 1.
    test.valid_min = 2.
    test.valid_max = 10.
    test[:] = np.array([0.0, np.nan, 1.0, 8.0], dtype="f4")

# Inspect the file with netCDF4 directly.
with nc.Dataset("test-no-fillval-01.nc") as ds:
    print(ds["test"])
    print(ds["test"][:])

# First roundtrip: read with xarray and write back. This succeeds.
with xr.open_dataset("test-no-fillval-01.nc").load() as roundtrip:
    print(roundtrip)
    print(roundtrip["test"].attrs)
    print(roundtrip["test"].encoding)
    roundtrip.to_netcdf("test-no-fillval-02.nc")

# Second roundtrip: the encoding now holds both missing_value and _FillValue,
# and to_netcdf raises the ValueError shown in the log output.
with xr.open_dataset("test-no-fillval-02.nc").load() as roundtrip:
    print(roundtrip)
    print(roundtrip["test"].attrs)
    print(roundtrip["test"].encoding)
    roundtrip.to_netcdf("test-no-fillval-03.nc")

MVCE confirmation

Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

<class 'netCDF4._netCDF4.Variable'>
float32 test(x)
    missing_value: 1.0
    valid_min: 2.0
    valid_max: 10.0
unlimited dimensions: 
current shape = (4,)
filling on, default _FillValue of 9.969209968386869e+36 used

<xarray.Dataset>
Dimensions:  (x: 4)
Dimensions without coordinates: x
Data variables:
    test     (x) float32 0.0 nan nan 8.0
{'valid_min': 2.0, 'valid_max': 10.0}
{'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': True, 'chunksizes': None, 'source': 'test-no-fillval-01.nc', 'original_shape': (4,), 'dtype': dtype('float32'), 'missing_value': 1.0}

<xarray.Dataset>
Dimensions:  (x: 4)
Dimensions without coordinates: x
Data variables:
    test     (x) float32 0.0 nan nan 8.0
{'valid_min': 2.0, 'valid_max': 10.0}
{'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False, 'contiguous': True, 'chunksizes': None, 'source': 'test-no-fillval-02.nc', 'original_shape': (4,), 'dtype': dtype('float32'), 'missing_value': 1.0, '_FillValue': nan}

File /home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/variables.py:167, in CFMaskCoder.encode(self, variable, name)
    160 mv = encoding.get("missing_value")
    162 if (
    163     fv is not None
    164     and mv is not None
    165     and not duck_array_ops.allclose_or_equiv(fv, mv)
    166 ):
--> 167     raise ValueError(
    168         f"Variable {name!r} has conflicting _FillValue ({fv}) and missing_value ({mv}). Cannot encode data."
    169     )
    171 if fv is not None:
    172     # Ensure _FillValue is cast to same dtype as data's
    173     encoding["_FillValue"] = dtype.type(fv)

ValueError: Variable 'test' has conflicting _FillValue (nan) and missing_value (1.0). Cannot encode data.
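
The check that raises here can be reproduced in a few lines of plain Python. This is a minimal sketch, not xarray's code: `equivalent` is a hypothetical stand-in for `duck_array_ops.allclose_or_equiv` that works on scalars only and uses `math.isclose` with its default tolerance, which may differ from what xarray uses.

```python
import math


def equivalent(a, b):
    """Treat two NaNs as equal, otherwise compare with a float tolerance."""
    if math.isnan(a) and math.isnan(b):
        return True
    return math.isclose(a, b)


def check_fill_conflict(name, encoding):
    """Raise if _FillValue and missing_value are both set and differ."""
    fv = encoding.get("_FillValue")
    mv = encoding.get("missing_value")
    if fv is not None and mv is not None and not equivalent(fv, mv):
        raise ValueError(
            f"Variable {name!r} has conflicting _FillValue ({fv}) "
            f"and missing_value ({mv}). Cannot encode data."
        )


# Encoding as read back from the second file in the log above.
encoding = {"missing_value": 1.0, "_FillValue": math.nan}
try:
    check_fill_conflict("test", encoding)
except ValueError as err:
    print(err)
```

The sketch makes the cause visible: NaN and 1.0 are never equivalent, so any float variable that carries a missing_value other than NaN and then receives the NaN default as _FillValue fails this check on the next write.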

Anything else we need to know?

The _FillValue is added on write here:

xarray/xarray/conventions.py

Line 300 in d4db166

var = maybe_default_fill_value(var)

xarray/xarray/conventions.py

Lines 144 to 152 in d4db166

    
def maybe_default_fill_value(var: Variable) -> Variable:
    # make NaN the fill value for float types:
    if (
        "_FillValue" not in var.attrs
        and "_FillValue" not in var.encoding
        and np.issubdtype(var.dtype, np.floating)
    ):
        var.attrs["_FillValue"] = var.dtype.type(np.nan)
    return var

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.55-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: 4.9.2

xarray: 2023.3.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: 11.6.0
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: None
IPython: 8.11.0
sphinx: None


dcherian · 2023-04-06T02:26:41Z

How about not adding _FillValue when missing_value is present? Is that a good idea? Is it standards compliant?
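
As a rough illustration of this proposal, the sketch below guards the default so that no NaN _FillValue is added when missing_value is already present. `FakeVariable` and `maybe_default_fill_value_guarded` are hypothetical names made up for this example; the stand-in class models only the fields the function touches, and an `is_float` flag replaces the dtype check.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FakeVariable:
    """Hypothetical stand-in for xarray.Variable with only the fields used here."""

    is_float: bool
    attrs: dict = field(default_factory=dict)
    encoding: dict = field(default_factory=dict)


def maybe_default_fill_value_guarded(var):
    """Add a NaN _FillValue to float variables unless a fill marker is already set."""
    markers = ("_FillValue", "missing_value")
    already_set = any(m in var.attrs or m in var.encoding for m in markers)
    if var.is_float and not already_set:
        var.attrs["_FillValue"] = math.nan
    return var


with_mv = maybe_default_fill_value_guarded(
    FakeVariable(is_float=True, encoding={"missing_value": 1.0})
)
plain = maybe_default_fill_value_guarded(FakeVariable(is_float=True))
print(with_mv.attrs)  # no _FillValue added
print(plain.attrs)    # NaN default added
```

This only sketches the mechanics. It does not settle whether skipping the default is standards compliant, and the netCDF library may still apply its own default fill value to such a file, as the "filling on" line in the log output suggests.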

kmuehlbauer · 2023-04-06T04:55:02Z

The recommendation is to use _FillValue when only one value is needed to mark missing or fill data.

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#missing-data

The netCDF attribute conventions also say the following about missing_value:

This attribute is not treated in any special way by the library or conforming generic applications, but is often useful documentation and may be used by specific applications.

https://docs.unidata.ucar.edu/netcdf-c/current/attribute_conventions.html

I'm not sure whether xarray counts as a conforming generic application or a specific application.

Ockenfuss · 2023-06-21T13:23:16Z

There is also an older comment from Stephan Hoyer regarding this problem: Comment

kmuehlbauer added the "bug" and "needs triage" labels Apr 5, 2023

kmuehlbauer mentioned this issue Apr 5, 2023: default fill_value not masked when read from file #7723 (Closed)

dcherian added the "topic-CF conventions" label and removed the "needs triage" label Apr 6, 2023

jhamman mentioned this issue Sep 12, 2023: Cannot Save NetCDF: Conflicting _FillValue and Missing_Value #7191 (Closed)

maxrjones mentioned this issue Jan 23, 2024: No longer clear all encoding in set_zarr_encoding carbonplan/ndpyramid#90 (Open)
