Dealing with netcdf time:units with unexpected characters #296

@durack1

I've found a delightful edge case that is a little hard to believe. It involves a netCDF time:units attribute that contains a character outside the [0-9,-] range. In case it's not obvious from the output below, the problem is that time:units = "days since 20O1-1-1", whereas it should be time:units = "days since 2001-1-1" (that is, the rogue letter "O" (oooh) should be the numeral "0" (zero)).
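
Because the two characters render almost identically, one quick way to confirm this kind of problem is to compare the strings character by character and print their Unicode code points. A minimal sketch: `first_difference` is a hypothetical helper, and the two strings are the ones quoted above.

```python
def first_difference(a: str, b: str):
    """Return (index, char_a, char_b) for the first position where a and b differ, or None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    return None


bad = "days since 20O1-1-1"   # units as found in the file
good = "days since 2001-1-1"  # units as intended

i, ca, cb = first_difference(bad, good)
# Code points make the difference explicit: letter 'O' is U+004F, digit '0' is U+0030.
print(f"position {i}: {ca!r} (U+{ord(ca):04X}) vs {cb!r} (U+{ord(cb):04X})")
# prints: position 13: 'O' (U+004F) vs '0' (U+0030)
```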

The file (297 MiB) can be downloaded from here

Below is the example reproducing the error:

Python 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> xr.__version__
'2022.6.0'
>>> import cftime
>>> import cftime as cft
>>> cft.__version__
'1.6.1'
>>> ds = xr.open_dataset("/p/css03/esgf_publish/cmip3/ipcc/data3/sresa2/ice/mo/sic/ingv_echam4/run1/sic_O1.nc")
Traceback (most recent call last):
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 270, in decode_cf_datetime
    dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 207, in _decode_datetime_with_pandas
    raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Cannot decode times from a non-standard calendar, '360_day', using pandas.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 180, in _decode_cf_datetime_dtype
    result = decode_cf_datetime(example_value, units, calendar, use_cftime)
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 272, in decode_cf_datetime
    dates = _decode_datetime_with_cftime(
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 201, in _decode_datetime_with_cftime
    cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
  File "src/cftime/_cftime.pyx", line 549, in cftime._cftime.num2date
  File "src/cftime/_cftime.pyx", line 107, in cftime._cftime._dateparse
  File "src/cftime/_cftime.pyx", line 750, in cftime._cftime._parse_date
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
    backend_ds = backend.open_dataset(
  File "~mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 569, in open_dataset
    ds = store_entrypoint.open_dataset(
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/backends/store.py", line 29, in open_dataset
    vars, attrs, coord_names = conventions.decode_cf_variables(
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/conventions.py", line 521, in decode_cf_variables
    new_vars[k] = decode_cf_variable(
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/conventions.py", line 369, in decode_cf_variable
    var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 682, in decode
    dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
  File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 190, in _decode_cf_datetime_dtype
    raise ValueError(msg)
ValueError: unable to decode time units 'days since 20O1-1-1' with "calendar '360_day'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.
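
As the error message suggests, the times can still be recovered by opening the file with decode_times=False and fixing the units before decoding. Below is a minimal sketch of a hypothetical `repair_units` helper that patches the units string. It assumes (heuristically) that any letter O/o directly next to a digit in the reference date is a mistyped zero, which fits this file but is not a general fix.

```python
import re

# Heuristic (assumption): a letter O/o touching a digit inside the
# reference date is almost certainly a mistyped zero.
_O_NEXT_TO_DIGIT = re.compile(r"(?<=[0-9])[Oo]|[Oo](?=[0-9])")


def repair_units(units: str) -> str:
    """Return units with O/o next to digits in the reference date replaced by 0.

    Hypothetical helper for illustration; only the part after " since " is
    touched, so the unit word itself (e.g. "hours") is never altered.
    """
    unit, sep, ref = units.partition(" since ")
    if not sep:
        # No reference date to repair.
        return units
    return unit + sep + _O_NEXT_TO_DIGIT.sub("0", ref)


print(repair_units("days since 20O1-1-1"))  # prints: days since 2001-1-1
```

With xarray, one way to apply this is to open the file with decode_times=False, assign the repaired string to ds["time"].attrs["units"], and then call xr.decode_cf(ds) to decode the times. cftime is still required for the 360_day calendar, since pandas cannot decode it.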

I wonder whether a regex check would be useful to implement. This problem tripped me up for a while, and it was not at all obvious that an incorrect character (which looks almost identical, depending on the font) was the root cause. Testing that the reference date string matches the regex r"(?:[0-9][0-9])?[0-9][0-9]-(?:[0-1])?[0-9]-(?:[0-3])?[0-9]" could catch such a fringe case and make the cause obvious in the error message. The CF Conventions docs seem to leave little leeway in this format, so "/" separators or alternative orderings such as MM-DD-YYYY, instead of the standard [YYY]Y-[M]M-[D]D HH:MM:SS.ss [-]0:00, would not be valid.
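
A rough sketch of the proposed check, using the regex above as written. `check_time_units` is hypothetical (not an existing xarray or cftime function), and it validates only the date part, leaving any time and offset to the real parser. Note that, as written, the pattern requires a two- or four-digit year, so reference dates such as "1-1-1" or "0-1-1" would also be rejected; it would need loosening to cover the [YYY]Y form.

```python
import re
import unicodedata

# Pattern proposed in this issue for the reference date: [YY]YY-[M]M-[D]D.
# re.match anchors only at the start, so a trailing time such as
# "00:00:00" is ignored here.
REF_DATE = re.compile(r"(?:[0-9][0-9])?[0-9][0-9]-(?:[0-1])?[0-9]-(?:[0-3])?[0-9]")


def check_time_units(units: str) -> None:
    """Raise ValueError naming any suspicious characters in the reference date.

    Hypothetical pre-check for illustration; not part of xarray or cftime.
    """
    _, sep, ref = units.partition(" since ")
    if not sep:
        raise ValueError(f"time units {units!r} have no 'since' clause")
    ref = ref.strip()
    if REF_DATE.match(ref):
        return
    # Inspect only the date token (everything before the first space).
    date_token = ref.split(" ", 1)[0]
    suspects = [
        f"{c!r} ({unicodedata.name(c, 'UNKNOWN')}) at position {i}"
        for i, c in enumerate(date_token)
        if c not in "0123456789-"
    ]
    detail = "; unexpected " + ", ".join(suspects) if suspects else ""
    raise ValueError(
        f"reference date {ref!r} in time units {units!r} does not match "
        f"[YY]YY-[M]M-[D]D{detail}"
    )
```

For "days since 20O1-1-1" this raises a ValueError ending in "unexpected 'O' (LATIN CAPITAL LETTER O) at position 2", which points straight at the rogue character instead of surfacing as a TypeError deep inside the parser.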

And just because pydata/xarray#7144