Description
I've found a delightful edge case that is a little hard to believe. It involves a netCDF time:units attribute whose reference date contains a character outside the [0-9,-] range. In case it's not obvious from the output below, the file has time:units = "days since 20O1-1-1"
whereas it should be time:units = "days since 2001-1-1"
(that is, the rogue letter "O" (oooh) should be the numeral "0" (zero)).
The file is a 297MiB file downloadable from here
Below is the example reproducing the error:
Python 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> xr.__version__
'2022.6.0'
>>> import cftime
>>> import cftime as cft
>>> cft.__version__
'1.6.1'
>>> ds = xr.open_dataset("/p/css03/esgf_publish/cmip3/ipcc/data3/sresa2/ice/mo/sic/ingv_echam4/run1/sic_O1.nc")
Traceback (most recent call last):
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 270, in decode_cf_datetime
dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 207, in _decode_datetime_with_pandas
raise OutOfBoundsDatetime(
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Cannot decode times from a non-standard calendar, '360_day', using pandas.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 180, in _decode_cf_datetime_dtype
result = decode_cf_datetime(example_value, units, calendar, use_cftime)
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 272, in decode_cf_datetime
dates = _decode_datetime_with_cftime(
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 201, in _decode_datetime_with_cftime
cftime.num2date(num_dates, units, calendar, only_use_cftime_datetimes=True)
File "src/cftime/_cftime.pyx", line 549, in cftime._cftime.num2date
File "src/cftime/_cftime.pyx", line 107, in cftime._cftime._dateparse
File "src/cftime/_cftime.pyx", line 750, in cftime._cftime._parse_date
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/backends/api.py", line 531, in open_dataset
backend_ds = backend.open_dataset(
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 569, in open_dataset
ds = store_entrypoint.open_dataset(
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/backends/store.py", line 29, in open_dataset
vars, attrs, coord_names = conventions.decode_cf_variables(
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/conventions.py", line 521, in decode_cf_variables
new_vars[k] = decode_cf_variable(
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/conventions.py", line 369, in decode_cf_variable
var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 682, in decode
dtype = _decode_cf_datetime_dtype(data, units, calendar, self.use_cftime)
File "~/mambaforge/envs/xcd031spy532mat353/lib/python3.10/site-packages/xarray/coding/times.py", line 190, in _decode_cf_datetime_dtype
raise ValueError(msg)
ValueError: unable to decode time units 'days since 20O1-1-1' with "calendar '360_day'". Try opening your dataset with decode_times=False or installing cftime if it is not installed.
I wonder if a regex check would be useful to implement? This problem tripped me up for a while, and it was not at all obvious that an incorrect character (which looks almost identical to the correct one, depending on the font) was the root cause. Testing that the date string matches the regex r"(?:[0-9][0-9])?[0-9][0-9]-(?:[0-1])?[0-9]-(?:[0-3])?[0-9]"
could catch such a fringe case and let the error message point directly at the problem. The CF Conventions docs seem to allow little leeway in this format, so using "/"
or alternative MM-DD-YYYY
formats in place of the standard [YYY]Y-[M]M-[D]D HH:MM:SS.ss [-]0:00
does not appear to be valid.
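To make the suggestion concrete, here is a minimal sketch of what such a check might look like. The helper name `check_time_units` is hypothetical and is not part of xarray or cftime. The pattern is the one proposed above, so it only accepts two- or four-digit years and would need loosening to cover everything the CF format allows (one- or three-digit years, for instance).

```python
import re

# Pattern proposed in this issue: [YY]YY-[M]M-[D]D.
# Assumption: only two- or four-digit years need to be accepted.
_REF_DATE = re.compile(
    r"(?:[0-9][0-9])?[0-9][0-9]-(?:[0-1])?[0-9]-(?:[0-3])?[0-9]"
)


def check_time_units(units: str) -> None:
    """Raise ValueError if the reference date in `units` looks malformed.

    Hypothetical helper for illustration; not an xarray or cftime API.
    """
    _, sep, ref = units.partition(" since ")
    if not sep:
        raise ValueError(f"time units {units!r} contain no ' since ' clause")

    # Keep only the date portion, dropping any time or time-zone suffix.
    date_part = re.split(r"[ T]", ref.strip(), maxsplit=1)[0]
    if _REF_DATE.fullmatch(date_part):
        return

    # Name the offending characters so look-alikes such as "O" stand out.
    bad = sorted({c for c in date_part if c not in "0123456789-"})
    hint = f"; unexpected characters: {bad}" if bad else ""
    raise ValueError(
        f"reference date {date_part!r} in time units {units!r} "
        f"does not match [YY]YY-[M]M-[D]D{hint}"
    )


check_time_units("days since 2001-1-1")  # passes silently
try:
    check_time_units("days since 20O1-1-1")
except ValueError as err:
    print(err)
```

Running a check like this before handing the units to cftime would turn the opaque TypeError into a message that names the stray character.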
And just because pydata/xarray#7144