
Relying on pandas time coordinate being in "datetime64[ns]" format #151

@JanStreffing
  • pyfesom2 version: 0.2.0 dev_0
  • Python version: 3.9.6
  • Operating System: centos-linux-release-8.4-1.2105.el8.noarch

Description

We rely on xr.open_mfdataset here:

dataset = xr.open_mfdataset(paths, combine="by_coords", **kwargs)
which by default decodes the time coordinate to datetime64[ns]. However, owing to its nanosecond precision, this format has a limited valid (no overflow / underflow) range: it is centered on 1970-01-01 and covers roughly ±292 years from there. See:
pydata/xarray#4454 (comment)
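
For illustration (a minimal sketch, not part of pyfesom2), the representable range of datetime64[ns] can be checked directly with pandas:

import pandas as pd

# datetime64[ns] counts nanoseconds in a signed 64-bit integer relative to
# 1970-01-01, so the representable range is roughly 1677-09-21 to 2262-04-11:
print(pd.Timestamp.min)  # 1677-09-21 00:12:43.145224193
print(pd.Timestamp.max)  # 2262-04-11 23:47:16.854775807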

Within this range we load data as:

Coordinates:
    time    (time)    datetime64[ns]    2000-01-31T23:20:00...

Outside of this range, xarray falls back to cftime objects.

Coordinates:
    time    (time)    object    2270-01-31 23:20:00...

When attempting to load a time series that contains values from both cases, we fail with:

TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_24152/1386845663.py in <module>
      5 
      6     a_ice[exp_name] = {}
----> 7     a_ice[exp_name]['data'] = pf.get_data(exp_path, 'a_ice', years, mesh, how=None, compute=False, silent=True)

/p/project/chhb19/jstreffi/software/pyfesom2/pyfesom2/load_mesh_data.py in get_data(result_path, variable, years, mesh, runid, records, depth, how, ncfile, compute, continuous, silent, **kwargs)
    518             print("Depth is None, 3d field will be returned")
    519 
--> 520     dataset = xr.open_mfdataset(paths, combine="by_coords", **kwargs)
    521     data = select_slices(dataset, variable, mesh, records, depth)
    522 

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
    939             # Redo ordering from coordinates, ignoring how they were ordered
    940             # previously
--> 941             combined = combine_by_coords(
    942                 datasets,
    943                 compat=compat,

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in combine_by_coords(data_objects, compat, data_vars, coords, fill_value, join, combine_attrs, datasets)
    896         concatenated_grouped_by_data_vars = []
    897         for vars, datasets_with_same_vars in grouped_by_vars:
--> 898             concatenated = _combine_single_variable_hypercube(
    899                 list(datasets_with_same_vars),
    900                 fill_value=fill_value,

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in _combine_single_variable_hypercube(datasets, fill_value, data_vars, coords, compat, join, combine_attrs)
    602         )
    603 
--> 604     combined_ids, concat_dims = _infer_concat_order_from_coords(list(datasets))
    605 
    606     if fill_value is None:

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in _infer_concat_order_from_coords(datasets)
    109 
    110                 # ensure series does not contain mixed types, e.g. cftime calendars
--> 111                 _ensure_same_types(series, dim)
    112 
    113                 # Sort datasets along dim

/p/project/chhb19/jstreffi/software/miniconda3/envs/pyfesom2/lib/python3.9/site-packages/xarray/core/combine.py in _ensure_same_types(series, dim)
     52         if len(types) > 1:
     53             types = ", ".join(t.__name__ for t in types)
---> 54             raise TypeError(
     55                 f"Cannot combine along dimension '{dim}' with mixed types."
     56                 f" Found: {types}."

TypeError: Cannot combine along dimension 'time' with mixed types. Found: DatetimeGregorian, Timestamp.
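
A possible workaround (a sketch, assuming **kwargs are forwarded unchanged from get_data to xr.open_mfdataset, as the traceback above suggests): pass use_cftime=True so that every file decodes its time axis as cftime objects and combine_by_coords no longer sees mixed types.

import pyfesom2 as pf

# Hypothetical call: use_cftime=True travels through **kwargs to
# xr.open_mfdataset / xr.open_dataset, forcing cftime decoding for all files
# so the combined time coordinate has a single, consistent type.
a_ice_data = pf.get_data(exp_path, 'a_ice', years, mesh, how=None,
                         compute=False, silent=True, use_cftime=True)

The downside is that the in-range part of the series is then also decoded as cftime objects rather than datetime64[ns].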

I suggest we avoid datetime64[ns] altogether, as we don't need its nanosecond precision. This might mean modifications to diagnostics that use constructs such as:

toplot = value[h].sel(time=value[h].time.dt.month.isin([month]))
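
For what it's worth, xarray's .dt accessor also works on cftime-backed time coordinates, so month-based selections like the one above should keep working. A minimal, self-contained sketch (assuming the cftime package is installed, which xarray needs anyway for out-of-range dates):

import xarray as xr

# Build a cftime-backed time axis far outside the datetime64[ns] range and
# select by month exactly as the diagnostics do today.
times = xr.cftime_range("2265-01-01", periods=12, freq="MS")
da = xr.DataArray(range(12), coords={"time": times}, dims="time")
print(da.sel(time=da.time.dt.month.isin([1, 2])))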
