What happened?
I have a data array of datetime values and I want to get the first value for each group. If there are any missing groups, the operation fails because numpy can't promote datetime data to float.
This is new in xarray 2025.3.
What did you expect to happen?
I expected to receive the first value for each group and NaT for missing groups.
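For comparison, a hypothetical pandas analogue of the same operation (not part of the original report) should fill the empty December bin with NaT rather than raising:

import pandas as pd

# Hypothetical comparison only: build the same daily datetime series,
# drop December, and take the first value per month.
dates = pd.date_range("2000-01-01", periods=400, freq="D")
s = pd.Series(dates, index=dates)
s = s[s.index.month != 12]
print(s.resample("MS").first())  # expected: NaT for 2000-12-01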
Minimal Complete Verifiable Example
import xarray as xr

# A datetime array
time = xr.DataArray(xr.date_range('2000-01-01', periods=400, freq='D'), dims=('time',))

# Remove December, so there is a missing group
time = time.sel(time=time.dt.month != 12)

time.resample(time='MS').first()
MVCE confirmation
Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
Complete example — the example is self-contained, including all data and the text of any traceback.
Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
New issue — a search of GitHub Issues suggests this is not a duplicate.
Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
---------------------------------------------------------------------------
DTypePromotionError                       Traceback (most recent call last)
Cell In[41], line 1
----> 1 time.resample(time='MS').first()

File ~/Projets/xarray/xarray/core/groupby.py:1420, in GroupBy.first(self, skipna, keep_attrs)
   1401 def first(
   1402     self, skipna: bool | None = None, keep_attrs: bool | None = None
   1403 ) -> T_Xarray:
   1404     """
   1405     Return the first element of each group along the group dimension
   1406     (...)
   1418
   1419     """
-> 1420     return self._first_or_last("first", skipna, keep_attrs)

File ~/Projets/xarray/xarray/core/resample.py:114, in Resample._first_or_last(self, op, skipna, keep_attrs)
    109 def _first_or_last(
    110     self, op: Literal["first", "last"], skipna: bool | None, keep_attrs: bool | None
    111 ) -> T_Xarray:
    112     from xarray.core.dataset import Dataset
--> 114     result = super()._first_or_last(op=op, skipna=skipna, keep_attrs=keep_attrs)
    115     if isinstance(result, Dataset):
    116         # Can't do this in the base class because group_dim is RESAMPLE_DIM
    117         # which is not present in the original object
    118         for var in result.data_vars:

File ~/Projets/xarray/xarray/core/groupby.py:1389, in GroupBy._first_or_last(self, op, skipna, keep_attrs)
   1383     keep_attrs = _get_keep_attrs(default=True)
   1384 if (
   1385     module_available("flox", minversion="0.10.0")
   1386     and OPTIONS["use_flox"]
   1387     and contains_only_chunked_or_numpy(self._obj)
   1388 ):
-> 1389     result = self._flox_reduce(
   1390         dim=None, func=op, skipna=skipna, keep_attrs=keep_attrs
   1391     )
   1392 else:
   1393     result = self.reduce(
   1394         getattr(duck_array_ops, op),
   1395         dim=[self._group_dim],
   1396         skipna=skipna,
   1397         keep_attrs=keep_attrs,
   1398     )

File ~/Projets/xarray/xarray/core/resample.py:59, in Resample._flox_reduce(self, dim, keep_attrs, **kwargs)
     52 def _flox_reduce(
     53     self,
     54     dim: Dims,
     55     keep_attrs: bool | None = None,
     56     **kwargs,
     57 ) -> T_Xarray:
     58     result: T_Xarray = (
---> 59         super()
     60         ._flox_reduce(dim=dim, keep_attrs=keep_attrs, **kwargs)
     61         .rename({RESAMPLE_DIM: self._group_dim})  # type: ignore[assignment]
     62     )
     63     return result

File ~/Projets/xarray/xarray/core/groupby.py:1099, in GroupBy._flox_reduce(self, dim, keep_attrs, **kwargs)
   1097 from IPython import embed
   1098 embed()
-> 1099 result = xarray_reduce(
   1100     obj.drop_vars(non_numeric.keys()),
   1101     *codes,
   1102     dim=parsed_dim,
   1103     expected_groups=expected_groups,
   1104     isbin=False,
   1105     keep_attrs=keep_attrs,
   1106     **kwargs,
   1107 )
   1109 # we did end up reducing over dimension(s) that are
   1110 # in the grouped variable
   1111 group_dims = set(grouper.group.dims)

File ~/miniforge3/envs/xclim-dev/lib/python3.13/site-packages/flox/xarray.py:410, in xarray_reduce(obj, func, expected_groups, isbin, sort, dim, fill_value, dtype, method, engine, keep_attrs, skipna, min_count, reindex, *by, **finalize_kwargs)
    407 output_sizes = group_sizes
    408 output_sizes.update({dim.name: dim.size for dim in newdims if dim.size != 0})
--> 410 actual = xr.apply_ufunc(
    411     wrapper,
    412     ds_broad.drop_vars(tuple(missing_dim)).transpose(..., *grouper_dims),
    413     *by_da,
    414     input_core_dims=input_core_dims,
    415     # for xarray's test_groupby_duplicate_coordinate_labels
    416     exclude_dims=set(dim_tuple),
    417     output_core_dims=[output_core_dims],
    418     dask="allowed",
    419     dask_gufunc_kwargs=dict(
    420         output_sizes=output_sizes,
    421         output_dtypes=[dtype] if dtype is not None else None,
    422     ),
    423     keep_attrs=keep_attrs,
    424     kwargs={
    425         "func": func,
    426         "axis": axis,
    427         "sort": sort,
    428         "fill_value": fill_value,
    429         "method": method,
    430         "min_count": min_count,
    431         "skipna": skipna,
    432         "engine": engine,
    433         "reindex": reindex,
    434         "expected_groups": tuple(expected_groups_valid_list),
    435         "isbin": isbins,
    436         "finalize_kwargs": finalize_kwargs,
    437         "dtype": dtype,
    438         "core_dims": input_core_dims,
    439     },
    440 )
    442 # restore non-dim coord variables without the core dimension
    443 # TODO: shouldn't apply_ufunc handle this?
    444 for var in set(ds_broad._coord_names) - set(ds_broad._indexes) - set(ds_broad.dims):

File ~/Projets/xarray/xarray/computation/apply_ufunc.py:1255, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, on_missing_core_dim, *args)
   1253 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
   1254 elif any(is_dict_like(a) for a in args):
-> 1255     return apply_dataset_vfunc(
   1256         variables_vfunc,
   1257         *args,
   1258         signature=signature,
   1259         join=join,
   1260         exclude_dims=exclude_dims,
   1261         dataset_join=dataset_join,
   1262         fill_value=dataset_fill_value,
   1263         keep_attrs=keep_attrs,
   1264         on_missing_core_dim=on_missing_core_dim,
   1265     )
   1266 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1267 elif any(isinstance(a, DataArray) for a in args):

File ~/Projets/xarray/xarray/computation/apply_ufunc.py:526, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, on_missing_core_dim, *args)
    521 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
    522     args, signature, exclude_dims, combine_attrs=keep_attrs
    523 )
    524 args = tuple(getattr(arg, "data_vars", arg) for arg in args)
--> 526 result_vars = apply_dict_of_variables_vfunc(
    527     func,
    528     *args,
    529     signature=signature,
    530     join=dataset_join,
    531     fill_value=fill_value,
    532     on_missing_core_dim=on_missing_core_dim,
    533 )
    535 out: Dataset | tuple[Dataset, ...]
    536 if signature.num_outputs > 1:

File ~/Projets/xarray/xarray/computation/apply_ufunc.py:450, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, on_missing_core_dim, *args)
    448 core_dim_present = _check_core_dims(signature, variable_args, name)
    449 if core_dim_present is True:
--> 450     result_vars[name] = func(*variable_args)
    451 else:
    452     if on_missing_core_dim == "raise":

File ~/Projets/xarray/xarray/computation/apply_ufunc.py:821, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    816 if vectorize:
    817     func = _vectorize(
    818         func, signature, output_dtypes=output_dtypes, exclude_dims=exclude_dims
    819     )
--> 821 result_data = func(*input_data)
    823 if signature.num_outputs == 1:
    824     result_data = (result_data,)

File ~/miniforge3/envs/xclim-dev/lib/python3.13/site-packages/flox/xarray.py:367, in xarray_reduce.<locals>.wrapper(array, func, skipna, core_dims, *by, **kwargs)
    364 if "nan" not in func and func not in ["all", "any", "count"]:
    365     func = f"nan{func}"
--> 367 result, *groups = groupby_reduce(array, *by, func=func, **kwargs)
    369 # Transpose the new quantile dimension to the end. This is ugly.
    370 # but new core dimensions are expected at the end :/
    371 # but groupby_reduce inserts them at the beginning
    372 if func in ["quantile", "nanquantile"]:

File ~/miniforge3/envs/xclim-dev/lib/python3.13/site-packages/flox/core.py:2559, in groupby_reduce(array, func, expected_groups, sort, isbin, axis, fill_value, dtype, min_count, method, engine, reindex, finalize_kwargs, *by)
   2556     fill_value = np.nan
   2558 kwargs = dict(axis=axis_, fill_value=fill_value)
-> 2559 agg = _initialize_aggregation(func, dtype, array.dtype, fill_value, min_count_, finalize_kwargs)
   2561 # Need to set this early using `agg`
   2562 # It cannot be done in the core loop of chunk_reduce
   2563 # since we "prepare" the data for flox.
   2564 kwargs["engine"] = _choose_engine(by_, agg) if engine is None else engine

File ~/miniforge3/envs/xclim-dev/lib/python3.13/site-packages/flox/aggregations.py:809, in _initialize_aggregation(func, dtype, array_dtype, fill_value, min_count, finalize_kwargs)
    804 # np.dtype(None) == np.dtype("float64")!!!
    805 # so check for not None
    806 dtype_: np.dtype | None = (
    807     np.dtype(dtype) if dtype is not None and not isinstance(dtype, np.dtype) else dtype
    808 )
--> 809 final_dtype = dtypes._normalize_dtype(
    810     dtype_ or agg.dtype_init["final"], array_dtype, agg.preserves_dtype, fill_value
    811 )
    812 agg.dtype = {
    813     "user": dtype,  # Save to automatically choose an engine
    814     "final": final_dtype,
    (...)
    823     ),
    824 }
    826 # Replace sentinel fill values according to dtype

File ~/miniforge3/envs/xclim-dev/lib/python3.13/site-packages/flox/xrdtypes.py:171, in _normalize_dtype(dtype, array_dtype, preserves_dtype, fill_value)
    169     dtype = np.dtype(dtype)
    170 if fill_value not in [None, INF, NINF, NA]:
--> 171     dtype = np.result_type(dtype, fill_value)
    172 return dtype

DTypePromotionError: The DType <class 'numpy.dtypes.DateTime64DType'> could not be promoted by <class 'numpy.dtypes._PyFloatDType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.DateTime64DType'>, <class 'numpy.dtypes._PyFloatDType'>)
Anything else we need to know?
This was introduced by #10148, I believe, but the root cause is that the _flox_reduce method assumes np.nan as the fill value when groups are missing:

xarray/xarray/core/groupby.py
Line 1105 in 4174aa1
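The traceback bottoms out in np.result_type(dtype, fill_value) with fill_value=np.nan. A minimal sketch of that promotion failure, independent of xarray and flox (assuming numpy >= 1.25 for np.exceptions):

import numpy as np

dt = np.dtype("datetime64[ns]")

try:
    # Same call flox ends up making with the assumed fill value of np.nan:
    # a Python float cannot be promoted with datetime64, so this raises.
    np.result_type(dt, np.nan)
except np.exceptions.DTypePromotionError as err:
    print(err)

# A dtype-appropriate fill value promotes cleanly instead.
print(np.result_type(dt, np.datetime64("NaT")))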
It also fails for cftime data, but I think that is a separate issue, internal to flox.

Deactivating flox fixes both issues (xr.set_options(use_flox=False)).
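For reference, a minimal sketch of that workaround applied to the example above, using set_options as a context manager so only this reduction bypasses flox:

import xarray as xr

time = xr.DataArray(xr.date_range("2000-01-01", periods=400, freq="D"), dims=("time",))
time = time.sel(time=time.dt.month != 12)

# With flox disabled, the non-flox reduction path handles the empty
# December group (per the report above, it does not raise).
with xr.set_options(use_flox=False):
    result = time.resample(time="MS").first()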
Environment
INSTALLED VERSIONS
commit: fd7c765
python: 3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:10:22) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.12.12-200.fc41.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: fr_CA.UTF-8
LOCALE: ('fr_CA', 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2
xarray: 2025.3.1.dev5+gfd7c7656.d20250324
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: 1.6.1
h5py: 3.12.1
zarr: None
cftime: 1.6.4
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.4.2
dask: 2025.3.0
distributed: 2025.3.0
matplotlib: 3.10.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.3.0
cupy: None
pint: 0.24.4
sparse: None
flox: 0.10.0
numpy_groupies: 0.11.2
setuptools: 75.8.2
pip: 25.0.1
conda: None
pytest: 8.3.5
mypy: 1.15.0
IPython: 9.0.2
sphinx: 8.1.