What happened?
I tried using the new multi-dimensional grouping added in #9372, with one BinGrouper per dimension. I'm using version 2024.09.0. If I construct the BinGrouper such that some bins end up empty, I get an IndexError:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[9], line 1
----> 1 ds.groupby(x=BinGrouper(np.arange(0,13,4)), y=BinGrouper(bins=np.arange(0,16,2)))
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/util/deprecation_helpers.py:118, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
114 kwargs.update({name: arg for name, arg in zip_args})
116 return func(*args[:-n_extra_args], **kwargs)
--> 118 return func(*args, **kwargs)
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/core/dataset.py:10444, in Dataset.groupby(self, group, squeeze, restore_coord_dims, **groupers)
10441 _validate_groupby_squeeze(squeeze)
10442 rgroupers = _parse_group_and_groupers(self, group, groupers)
> 10444 return DatasetGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/core/groupby.py:581, in GroupBy.__init__(self, obj, groupers, restore_coord_dims)
573 if any(
574 isinstance(obj._indexes.get(grouper.name, None), PandasMultiIndex)
575 for grouper in groupers
576 ):
577 raise NotImplementedError(
578 "Grouping by multiple variables, one of which "
579 "wraps a Pandas MultiIndex, is not supported yet."
580 )
--> 581 self.encoded = ComposedGrouper(groupers).factorize()
583 # specification for the groupby operation
584 # TODO: handle obj having variables that are not present on any of the groupers
585 # simple broadcasting fails for ExtensionArrays.
586 (self.group1d, self._obj, self._stacked_dim, self._inserted_dims) = _ensure_1d(
587 group=self.encoded.codes, obj=obj
588 )
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/xarray/core/groupby.py:470, in ComposedGrouper.factorize(self)
464 midx = pd.MultiIndex.from_product(
465 (grouper.unique_coord.data for grouper in groupers),
466 names=tuple(grouper.name for grouper in groupers),
467 )
468 # Constructing an index from the product is wrong when there are missing groups
469 # (e.g. binning, resampling). Account for that now.
--> 470 midx = midx[np.sort(pd.unique(_flatcodes[~mask]))]
472 full_index = pd.MultiIndex.from_product(
473 (grouper.full_index.values for grouper in groupers),
474 names=tuple(grouper.name for grouper in groupers),
475 )
476 dim_name = "stacked_" + "_".join(str(grouper.name) for grouper in groupers)
File /home/me/.conda/envs/xarray_2024.09/lib/python3.12/site-packages/pandas/core/indexes/multi.py:2207, in MultiIndex.__getitem__(self, key)
2204 elif isinstance(key, Index):
2205 key = np.asarray(key)
-> 2207 new_codes = [level_codes[key] for level_codes in self.codes]
2209 return MultiIndex(
2210 levels=self.levels,
2211 codes=new_codes,
(...)
2214 verify_integrity=False,
2215 )
IndexError: index 18 is out of bounds for axis 0 with size 18
What did you expect to happen?
It should work, even if some bins are empty, just like it works correctly for a single dimension.
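The comment in the traceback ("Constructing an index from the product is wrong when there are missing groups") hints at where the sizes can diverge. The following is only a sketch of that suspected mismatch using pandas and NumPy directly, not xarray's actual code path. The bin edges come from the failing call. The sample points, the variable names, and the assumption that the flat codes are computed against the full bin counts are all hypothetical.

```python
import numpy as np
import pandas as pd

# Bin edges from the failing groupby call.
x_edges = np.arange(0, 13, 4)   # 3 bins
y_edges = np.arange(0, 16, 2)   # 7 bins

# Hypothetical sample points (not from the report): the first y bin (0, 2] stays empty.
x = np.array([1.0, 5.0, 9.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 13.0])

x_cat = pd.cut(x, x_edges)
y_cat = pd.cut(y, y_edges)

# Assumption: flat group codes are raveled against the FULL number of bins (3 x 7).
full_shape = (len(x_cat.categories), len(y_cat.categories))
xc, yc = np.meshgrid(
    x_cat.codes.astype(np.intp), y_cat.codes.astype(np.intp), indexing="ij"
)
flat_codes = np.ravel_multi_index((xc, yc), full_shape).ravel()

# Product index built only from the bins that actually occur (3 x 6 = 18 entries).
x_observed = x_cat.categories[np.unique(x_cat.codes)]
y_observed = y_cat.categories[np.unique(y_cat.codes)]
observed_index = pd.MultiIndex.from_product(
    [x_observed, y_observed], names=["x_bins", "y_bins"]
)

# Indexing the shorter product with codes that run up to 20 fails.
error = None
try:
    observed_index[np.unique(flat_codes)]
except IndexError as exc:
    error = exc

# Indexing a product over ALL bins with the same codes stays in bounds.
full_index = pd.MultiIndex.from_product(
    [x_cat.categories, y_cat.categories], names=["x_bins", "y_bins"]
)
selected = full_index[np.unique(flat_codes)]
```

Under these assumptions the product of observed bins has 18 entries while the flat codes reach 20, which is consistent with the sizes in the reported `IndexError`. Whether this is the real cause inside `ComposedGrouper.factorize` would need to be confirmed against the xarray source.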
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
If we make sure that no bins are empty, it works.
Also, if we give the same bins as above, but only for a single dimension, it also works.
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.7 | packaged by conda-forge | (main, Oct 4 2024, 16:05:46) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 4.18.0-372.9.1.el8.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.4
libnetcdf: 4.9.2
xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.2
scipy: 1.14.1
netCDF4: 1.7.1
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.9.1
distributed: 2024.9.1
matplotlib: 3.9.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.9.0
cupy: None
pint: None
sparse: None
flox: 0.9.12
numpy_groupies: 0.11.2
setuptools: 75.1.0
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: 8.28.0
sphinx: None