to_dataframe/to_series fails when one out of more than one dims are stacked / multiindex #3008

gmoutso · 2019-06-10T10:39:22Z

Code Sample, a copy-pastable example if possible

import xarray as xr

da = xr.DataArray([[[1]]], dims=["a", "b", "c"]).stack(ab=["a", "b"])
da.to_series()
# or
da.to_dataframe("A")

Problem description

When a DataArray has one MultiIndex dimension, as produced by stack, and has other dimensions as well, to_series fails to create a combined MultiIndex.

I would expect a series/dataframe with a MultiIndex with names a, b, c. Instead I get:

lib/python2.7/site-packages/pandas/core/dtypes/missing.pyc in _isna_new(obj)
    115     # hack (for now) because MI registers as ndarray
    116     elif isinstance(obj, ABCMultiIndex):
--> 117         raise NotImplementedError("isna is not defined for MultiIndex")
    118     elif isinstance(obj, (ABCSeries, np.ndarray, ABCIndexClass,
    119                           ABCExtensionArray)):

NotImplementedError: isna is not defined for MultiIndex

On the other hand, when the stacked dimension is the only dimension, to_series and to_dataframe work:

da.isel(c=0).to_series()
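
A possible workaround, sketched here as an assumption rather than something proposed in this thread, is to undo the stack before converting, so that every dimension carries a plain index. Note that `unstack` fills in missing combinations with NaN when the stacked index is not a full product, so this is only equivalent for fully populated arrays.

```python
import xarray as xr

da = xr.DataArray([[[1]]], dims=["a", "b", "c"]).stack(ab=["a", "b"])

# Undo the stack so that "a" and "b" are ordinary dimensions again,
# restore the original dimension order, then convert.
series = da.unstack("ab").transpose("a", "b", "c").to_series()
print(series.index.names)
```

The resulting series is indexed by a MultiIndex whose levels are named after the dimensions, in the order given to `transpose`.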

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15 |Anaconda, Inc.| (default, May 1 2018, 23:32:55) [GCC 7.2.0]
python-bits: 64
OS: Linux
OS-release: 3.13.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: None.None
libhdf5: 1.8.17
libnetcdf: 4.4.1

xarray: 0.11.3
pandas: 0.23.4
numpy: 1.12.1
scipy: 0.19.1
netCDF4: 1.2.8
pydap: None
h5netcdf: None
h5py: 2.6.0
Nio: None
zarr: None
cftime: None
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.0
cyordereddict: None
dask: 0.17.3
distributed: 1.21.0
matplotlib: 2.2.2
cartopy: None
seaborn: 0.7.1
setuptools: 0.6
pip: 19.0.1
conda: None
pytest: 3.0.5
IPython: 5.8.0
sphinx: 1.5.1

shoyer · 2019-06-23T23:18:08Z

I agree, this is definitely not ideal behavior!

I hesitate to call it a bug only because I'm not sure if we've ever supported this behavior.

It would be nice to fix this, and I would encourage you (or other interested users) to look into it.

max-sixty · 2020-02-26T14:25:43Z

This seems to happen because MultiIndex.from_product is being passed an index and a MultiIndex, and doesn't handle this well.

(The pandas error isn't great, but I think it's mostly on us.)

> /home/mroos/.local/lib/python3.7/site-packages/xarray/core/coordinates.py(111)to_index()
    109             indexes = [self._data.get_index(k) for k in ordered_dims]  # type: ignore
    110             names = list(ordered_dims)
--> 111             return pd.MultiIndex.from_product(indexes, names=names)
    112 
    113     def update(self, other: Mapping[Hashable, Any]) -> None:

ipdb> indexes
[Index(['0', '1', '2', '3'], dtype='object', name='n'), MultiIndex([(    18671, '1995-03-31'),
            (    18671, '1995-06-30'),
            (    18671, '1995-09-30'),
            (    18671, '1995-12-31'),
            (    18671, '1996-03-31'),
            (    18671, '1996-06-30'),
            (    18671, '1996-09-30'),
            (    18671, '1996-12-31'),
            (    18671, '1997-03-31'),
            (    18671, '1997-06-30'),
            ...
            (634127183, '2012-09-30'),
            (634127183, '2012-12-31'),
            (634127183, '2013-03-31'),
            (634127183, '2013-06-30'),
            (634127183, '2013-09-30'),
            (634127183, '2013-12-31'),
            (634127183, '2014-03-31'),
            (634127183, '2014-06-30'),
            (634127183, '2014-09-30'),
            (634127183, '2014-12-31')],
           names=['c', 'date'], length=201040)]

Here's the whole stack trace for reference:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-698-952a54d66d1c> in <module>
----> 1 observations.assign_coords(n=['0','1','2','3']).to_dataframe()

~/.local/lib/python3.7/site-packages/xarray/core/dataset.py in to_dataframe(self)
   4463         this dataset's indices.
   4464         """
-> 4465         return self._to_dataframe(self.dims)
   4466 
   4467     def _set_sparse_data_from_dataframe(

~/.local/lib/python3.7/site-packages/xarray/core/dataset.py in _to_dataframe(self, ordered_dims)
   4453             for k in columns
   4454         ]
-> 4455         index = self.coords.to_index(ordered_dims)
   4456         return pd.DataFrame(dict(zip(columns, data)), index=index)
   4457 

~/.local/lib/python3.7/site-packages/xarray/core/coordinates.py in to_index(self, ordered_dims)
    109             indexes = [self._data.get_index(k) for k in ordered_dims]  # type: ignore
    110             names = list(ordered_dims)
--> 111             return pd.MultiIndex.from_product(indexes, names=names)
    112 
    113     def update(self, other: Mapping[Hashable, Any]) -> None:

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/indexes/multi.py in from_product(cls, iterables, sortorder, names)
    536             iterables = list(iterables)
    537 
--> 538         codes, levels = _factorize_from_iterables(iterables)
    539         codes = cartesian_product(codes)
    540         return MultiIndex(levels, codes, sortorder=sortorder, names=names)

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in _factorize_from_iterables(iterables)
   2814         # For consistency, it should return a list of 2 lists.
   2815         return [[], []]
-> 2816     return map(list, zip(*(_factorize_from_iterable(it) for it in iterables)))

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in <genexpr>(.0)
   2814         # For consistency, it should return a list of 2 lists.
   2815         return [[], []]
-> 2816     return map(list, zip(*(_factorize_from_iterable(it) for it in iterables)))

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in _factorize_from_iterable(values)
   2786         # but only the resulting categories, the order of which is independent
   2787         # from ordered. Set ordered to False as default. See GH #15457
-> 2788         cat = Categorical(values, ordered=False)
   2789         categories = cat.categories
   2790         codes = cat.codes

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    401 
    402             # we're inferring from values
--> 403             dtype = CategoricalDtype(categories, dtype._ordered)
    404 
    405         elif is_categorical_dtype(values):

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in __init__(self, categories, ordered)
    224 
    225     def __init__(self, categories=None, ordered: OrderedType = ordered_sentinel):
--> 226         self._finalize(categories, ordered, fastpath=False)
    227 
    228     @classmethod

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in _finalize(self, categories, ordered, fastpath)
    345 
    346         if categories is not None:
--> 347             categories = self.validate_categories(categories, fastpath=fastpath)
    348 
    349         self._categories = categories

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/dtypes/dtypes.py in validate_categories(categories, fastpath)
    521         if not fastpath:
    522 
--> 523             if categories.hasnans:
    524                 raise ValueError("Categorial categories cannot be null")
    525 

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/indexes/base.py in hasnans(self)
   1958         """
   1959         if self._can_hold_na:
-> 1960             return bool(self._isnan.any())
   1961         else:
   1962             return False

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/indexes/base.py in _isnan(self)
   1937         """
   1938         if self._can_hold_na:
-> 1939             return isna(self)
   1940         else:
   1941             # shouldn't reach to this condition by checking hasnans beforehand

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in isna(obj)
    120     Name: 1, dtype: bool
    121     """
--> 122     return _isna(obj)
    123 
    124 

/j/office/app/research-python/conda/envs/2019.10/lib/python3.7/site-packages/pandas/core/dtypes/missing.py in _isna_new(obj)
    131     # hack (for now) because MI registers as ndarray
    132     elif isinstance(obj, ABCMultiIndex):
--> 133         raise NotImplementedError("isna is not defined for MultiIndex")
    134     elif isinstance(obj, type):
    135         return False

NotImplementedError: isna is not defined for MultiIndex
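
To make the expected result concrete, here is a minimal pandas-only sketch of the combined index for a flat Index and a MultiIndex like the two shown in the debugger output above. The names `flat` and `multi` and the shortened values are illustrative assumptions, and this is not xarray's implementation: it expands each level of the MultiIndex by position and rebuilds the index with `MultiIndex.from_arrays`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two indexes in the debugger output.
flat = pd.Index(["0", "1"], name="n")
multi = pd.MultiIndex.from_tuples(
    [(18671, "1995-03-31"), (18671, "1995-06-30"), (634127183, "2014-12-31")],
    names=["c", "date"],
)

# Positions for the Cartesian product: the first index varies slowest.
outer = np.repeat(np.arange(len(flat)), len(multi))
inner = np.tile(np.arange(len(multi)), len(flat))

# One array per output level: the flat index, then each MultiIndex level.
arrays = [flat.take(outer)] + [
    multi.get_level_values(name).take(inner) for name in multi.names
]
combined = pd.MultiIndex.from_arrays(arrays, names=["n", "c", "date"])
print(combined)
```

The combined index has `len(flat) * len(multi)` entries and one level per original level, which is the shape the reporter expected from to_series and to_dataframe.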

ghislainp · 2020-09-20T20:49:27Z

The proposed PR completely rewrites how the Cartesian product is computed, because MultiIndex.from_product, which was written for arbitrary iterables, is unable to deal with a MultiIndex.
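
As a rough illustration of what computing the Cartesian product by hand can look like, the sketch below works on levels and codes directly, reusing the existing levels and codes of any MultiIndex and factorizing flat indexes. The helper name `cartesian_multiindex` is hypothetical and this is not the code from the PR; it also assumes a pandas version where `MultiIndex.codes` exists (0.24 or later).

```python
import numpy as np
import pandas as pd


def cartesian_multiindex(indexes):
    """Cartesian product of pandas indexes, some of which may be MultiIndex."""
    sizes = [len(index) for index in indexes]
    levels, codes, names = [], [], []
    for i, index in enumerate(indexes):
        if isinstance(index, pd.MultiIndex):
            # Reuse the existing levels and codes of the MultiIndex.
            index_codes, index_levels = list(index.codes), list(index.levels)
        else:
            code, level = pd.factorize(index)
            index_codes, index_levels = [code], [level]
        repeat = int(np.prod(sizes[i + 1:]))  # combined size of later indexes
        tile = int(np.prod(sizes[:i]))        # combined size of earlier indexes
        codes += [np.tile(np.repeat(c, repeat), tile) for c in index_codes]
        levels += index_levels
        names += list(index.names)
    return pd.MultiIndex(levels=levels, codes=codes, names=names)


flat = pd.Index(["x", "y"], name="n")
multi = pd.MultiIndex.from_tuples(
    [(1, "a"), (2, "b"), (2, "c")], names=["c", "date"]
)
result = cartesian_multiindex([flat, multi])
print(result)
```

Working on codes avoids materializing the full arrays of level values: only the integer codes are repeated and tiled, while the levels are passed through unchanged.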

ghislainp added a commit to ghislainp/xarray that referenced this issue Sep 20, 2020

Accept coordinates with MultiIndex (solve issue pydata#3008)

762c961

ghislainp mentioned this issue Sep 20, 2020

Fix DataArray.to_dataframe when the array has MultiIndex #4442

Merged

ghislainp mentioned this issue Sep 20, 2020

MultiIndex.from_product could accept MultiIndex pandas-dev/pandas#36509

Closed

dcherian closed this as completed in #4442 Feb 20, 2021

MgeeeeK mentioned this issue Apr 13, 2021

Fixed index2da causing inverted output pysal/libpysal#400

Merged
