
API: Inconsistent behavior with setting slices of Series indexed by MultiIndex #20414


Closed
Dr-Irv opened this issue Mar 19, 2018 · 3 comments · Fixed by #41697


Dr-Irv commented Mar 19, 2018

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.22.0'

In [3]: s1 = pd.Series(range(8),
   ...:                index=pd.MultiIndex.from_product([list('ab'),
   ...:                                                  list('xy'),
   ...:                                                  [1,2]],
   ...:                                                  names=['ab','xy','num'])
   ...: )
   ...:

In [4]: s1
Out[4]:
ab  xy  num
a   x   1      0
        2      1
    y   1      2
        2      3
b   x   1      4
        2      5
    y   1      6
        2      7
dtype: int64

In [5]:

In [5]: s2 = pd.Series([100*(i+1) for i in range(4)],
   ...:                index=pd.MultiIndex.from_product([list('ab'),
   ...:                                                  list('xy')],
   ...:                                                  names=['ab','xy']))
   ...:

In [6]: s2
Out[6]:
ab  xy
a   x     100
    y     200
b   x     300
    y     400
dtype: int64

In [7]: s1.loc[pd.IndexSlice[:,:,1]] = -1  # This works as expected

In [8]: s1
Out[8]:
ab  xy  num
a   x   1     -1
        2      1
    y   1     -1
        2      3
b   x   1     -1
        2      5
    y   1     -1
        2      7
dtype: int64

In [9]: s3 = s1.loc[pd.IndexSlice[:,:,1]] + s2 # This works as expected

In [10]: s3
Out[10]:
ab  xy
a   x      99
    y     199
b   x     299
    y     399
dtype: int64

In [11]: s1.loc[pd.IndexSlice[:,:,1]] = s2 # This works differently in v0.22.0 and v0.23 (dev)

In [12]: s1
Out[12]:
ab  xy  num
a   x   1      NaN
        2      1.0
    y   1      NaN
        2      3.0
b   x   1      NaN
        2      5.0
    y   1      NaN
        2      7.0
dtype: float64

In [13]: s1.loc[pd.IndexSlice['a',:,:]] = -2 # This works as expected

In [14]: s1
Out[14]:
ab  xy  num
a   x   1     -2.0
        2     -2.0
    y   1     -2.0
        2     -2.0
b   x   1      NaN
        2      5.0
    y   1      NaN
        2      7.0
dtype: float64

In [15]: s4 = pd.Series([1000*i for i in range(1,5)],
    ...:                index=pd.MultiIndex.from_product([list('xy'),[1,2]],
    ...:                                                 names=['xy','num']))
    ...:

In [16]: s4
Out[16]:
xy  num
x   1      1000
    2      2000
y   1      3000
    2      4000
dtype: int64

In [17]: s5 = s1.loc[pd.IndexSlice['a',:,:]] + s4 # This fails
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-17-cc0933ef6ebd> in <module>()
----> 1 s5 = s1.loc[pd.IndexSlice['a',:,:]] + s4 # This fails

C:\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(left, right, name, na_op)
    716             return NotImplemented
    717
--> 718         left, right = _align_method_SERIES(left, right)
    719
    720         converted = _Op.get_op(left, right, name, na_op)

C:\Anaconda3\lib\site-packages\pandas\core\ops.py in _align_method_SERIES(left, right, align_asobject)
    645                 right = right.astype(object)
    646
--> 647             left, right = left.align(right, copy=False)
    648
    649     return left, right

C:\Anaconda3\lib\site-packages\pandas\core\series.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   2605                                          fill_value=fill_value, method=method,
   2606                                          limit=limit, fill_axis=fill_axis,
-> 2607                                          broadcast_axis=broadcast_axis)
   2608
   2609     def rename(self, index=None, **kwargs):

C:\Anaconda3\lib\site-packages\pandas\core\generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   5728                                       copy=copy, fill_value=fill_value,
   5729                                       method=method, limit=limit,
-> 5730                                       fill_axis=fill_axis)
   5731         else:  # pragma: no cover
   5732             raise TypeError('unsupported type: %s' % type(other))

C:\Anaconda3\lib\site-packages\pandas\core\generic.py in _align_series(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   5797                 join_index, lidx, ridx = self.index.join(other.index, how=join,
   5798                                                          level=level,
-> 5799                                                          return_indexers=True)
   5800
   5801             left = self._reindex_indexer(join_index, lidx, copy)

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in join(self, other, how, level, return_indexers, sort)
   3101             else:
   3102                 return self._join_multi(other, how=how,
-> 3103                                         return_indexers=return_indexers)
   3104
   3105         # join on the level

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _join_multi(self, other, how, return_indexers)
   3199                              "overlapping names")
   3200         if len(overlap) > 1:
-> 3201             raise NotImplementedError("merging with more than one level "
   3202                                       "overlap on a multi-index is not "
   3203                                       "implemented")

NotImplementedError: merging with more than one level overlap on a multi-index is not implemented

In [18]: s1.loc[pd.IndexSlice['a',:,:]]  = s4 # This puts in NaN rather than s4 values

In [19]: s1
Out[19]:
ab  xy  num
a   x   1      NaN
        2      NaN
    y   1      NaN
        2      NaN
b   x   1      NaN
        2      5.0
    y   1      NaN
        2      7.0
dtype: float64

Problem description

This is somewhat related to #10440. In the code above, if we slice by fixing the value of the third level, we can set the slice to a constant. We can also add that slice to a Series whose index matches the first two levels.

In v0.22.0 of pandas, the result of the lines

s1.loc[pd.IndexSlice[:,:,1]] = s2
s1

is (as shown above)

ab  xy  num
a   x   1      NaN
        2      1.0
    y   1      NaN
        2      3.0
b   x   1      NaN
        2      5.0
    y   1      NaN
        2      7.0
dtype: float64

But in the development version 0.23 of pandas, the "correct" result is given:

ab  xy  num
a   x   1      100
        2        1
    y   1      200
        2        3
b   x   1      300
        2        5
    y   1      400
        2        7
dtype: int64

So I would then expect the last two examples, using s4, to work in the v0.23 development version, because the only difference is that I am fixing the value of the first level in the slice rather than the last level. But in both of those cases, I get this error (independent of the pandas version):

NotImplementedError: merging with more than one level overlap on a multi-index is not implemented
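Until the multi-level join is supported, one way to sidestep the failing addition is to drop the fixed level before adding, so both operands carry the same `(xy, num)` index. The following is a minimal sketch, not a fix for the underlying behavior; it starts from a fresh `s1` (values 0 to 7) rather than the modified one in the session above, and it assumes `Series.xs` with its default `drop_level=True`.

```python
import pandas as pd

s1 = pd.Series(
    range(8),
    index=pd.MultiIndex.from_product(
        [list("ab"), list("xy"), [1, 2]], names=["ab", "xy", "num"]
    ),
)
s4 = pd.Series(
    [1000 * i for i in range(1, 5)],
    index=pd.MultiIndex.from_product([list("xy"), [1, 2]], names=["xy", "num"]),
)

# xs selects ab == 'a' and drops the 'ab' level, leaving an (xy, num)
# index that is identical to s4's index, so no multi-level join is needed.
left = s1.xs("a", level="ab")
s5 = left + s4
print(s5)
```

The result is indexed by `(xy, num)` only, so the `ab` level would have to be added back if the three-level shape is needed.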

So there is an inconsistency: a slice that fixes the last level allows the addition and assignment operations to work (and the v0.23 development version is better than v0.22 because the NaN values go away), but a slice that fixes the first level does not allow the operations to work.
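For the assignment case, a workaround that avoids index alignment altogether is to assign the raw values. This is a sketch under the assumption that the right-hand side is already in the same order as the selected rows; it bypasses label matching entirely, so it is only safe when that ordering is known. It again starts from a fresh `s1`.

```python
import pandas as pd

s1 = pd.Series(
    range(8),
    index=pd.MultiIndex.from_product(
        [list("ab"), list("xy"), [1, 2]], names=["ab", "xy", "num"]
    ),
)
s4 = pd.Series(
    [1000 * i for i in range(1, 5)],
    index=pd.MultiIndex.from_product([list("xy"), [1, 2]], names=["xy", "num"]),
)

# Assigning a NumPy array is positional: pandas does not try to align
# s4's (xy, num) index against the three-level index of s1.
s1.loc[pd.IndexSlice["a", :, :]] = s4.to_numpy()
print(s1)
```

Because no alignment happens, a reordered `s4` would silently write values into the wrong rows, which is exactly what label-based assignment is meant to prevent.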

I'm not sure if this is a bug, or by design, or if the documentation needs to be clarified as to which type of slicing will allow the "setting" operation to work as expected. There is a line in the docs (http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers) that says "You can use a right-hand-side of an alignable object as well." At least to me, it's not clear what objects are considered "alignable".
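To make the question concrete, here is one possible reading of "alignable", spelled out by hand: look up each selected row's remaining level values in the right-hand side, then assign the result positionally. This is an illustration of the expected semantics, not a description of what pandas does internally; the `mask` and `target_keys` names are introduced only for this sketch.

```python
import pandas as pd

s1 = pd.Series(
    range(8),
    index=pd.MultiIndex.from_product(
        [list("ab"), list("xy"), [1, 2]], names=["ab", "xy", "num"]
    ),
)
s2 = pd.Series(
    [100 * (i + 1) for i in range(4)],
    index=pd.MultiIndex.from_product([list("ab"), list("xy")], names=["ab", "xy"]),
)

# Rows of s1 where the last level is fixed at num == 1.
mask = s1.index.get_level_values("num") == 1

# The (ab, xy) keys of those rows, in the order they appear in s1.
target_keys = s1.index[mask].droplevel("num")

# Align s2 to those keys by label, then assign the values positionally.
s1.loc[mask] = s2.reindex(target_keys).to_numpy()
print(s1)
```

The same recipe applies when the first level is fixed instead: select the rows, drop the fixed level to get the keys, and reindex the right-hand side on those keys.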

In any case, the expected behavior should be clear in the documentation, and, IMHO, if you fix the value of the first index or the last index, the behavior should be consistent.

Expected Output

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.22.0'

In [3]: s1 = pd.Series(range(8),
   ...:                index=pd.MultiIndex.from_product([list('ab'),
   ...:                                                  list('xy'),
   ...:                                                  [1,2]],
   ...:                                                  names=['ab','xy','num'])
   ...: )
   ...:

In [4]: s1
Out[4]:
ab  xy  num
a   x   1      0
        2      1
    y   1      2
        2      3
b   x   1      4
        2      5
    y   1      6
        2      7
dtype: int64

In [5]:

In [5]: s2 = pd.Series([100*(i+1) for i in range(4)],
   ...:                index=pd.MultiIndex.from_product([list('ab'),
   ...:                                                  list('xy')],
   ...:                                                  names=['ab','xy']))
   ...:

In [6]: s2
Out[6]:
ab  xy
a   x     100
    y     200
b   x     300
    y     400
dtype: int64

In [7]: s1.loc[pd.IndexSlice[:,:,1]] = -1  # This works as expected

In [8]: s1
Out[8]:
ab  xy  num
a   x   1     -1
        2      1
    y   1     -1
        2      3
b   x   1     -1
        2      5
    y   1     -1
        2      7
dtype: int64

In [9]: s3 = s1.loc[pd.IndexSlice[:,:,1]] + s2 # This works as expected

In [10]: s3
Out[10]:
ab  xy
a   x      99
    y     199
b   x     299
    y     399
dtype: int64

In [11]: s1.loc[pd.IndexSlice[:,:,1]] = s2 # This works differently in v0.22.0 and v0.23 (dev)

In [12]: s1
Out[12]:
ab  xy  num
a   x   1      NaN
        2      1.0
    y   1      NaN
        2      3.0
b   x   1      NaN
        2      5.0
    y   1      NaN
        2      7.0
dtype: float64

In [13]: s1.loc[pd.IndexSlice['a',:,:]] = -2 # This works as expected

In [14]: s1
Out[14]:
ab  xy  num
a   x   1     -2.0
        2     -2.0
    y   1     -2.0
        2     -2.0
b   x   1      NaN
        2      5.0
    y   1      NaN
        2      7.0
dtype: float64

In [15]: s4 = pd.Series([1000*i for i in range(1,5)],
    ...:                index=pd.MultiIndex.from_product([list('xy'),[1,2]],
    ...:                                                 names=['xy','num']))
    ...:

In [16]: s4
Out[16]:
xy  num
x   1      1000
    2      2000
y   1      3000
    2      4000
dtype: int64

In [17]: s5 = s1.loc[pd.IndexSlice['a',:,:]] + s4 # This should not fail

In [18]: s5
Out[18]:
xy  num
x   1       998
    2      1998
y   1      2998
    2      3998

In [19]: s1.loc[pd.IndexSlice['a',:,:]] = s4 # This should not set NaN

In [20]: s1
Out[20]:
ab  xy  num
a   x   1      1000.0
        2      2000.0
    y   1      3000.0
        2      4000.0
b   x   1         NaN
        2         5.0
    y   1         NaN
        2         7.0
dtype: float64

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


jreback commented Mar 20, 2018

Rather than showing prints, can you show the IPython output inline? It's much easier to read.


Dr-Irv commented Mar 20, 2018

@jreback I updated the original post using ipython.

mroeschke added the Indexing, API Design, and MultiIndex labels on Jan 13, 2019

phofl commented Nov 11, 2020

This seems to work now. `s5` equals

ab  xy  num
a   x   1       998
        2      1998
    y   1      2998
        2      3998
dtype: int64

phofl added the good first issue and Needs Tests labels on Nov 11, 2020
mroeschke mentioned this issue on May 28, 2021
mroeschke added this to the 1.3 milestone on May 28, 2021