BUG: NBC change in reindexing logic of Series.setitem in pandas=1.1.0 #37427

kozlov-alexey · 2020-10-26T19:33:28Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Hi,

The following example of using Series.__setitem__ shows different results on pandas=1.0.5 and pandas=1.1.3:

import pandas as pd
import numpy as np

S1 = pd.Series([0, 1, -12, 3, -10, 5, 6, 7, 8, -11, -13], dtype=np.int32)
idx = pd.Series([4, 9, 2, 10], dtype=np.int64)
value = pd.Series([-10, -11, -12, -13], dtype=np.int32)
S1[idx] = value  
S1

It looks like #33643 changed the semantics of setitem operation when assigned values are pandas.Series, i.e. the old-behavior was to first align value with idx and then assign value items to S1 basing on idx.values as index labels, so result was:

>>> S1
0      0
1      1
2    -12
3      3
4    -10
5      5
6      6
7      7
8      8
9    -11
10   -13
dtype: int32

And now the result is as if S1[idx] took precedence before the actual assignment, so that labels in idx.values which are absent in value.index are assigned NaNs:

>>> S1
0      0.0
1      1.0
2    -12.0
3      3.0
4      NaN
5      5.0
6      6.0
7      7.0
8      8.0
9      NaN
10     NaN
dtype: float64

Can you please confirm this was intentional change (as I don't see any tests updated to reflect that, nor there are mentions in "What’s new in 1.1.0" section of the documentation)? From the user perspective it looks like with new implementation, user will have to ensure that value.index == idx.values to achieve the same result, which is not a big deal of course, but still it's some additional code.

First version where the issue appears:

INSTALLED VERSIONS
------------------
commit           : 4071c3b872e4ef98a2a2d7c752dd5d2030c4df6e
python           : 3.7.7.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.18362
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.1.0.dev0+1307.g4071c3b87

Kind Regards,
Alexey.

The text was updated successfully, but these errors were encountered:

jorisvandenbossche · 2020-10-26T21:31:52Z

@kozlov-alexey Thanks a lot for the detailed report!

I think this wasn't an intentional change, at least it certainly seems so from the PR (as it was only meant as a code clean-up).

It's an unfortunate inconsistency between __getitem__ and loc, but not one we should just change like this I think, as this can easily (and silently) break your code.

cc @jbrockmendel

jbrockmendel · 2020-10-28T14:40:15Z

out fo curiosity, what does NBC stand for in this context?

kozlov-alexey · 2020-10-28T15:10:52Z

A non-backward compatible change. I thought it's well-spread but it turns it doesn't)
And thanks for quick reply!

simonjayhawkins · 2020-10-29T16:09:53Z

moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline

jorisvandenbossche · 2020-10-29T22:01:19Z

I put up a PR reverting part of the cleaning PR that caused this. If it's an acceptable fix, we can still merge by tomorrow morning: #37502

jorisvandenbossche · 2020-10-30T09:35:12Z

I opened a follow-up issue to actually discuss which behaviour we want long term (the non-alignment of __getitem__, or the alignment of .loc): #37516

kozlov-alexey · 2020-12-11T00:22:21Z

@jorisvandenbossche Hi, it seems the rollback in #37502 left few lines untouched, literally these:
https://github.com/jbrockmendel/pandas/blob/6a5c2429b140d7de5e1e04f6f5ed5d9351418848/pandas/core/series.py#L1034
https://github.com/jbrockmendel/pandas/blob/6a5c2429b140d7de5e1e04f6f5ed5d9351418848/pandas/core/series.py#L1077

and because of this same issue still appears with non-integer dtype keys. For example:

>>> pd.__version__
'1.2.0.dev0+1512.gec8240aa0'
>>> S = pd.Series(np.arange(11), index=['ab', 'ac', 'ad', 'a1', 'a2', 'a3', 'a ', 'ba', 'bc', 'bd', 'b1'])
>>> idx = pd.Series(['a2', 'bd', 'ad', 'b1'])
>>> set_value = pd.Series([-10, -11, -12, -13])
>>> S[idx] = set_value
>>> S
ab    0.0
ac    1.0
ad    NaN
a1    3.0
a2    NaN
a3    5.0
a     6.0
ba    7.0
bc    8.0
bd    NaN
b1    NaN
dtype: float64

Although there's ongoing discussion on what behavior is correct, can you maybe fully revert back to previous behavior for the time being?

Best Regards,
Alexey.

jbrockmendel · 2020-12-11T03:53:23Z

any reason not to do S[idx] = np.array(set_value)?

kozlov-alexey · 2020-12-11T12:50:50Z

@jbrockmendel That's possible of course, but it is still a workaround, and currently it will work with integer keys one way (i.e. old behavior of getitem) and with the rest of keys other way (new behavior of loc). Isn't it better to rollback back to old behavior for all cases?

kozlov-alexey added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 26, 2020

jorisvandenbossche added Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 26, 2020

jorisvandenbossche added this to the 1.1.4 milestone Oct 26, 2020

simonjayhawkins modified the milestones: 1.1.4, 1.1.5 Oct 29, 2020

jorisvandenbossche mentioned this issue Oct 29, 2020

REGR: revert behaviour of getitem with assigning with a Series #37502

Merged

jorisvandenbossche modified the milestones: 1.1.5, 1.1.4 Oct 29, 2020

jorisvandenbossche closed this as completed in #37502 Oct 30, 2020

jorisvandenbossche mentioned this issue Oct 30, 2020

API: alignment behaviour of Series as assignment value in indexing #37516

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: NBC change in reindexing logic of Series.setitem in pandas=1.1.0 #37427

BUG: NBC change in reindexing logic of Series.setitem in pandas=1.1.0 #37427

kozlov-alexey commented Oct 26, 2020

jorisvandenbossche commented Oct 26, 2020 •

edited

Loading

jbrockmendel commented Oct 28, 2020

kozlov-alexey commented Oct 28, 2020 •

edited

Loading

simonjayhawkins commented Oct 29, 2020

jorisvandenbossche commented Oct 29, 2020

jorisvandenbossche commented Oct 30, 2020

kozlov-alexey commented Dec 11, 2020

jbrockmendel commented Dec 11, 2020

kozlov-alexey commented Dec 11, 2020 •

edited

Loading

BUG: NBC change in reindexing logic of Series.__setitem__ in pandas=1.1.0 #37427

BUG: NBC change in reindexing logic of Series.__setitem__ in pandas=1.1.0 #37427

Comments

kozlov-alexey commented Oct 26, 2020

jorisvandenbossche commented Oct 26, 2020 • edited Loading

jbrockmendel commented Oct 28, 2020

kozlov-alexey commented Oct 28, 2020 • edited Loading

simonjayhawkins commented Oct 29, 2020

jorisvandenbossche commented Oct 29, 2020

jorisvandenbossche commented Oct 30, 2020

kozlov-alexey commented Dec 11, 2020

jbrockmendel commented Dec 11, 2020

kozlov-alexey commented Dec 11, 2020 • edited Loading

BUG: NBC change in reindexing logic of Series.setitem in pandas=1.1.0 #37427

BUG: NBC change in reindexing logic of Series.setitem in pandas=1.1.0 #37427

jorisvandenbossche commented Oct 26, 2020 •

edited

Loading

kozlov-alexey commented Oct 28, 2020 •

edited

Loading

kozlov-alexey commented Dec 11, 2020 •

edited

Loading