-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REGR: properly update DataFrame cache in Series.__setitem__ #48215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: properly update DataFrame cache in Series.__setitem__ #48215
Conversation
I am not fully sure how to exactly test #47172 (but I verified manually that it fixes the memory increase), but I did add a test that fails on main and 1.4 (and worked on pandas 1.3) and passes with this fix. |
Thanks for the explanation. makes sense.
looks fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorisvandenbossche lgtm pending green
doc/source/whatsnew/v1.4.4.rst
Outdated
@@ -26,6 +26,7 @@ Fixed regressions | |||
- Fixed regression in :meth:`DataFrame.loc` setting a length-1 array like value to a single value in the DataFrame (:issue:`46268`) | |||
- Fixed regression when slicing with :meth:`DataFrame.loc` with :class:`DateOffset`-index (:issue:`46671`) | |||
- Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`) | |||
- Fixed regression in updating a DataFrame column through :meth:`Series.__setitem__` (:issue:`47172`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Series.__setitem__
doesn't render? so perhaps reword and maybe should also mention memory leak or that the values were not updated inplace using chained indexing with assignment?
@@ -1235,3 +1235,21 @@ def test_setitem_not_operating_inplace(self, value, set_value, indexer): | |||
view = df[:] | |||
df[indexer] = set_value | |||
tm.assert_frame_equal(view, expected) | |||
|
|||
@td.skip_array_manager_invalid_test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this skips whereas skip_array_manager_not_yet_implemented
xfails.
Now this test fails with AttributeError: 'ArrayManager' object has no attribute 'blocks'
since we are explicitly testing the BlockManager values.
so it does make sense to just skip.
but are we likely to want a similar test for array manger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it indeed makes sense to skip this test instead of xfail, since the way is written is not intended to ever work for ArrayManager. But so we could indeed write a separate test that checks for the the ArrayManager case (but I would say it's maybe not a super high priority, I can only take a look in the weekend).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I would say it's maybe not a super high priority
no problem. not a blocker.
@meeseeksdev backport 1.5.x |
Potential fix for #47172
#43406 added an
inplace
keyword to_maybe_update_cacher
, but made the defaultFalse
. While before that PR, the effective behaviour would rather matchTrue
I think (that PR was meant to make some cases not inplace).However, if we are in the case of
Series.__setitem__
needing a cache update, so meaning that this series is a column of a DataFrame and we might need to update that DataFrame, I think we always should try update that DataFrame inplace (we are not in aDataFrame.__setitem__
case, but rather setting into a column we got withDataFrame.__getitem__
, so chained assignment, which currently still is expected to work in this case).