SettingWithCopyWarning while iterating on groupby results #19151

Closed
has2k1 opened this issue Jan 9, 2018 · 3 comments
Comments

@has2k1
Contributor

has2k1 commented Jan 9, 2018

Code Sample

import pandas as pd

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb')
})

# These operations should have the same consequences.

# 1. No warning
gdfs = [gdf for _, gdf in df.groupby('c')]
for gdf in gdfs:
    gdf['x'] = 1

# 2. SettingWithCopyWarning
gdfs = []
for _, gdf in df.groupby('c'):
    gdf['x'] = 1
    gdfs.append(gdf)

Problem description

Modifying a group dataframe while iterating through the groups triggers a SettingWithCopyWarning.
The dataframe referred to is a result of an internal slicing operation and should not lead to a warning in userspace.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.1-gentoo
machine: x86_64
processor: Intel
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.0.dev0+81.g78147e9c8.dirty
pytest: 3.3.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

has2k1 added a commit to has2k1/pandas that referenced this issue Jan 9, 2018
The dataframe referred to in the warning is a result of an
internal slicing operation and should not lead to a warning
in userspace.

fixes pandas-dev#19151
@jreback
Contributor

jreback commented Jan 9, 2018

slightly related to #5758

why do you think these should be able to mutate this object? This is a slice. Sure, it's an implementation detail, but in general you would need to copy this. Mutation like this is plain non-idiomatic as well.
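
For readers following along, here is a minimal sketch of the copy-based pattern the comment alludes to. It reuses the `df` from the issue's code sample and takes an explicit `.copy()` of each group before assigning to it. This is only an illustration of the suggestion, not a statement of how pandas handles groups internally.

```python
import pandas as pd

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb')
})

gdfs = []
for _, gdf in df.groupby('c'):
    # Take an explicit copy so the assignment below targets an
    # independent frame rather than a slice of df.
    gdf = gdf.copy()
    gdf['x'] = 1
    gdfs.append(gdf)

# Recombine the modified groups; df itself is left untouched.
result = pd.concat(gdfs)
```

The trade-off is one extra copy per group, which is the cost the issue author objects to in the reply that follows.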

@jreback jreback closed this as completed Jan 9, 2018
@jreback jreback added this to the No action milestone Jan 9, 2018
@has2k1
Contributor Author

has2k1 commented Jan 9, 2018

why do you think these should be able to mutate this object?

Mutation like this is plain non-idiomatic as well.

Okay, here is why.

I have custom split-apply-combine code, some of which nests multiple times. groupby().apply() is/was not an option, either because the applied functions may have side effects or simply to prevent an explosion of speculative function calls on the first group due to nesting. Where possible I use generators and may reuse the group dataframes. In such cases it is more performant to avoid copying the returned group dataframes.

So other than the oddity about implementation details leaking out, it is nicer to do groupbys if the data is handed back with no strings attached.
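
To make the use case concrete, the following is a rough sketch of what a nested, generator-based split-apply-combine might look like. The helper names `iter_groups`, `nested_apply` and `set_x`, and the extra column `d`, are hypothetical and do not come from the issue. The sketch copies each group, so it avoids the warning at exactly the cost the comment above wants to avoid.

```python
import pandas as pd

def iter_groups(df, by):
    # Hypothetical helper: lazily yield (key, group) pairs.
    # Each group is copied so callers may mutate it freely.
    for key, gdf in df.groupby(by):
        yield key, gdf.copy()

def set_x(gdf):
    # Stand-in for a user function that mutates its group in place.
    gdf['x'] = 1
    return gdf

def nested_apply(df, outer, inner, func):
    # Hypothetical two-level split-apply-combine built on the generator.
    pieces = []
    for _, outer_gdf in iter_groups(df, outer):
        for _, inner_gdf in iter_groups(outer_gdf, inner):
            pieces.append(func(inner_gdf))
    return pd.concat(pieces)

df = pd.DataFrame({
    'x': range(4),
    'c': list('aabb'),
    'd': list('abab'),  # assumed second grouping key, for illustration only
})
result = nested_apply(df, 'c', 'd', set_x)
```

Because `func` is called exactly once per innermost group, functions with side effects behave predictably here.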

@jreback
Contributor

jreback commented Jan 9, 2018

@has2k1 you are using pandas non-idiomatically and are on your own.
