
BUG: "A value is trying to be set on a copy of a slice from a DataFrame" even if I set something on a copy dataframe #45513


Closed
GF-Huang opened this issue Jan 20, 2022 · 12 comments · Fixed by #56614

Comments

@GF-Huang

GF-Huang commented Jan 20, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

I can't; it's complex.

Issue Description

I already made a copy, but pandas still says "A value is trying to be set on a copy of a slice from a DataFrame".

[Screenshots omitted]

Expected Behavior

No warning should be raised if I copy the DataFrame and then set a value.

Installed Versions

INSTALLED VERSIONS


commit : 66e3805
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : Chinese (Simplified)_China.936

pandas : 1.3.5
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.4.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : 1.0.2
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.25.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None

@GF-Huang GF-Huang added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 20, 2022
@phofl
Member

phofl commented Jan 20, 2022

Please provide something reproducible; otherwise we are not able to investigate. Your screenshots are not sufficient.

@phofl phofl added the Needs Info Clarification about behavior needed to assess issue label Jan 20, 2022
@mroeschke mroeschke removed the Needs Triage Issue that has not been reviewed by a pandas team member label Jan 20, 2022
@phofl
Member

phofl commented Jan 24, 2022

Please provide an example that is copy and pastable.

@GF-Huang
Author

Data file: data.zip

import pandas as pd

df = pd.read_csv('data.zip', compression='zip')
df.sort_values(['TimeBarStart', 'Strike', 'CallPut'], inplace=True)

open_price = (df.iloc[0].UnderOpenBidPrice + df.iloc[0].UnderOpenAskPrice) / 2

df = df[df.Strike == df.loc[(df.Strike - open_price).abs().idxmin()].Strike]
df.set_index('TimeBarStart', inplace=True)
df.index.rename('Time', inplace=True)
call, put = df[df.CallPut == 'C'][:'15:59'], df[df.CallPut == 'P'][:'15:59']
straddle = (call[['OpenBidPrice', 'HighBidPrice', 'LowBidPrice', 'CloseBidPrice', 'Volume']] + 
            put[['OpenBidPrice', 'HighBidPrice', 'LowBidPrice', 'CloseBidPrice', 'Volume']])
straddle['UnderOpen'] = ((call.UnderOpenBidPrice + call.UnderOpenAskPrice) / 2 + \
                          (put.UnderOpenBidPrice + put.UnderOpenAskPrice) / 2) / 2
straddle['UnderClose'] = ((call.UnderCloseBidPrice + call.UnderCloseAskPrice) / 2 + \
                          (put.UnderCloseBidPrice + put.UnderCloseAskPrice) / 2) / 2
call_spread = call.CloseAskPrice[0] - call.CloseBidPrice[0]
put_spread = put.CloseAskPrice[0] - put.CloseBidPrice[0]
straddle = straddle.copy()
straddle.CloseBidPrice[0] = (call.CloseBidPrice[0] + call_spread * 0.5) + (put.CloseBidPrice[0] + put_spread * 0.5)
straddle.OpenBidPrice[1] = straddle.CloseBidPrice[0]

@phofl
Member

phofl commented Jan 24, 2022

Is all of this necessary? Can you remove parts of it and still see the warning? Please remove everything that is not needed to reproduce the bug. Copy-and-pasteable means that it should not depend on files if possible. Judging from your screenshot, this should be possible.

@GF-Huang
Author

GF-Huang commented Jan 24, 2022

The only difference is the Volume column: the warning appears only when it is included.

[Screenshot omitted]

import pandas as pd

df = pd.read_csv('data.zip', compression='zip')
df.set_index('TimeBarStart', inplace=True)
call, put = df[df.CallPut == 'C'], df[df.CallPut == 'P']
straddle = (call[['OpenBidPrice', 'CloseBidPrice', 'Volume']] + 
            put[['OpenBidPrice', 'CloseBidPrice', 'Volume']])
straddle = straddle.copy()
straddle.OpenBidPrice[1] = straddle.CloseBidPrice[0]

@phofl
Member

phofl commented Jan 24, 2022

Thanks, this is better. Can you create the DataFrame in code instead of reading it from a file? Please use no more rows than strictly necessary.

@GF-Huang
Author

GF-Huang commented Jan 24, 2022

I can't. You can try downloading the data.zip I uploaded; it is very large (4 MB).

@phofl
Member

phofl commented Jan 24, 2022

The probability that someone takes a closer look increases if the issue is easier to replicate. Large DataFrames are terrible to debug.

@GF-Huang
Author

I'm sorry, I can't do any more.

@jorisvandenbossche
Member

This is quite easily reproducible with a small example like this:

In [11]: df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [0.1, 0.2, 0.3]})

In [12]: df.a[0] = df.b[0]
SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.a[0] = df.b[0]

I suppose we could consider this a false positive (in theory, I think modifying a column like this should always work), but you are still doing "chained assignment": you assign to a specific column and row label, but select the column and the row in separate, chained steps. Generally, the recommendation is to use .loc/.iloc instead, so the assignment happens in one step. For my small example, this is:

In [13]: df.loc[0, "a"] = df.b[0]

which doesn't raise the warning.
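A self-contained version of the two assignments above, for comparison. The values come from the small example in this comment. The chained form is left as a comment because what it does depends on the pandas version; the single-step .loc form is the recommended one.

```python
import pandas as pd

# The small example from the comment above.
df = pd.DataFrame({"a": [1, 2, 3], "b": [3, 4, 5], "c": [0.1, 0.2, 0.3]})

# Chained assignment (avoid): df.a[0] = df.b[0]
# The column is selected first and the row is then set on that intermediate
# Series. Depending on the pandas version this may emit SettingWithCopyWarning
# or, under Copy-on-Write, leave df unchanged.

# Single-step assignment: row label and column label in one .loc call.
df.loc[0, "a"] = df.loc[0, "b"]

print(df)
```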

In your case, you have a string row index, so if you want to combine a positional row index with a named column label, you might need to do something like straddle.loc[straddle.index[1], "OpenBidPrice"] = ....
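A sketch of that suggestion applied to both assignments at the end of the reproducer. Since data.zip is not included here, the straddle frame below is a hypothetical stand-in with a string row index and the same column names; its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for `straddle`: string (time-like) row index,
# float price columns and an integer Volume column. Values are illustrative.
straddle = pd.DataFrame(
    {
        "OpenBidPrice": [1.0, 1.5, 2.0],
        "CloseBidPrice": [1.2, 1.7, 2.1],
        "Volume": [10, 20, 30],
    },
    index=["09:30", "09:31", "09:32"],
)

# Replaces `straddle.CloseBidPrice[0] = ...` with a single positional .iloc
# call; Index.get_loc turns the column name into a column position.
straddle.iloc[0, straddle.columns.get_loc("CloseBidPrice")] = 1.25

# Replaces `straddle.OpenBidPrice[1] = straddle.CloseBidPrice[0]` with the
# label-based form: straddle.index[n] gives the label of the n-th row.
straddle.loc[straddle.index[1], "OpenBidPrice"] = straddle.loc[
    straddle.index[0], "CloseBidPrice"
]

print(straddle)
```

Either form selects row and column in one indexing call, which is what avoids the chained-assignment pattern.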

@jorisvandenbossche jorisvandenbossche added Copy / view semantics and removed Needs Info Clarification about behavior needed to assess issue Bug labels Jan 25, 2022
@GF-Huang
Author

@jorisvandenbossche But why is it only raised when the Volume column is included?

@AlfredoBB

I happened to notice this bug as well.
The output I am getting from pandas is:

data.loc[:,columnA] = process_V_MA(ref=data.loc[:,columnA],value=data['V'])
C:\Users\alfre\OneDrive\Documentos\m21\nn.py:685: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

This doesn't make much sense; all the DataFrames are being referenced properly. This happens in the latest version.
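The comment does not show how data was created, so the following is an assumption: before Copy-on-Write, one common way to get this warning even with .loc is when the DataFrame being assigned into was itself produced by filtering another DataFrame. A minimal sketch of the usual workaround, calling .copy() on the filtered result so it is an independent object; the frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical source frame; names and values are illustrative only.
source = pd.DataFrame({"V": [5, 15, 25, 35], "A": [1, 2, 3, 4]})

# Boolean filtering returns a new DataFrame that pandas (before Copy-on-Write)
# may still track as derived from `source`. Calling .copy() makes `data`
# an explicitly independent object.
data = source[source["V"] > 10].copy()

# Assign a whole column in one .loc call; `source` is not affected.
data.loc[:, "A"] = data["A"] * 10

print(data)
print(source)
```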


5 participants