pd.concat with copy=True doesn't copy columns index #29879
Comments
Another example of this that hits a slightly different code path:
In [1]: import pandas as pd; pd.__version__
Out[1]: '0.26.0.dev0+958.g545d17529'
In [2]: df = pd.Series(list('abc'), name='foo').to_frame()
In [3]: df2 = pd.Series(list('xyz'), name='foo').to_frame()
In [4]: df.columns is df2.columns # verifying not the same
Out[4]: False
In [5]: df3 = pd.concat([df, df2], copy=True)
In [6]: df.columns is df3.columns # shouldn't be the same but are
Out[6]: True
In [7]: df2.columns is df3.columns # this is okay
Out[7]: False
This analysis is not correct. Indexes are immutable, so they are shared; I would expect this behavior from concat.
Further testing shows that this issue is more general and is not limited to the columns index:
import pandas as pd
df = pd.DataFrame({'a': [1.1], 'b': [2.2]})
df3 = pd.concat([df] * 2, copy=True, axis=1)
print(df3)
#      a    b    a    b
# 0  1.1  2.2  1.1  2.2
print(df3.index is df.index)  # True
print(df3.columns is df.columns)  # False
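The identity checks in the comments above can be wrapped in a small helper so the same test is easy to repeat for either axis. This is only a sketch: `shared_axes` is a hypothetical name, not a pandas function, and the `copy` argument is left at its default so the sketch does not depend on how a given pandas version treats that keyword. No expected output is shown because the result may differ between pandas versions.

```python
import pandas as pd


def shared_axes(source: pd.DataFrame, result: pd.DataFrame) -> dict:
    """Report which axis objects of `result` are the very same objects as in `source`.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    return {
        "index": result.index is source.index,
        "columns": result.columns is source.columns,
    }


df = pd.DataFrame({"a": [1.1], "b": [2.2]})

# Concatenate along each axis and record which Index objects are reused.
for axis in (0, 1):
    combined = pd.concat([df, df], axis=axis)
    print(f"axis={axis}: {shared_axes(df, combined)}")
```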
Should we share the underlying data, but not the actual Index object? (That is, make a new Index object from the same data.)
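One way to picture "a new Index object from the same data" is a shallow copy. The sketch below uses `Index.copy(deep=False)` as an illustration of the idea; it is an assumption that this is the kind of copy meant in the comment above, not a statement of what `pd.concat` does internally.

```python
import pandas as pd

original = pd.Index([1, 2, 3])

# Shallow copy: a distinct Index object; deep=False asks pandas not to copy the data.
shallow = original.copy(deep=False)

# Renaming the copy changes only the copy's metadata, not the original's.
shallow.name = "renamed"

print(shallow is original)
print(original.name, shallow.name)
print(shallow.equals(original))
```

Because the name lives on the Index object rather than on the underlying values, giving each frame its own Index object is enough to keep a rename on one frame from showing up on the other.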
Code Sample
Problem description
It is expected that copy=True should make a copy of the columns index. However, the demonstration shows it's the same object. Modifications, such as changes to the name property, affect both the source frame and the output frame created by pd.concat.
Expected Output
It is expected that df2.columns is a copy of the original df.columns. This would prevent (e.g.) changing the name property for both frames.
A workaround is to manually copy this property after pd.concat.
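The workaround code did not survive in this copy of the issue. A minimal sketch of what such a workaround could look like is given here, assuming a single-frame concat and illustrative frame contents; it is not the reporter's original snippet. The `copy` argument is left at its default because the explicit `Index.copy()` is what separates the two frames.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.1], "b": [2.2]})
df2 = pd.concat([df])

# Workaround: give the result its own columns Index object.
df2.columns = df2.columns.copy()

# A rename on the result's columns should now leave the source frame alone.
df2.columns.name = "output"

print(df2.columns is df.columns)
print(df.columns.name)
```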
Output of pd.show_versions()
pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.1.post20191125
Cython : 0.29.14
pytest : 5.3.0
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6