
BUG: .values upcasts dtype in non-mixed-dtype subset of mixed-dtype DataFrames #37346


Open
3 tasks done
twoertwein opened this issue Oct 22, 2020 · 1 comment
Labels
Bug; Dtype Conversions (unexpected or buggy dtype conversions); Indexing (related to indexing on series/frames, not to indexes themselves)

Comments

@twoertwein
Member

twoertwein commented Oct 22, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

data = pd.DataFrame({"a": [1], "b": [0.1]})


print(type(data.loc[0, "a"]))  # <class 'numpy.int64'>
print(data.loc[0, ["a"]].values.dtype)  # float64

Problem description

.values seems to upcast to a common dtype even when only a subset of columns, all of which share the same dtype, is selected.
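For comparison, here is a minimal sketch of how `.values` behaves when the selection stays a DataFrame. It reuses the `data` frame from the code sample; the names `full_dtype` and `subset_dtype` are introduced only for illustration. On the full mixed-dtype frame a common dtype is required, while a column subset that holds a single dtype should keep it.

```python
import pandas as pd

data = pd.DataFrame({"a": [1], "b": [0.1]})

# Whole frame: the integer and float columns must share one 2D array,
# so the result uses their common dtype (float64).
full_dtype = data.values.dtype

# Column subset selected as a DataFrame: only the integer column remains,
# so no common-dtype conversion is needed.
subset_dtype = data[["a"]].values.dtype

print(full_dtype, subset_dtype)
```

The difference from the code sample above is that `data[["a"]]` is a DataFrame, whereas `data.loc[0, ["a"]]` selects a single row and returns a Series.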

Expected Output

int64 for the second print.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.13.0-36-generic
Version : #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.1
setuptools : 49.2.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.3
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@twoertwein added the Bug and Needs Triage labels Oct 22, 2020
@twoertwein changed the title from "BUG: .values upcasts dtype in mixed-dtype DataFrames" to "BUG: .values upcasts dtype in non-mixed subset of mixed-dtype DataFrames" Oct 22, 2020
@twoertwein changed the title from "BUG: .values upcasts dtype in non-mixed subset of mixed-dtype DataFrames" to "BUG: .values upcasts dtype in non-mixed-dtype subset of mixed-dtype DataFrames" Oct 22, 2020
@jorisvandenbossche
Member

@twoertwein Thanks for the report. What upcasts to float here is actually the selection of a single row as a Series: you are not creating a subset DataFrame, but a Series.

Using your example data:

In [8]: data.loc[0]   
Out[8]: 
a    1.0
b    0.1
Name: 0, dtype: float64

In [9]: data.loc[0, ['a']] 
Out[9]: 
a    1.0
Name: 0, dtype: float64

So for the first case, it is expected that we get float, since the values of the mixed-dtype columns have to be put into a single homogeneous Series.
For the second case, though, we should in theory be able to preserve the integer dtype, given that we are only selecting certain columns to create the Series. This doesn't seem to happen at the moment.
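Until the second case preserves the dtype, one possible workaround is to avoid building the row Series from the mixed-dtype frame. The sketch below is not a fix in pandas itself, only an illustration of two selection orders that should keep the integer dtype; the names `row_cols_first` and `row_2d` are hypothetical.

```python
import pandas as pd

data = pd.DataFrame({"a": [1], "b": [0.1]})

# Option 1: select the columns first, then the row. The intermediate
# DataFrame holds only the integer column, so the row Series keeps its dtype.
row_cols_first = data[["a"]].iloc[0]

# Option 2: pass a list of row labels so the selection stays a DataFrame,
# convert it to a 2D array, then take the first row of that array.
row_2d = data.loc[[0], ["a"]].to_numpy()[0]

print(row_cols_first.dtype, row_2d.dtype)
```

Both options select the columns before any row is collapsed into a Series, which is the step where the upcast described above happens.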

@jorisvandenbossche added the Dtype Conversions and Indexing labels and removed the Bug and Needs Triage labels Oct 27, 2020
@mroeschke added the Bug label Aug 13, 2021

3 participants