BUG: DataFrame.to_dict(orient="records") does not return native types with frame constructed from pyarrow Scalars #37642

Open
arw2019 opened this issue Nov 5, 2020 · 3 comments
Labels
Arrow (pyarrow functionality), Bug, Dtype Conversions (unexpected or buggy dtype conversions), ExtensionArray (extending pandas with custom dtypes or arrays)

Comments

@arw2019
Member

arw2019 commented Nov 5, 2020

On 1.2 master:

In [15]: import pandas as pd
    ...: import pyarrow as pa
    ...: 
    ...: df = pd.DataFrame({'a': pa.scalar(7)}, index=['a'])
    ...: print(type(df.to_dict(orient="records")[0]['a']))
    ...: 
<class 'pyarrow.lib.Int64Scalar'>

This is inconsistent with the corresponding operation for NumPy-backed types (xref #37571):

In [13]: import pandas as pd
    ...: import numpy as np
    ...: 
    ...: df = pd.DataFrame({'a': np.int64(7)}, index=['a'])
    ...: print(type(df.to_dict(orient="records")[0]['a']))
    ...: 
<class 'int'>

In general we'd like to_dict to return Python native types if possible.

arw2019 added the Bug, Needs Triage, ExtensionArray, and Dtype Conversions labels Nov 5, 2020
@jorisvandenbossche
Member

In this specific case, the column has object dtype (it stores the actual pyarrow scalar; we didn't coerce it to an integer dtype). I think that for object dtype we should simply return the element as stored in the array (in general we also don't have a way to know the "native" type of a generic Python object that can be stored in an object-dtype array).
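
To illustrate the object-dtype point with something other than pyarrow, here is a small sketch using `fractions.Fraction` as a stand-in for "a generic Python object" (the choice of `Fraction` is only an assumption for illustration). There is no obvious "native" type to convert it to, so handing back the stored element is the natural behaviour.

```python
from fractions import Fraction

import pandas as pd

# A Fraction has no dedicated pandas dtype, so the column falls back to object.
df = pd.DataFrame({"a": [Fraction(1, 2)]})
print(df["a"].dtype)

# to_dict hands back the element as stored; there is nothing to unwrap.
record = df.to_dict(orient="records")[0]
print(type(record["a"]))
```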

@arw2019
Member Author

arw2019 commented Nov 5, 2020

> In this specific case, the column has object dtype (it stores the actual pyarrow scalar; we didn't coerce it to an integer dtype). I think that for object dtype we should simply return the element as stored in the array (in general we also don't have a way to know the "native" type of a generic Python object that can be stored in an object-dtype array).

That makes sense. Would we revisit this for pyarrow-backed arrays?

@jorisvandenbossche
Member

Those won't use object dtype (e.g. the ArrowStringArray in the works), so what I said above about object dtype doesn't apply. And for most Arrow types I think there is a clear "native" Python type.

rhshadrach removed the Needs Triage label Nov 6, 2020
mroeschke added the Arrow label Mar 23, 2023