PyArrow gives ArrowTypeError serializing Pandas nullable Int64 #4168
Comments
@jkleint This is a
@jorisvandenbossche what is pandas's expected memory layout for the new integer array types? The 0.23 -> 0.24 shift will present a bit of a compatibility headache (we'll need a flag for whether to produce the new memory layout if the user has a new enough pandas)
The new integers are stored as a plain numpy array for the values plus a boolean mask array:
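The snippet that originally followed this sentence did not survive. As a minimal sketch of the layout being described, and not the commenter's own code, the two arrays can be inspected as below. `_data` and `_mask` are private pandas attributes, so treat their names as an implementation detail that may change between releases.

```python
import pandas as pd

# A nullable integer array with one missing entry.
arr = pd.array([1, None, 3], dtype="Int64")

# Private attributes (an assumption about pandas internals, not public API):
values = arr._data  # plain numpy integer array holding the values
mask = arr._mask    # numpy boolean array, True where the entry is missing

# The slot under a True mask entry holds a placeholder, so it must be ignored.
rebuilt = [None if missing else int(v) for v, missing in zip(values, mask)]
print(values.dtype, mask.dtype, rebuilt)
```

Any converter therefore has to read both arrays together, which is why the layout matters for pyarrow.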
But I am not sure it is up to pyarrow to add functionality to convert those (although you could argue for making an exception for the extension arrays added to pandas itself).
Can we open a JIRA issue about this and close this issue?
I think this is covered by the existing issues https://issues.apache.org/jira/browse/ARROW-5271 and https://issues.apache.org/jira/browse/ARROW-2428, which cover the general ExtensionArray topic. Or would you prefer to have a specific issue for nullable integers (that would be blocked by those issues)?
Yeah, it would be nice to have an issue specifically about nullable integers to make sure it gets done (it's easy for such a thing to fall through the cracks)
With the new Pandas 0.24 nullable integer types, Pandas `.to_parquet()` gives an `ArrowTypeError`. Not sure if this is a Pandas or PyArrow issue. This is with Python 3.7.2 and pyarrow 0.13.
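The report does not include the code that triggered the error. A minimal sketch of a reproduction, assuming a single `Int64` column written with the pyarrow engine, might look like the following. The column name `n` and the file name are hypothetical, and the outcome depends on the installed versions: the combination in the report is said to raise `ArrowTypeError`, while other releases may behave differently.

```python
import os
import tempfile

import pandas as pd

# One nullable-integer column with a missing value (hypothetical example data).
df = pd.DataFrame({"n": pd.Series([1, None, 3], dtype="Int64")})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "nullable_int.parquet")
    try:
        df.to_parquet(path, engine="pyarrow")
        outcome = "written"
    except Exception as exc:
        # Record the exception type instead of assuming which one is raised.
        outcome = type(exc).__name__

print(outcome)
```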