PyArrow StringDtype / StringArray fallback policy #42613
Comments
Maybe also consider a `nopython=True/False` setting (pandas option, dtype constructor argument and/or array attribute).
Yep, those would be worth exploring. Global options are a bit harder for Dask, since it's hard to propagate global options to multiple processes (it'd be easier if we had a config system like Dask's). A per-`StringArray` / per-`StringDtype` instance option might also be useful, though we'd need to propagate that setting through operations, which sounds a bit hard.
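A minimal sketch of how a per-instance setting could be resolved against a global default and carried through operations. Everything here is hypothetical: `FALLBACK_OPTION`, `SketchStringArray` and its `fallback` attribute are illustrative names, not pandas API; a real implementation would go through pandas' option system and `StringDtype`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical global default standing in for a pandas option;
# this is NOT a real pandas setting.
FALLBACK_OPTION = {"allow_fallback": True}


@dataclass(frozen=True)
class SketchStringArray:
    """Toy string array carrying a per-instance fallback setting.

    fallback=None means "defer to the global option".
    """

    values: tuple[str, ...]
    fallback: Optional[bool] = None

    def effective_fallback(self) -> bool:
        # A per-instance setting wins; otherwise use the global default.
        if self.fallback is not None:
            return self.fallback
        return FALLBACK_OPTION["allow_fallback"]

    def _map(self, func: Callable[[str], str]) -> SketchStringArray:
        # Every operation has to copy the setting onto its result;
        # forgetting this anywhere silently resets it to the default.
        return SketchStringArray(tuple(func(v) for v in self.values), self.fallback)

    def upper(self) -> SketchStringArray:
        return self._map(str.upper)

    def strip(self) -> SketchStringArray:
        return self._map(str.strip)
```

For example, `SketchStringArray((" a ", "b"), fallback=False).strip().upper()` keeps `fallback=False` on the result. The single `_map` helper is what makes propagation manageable here; in pandas the setting would have to survive every code path that constructs a new array, which is the difficulty the comment above points at.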
I think the nicest behavior for users would be for operations to "just work" with a fallback method, so I would be partial to raising a … The easiest thing for us to do is probably to raise, telling the user to convert, but one inconvenience for users is that they then have to revert their conversions to … So I think that, for users,
…
is a better experience than
…
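Part of the appeal of a warning over an exception is that users can still opt into strictness themselves through the standard `warnings` filters. A minimal sketch, assuming a hypothetical `normalize_with_fallback` that warns before falling back to element-wise Python; pandas' real class is `pandas.errors.PerformanceWarning`, replaced here by a local stand-in so the example needs only the standard library.

```python
from __future__ import annotations

import unicodedata
import warnings


class PerformanceWarning(Warning):
    """Local stand-in for pandas.errors.PerformanceWarning."""


def normalize_with_fallback(values: list[str], form: str = "NFC") -> list[str]:
    # Hypothetical: assume no native kernel exists for this method,
    # so warn and fall back to element-wise Python.
    warnings.warn(
        "normalize is not implemented natively; falling back to "
        "element-wise Python, which may be slow",
        PerformanceWarning,
        stacklevel=2,
    )
    return [unicodedata.normalize(form, v) for v in values]


# Default behaviour: the call "just works" and a warning is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = normalize_with_fallback(["e\u0301"])

# Strict users can escalate the warning into an error instead.
with warnings.catch_warnings():
    warnings.simplefilter("error", PerformanceWarning)
    try:
        normalize_with_fallback(["e\u0301"])
        strict_raised = False
    except PerformanceWarning:
        strict_raised = True
```

With pandas itself, the same `simplefilter("error", ...)` call would target `pandas.errors.PerformanceWarning`, so a warning-based policy can serve both users who want things to "just work" and those who want a hard failure.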
In #35169 / #42597 we discussed the desired behavior of the PyArrow-backed `StringArray` when a certain method is not implemented in `pyarrow.compute`.

For string methods like `str_normalize`, which aren't currently implemented in `pyarrow.compute`, I believe we (silently) cast from `string[pyarrow]` to an object-dtype ndarray of Python `str` objects at `pandas/pandas/core/arrays/string_.py` (lines 527 to 528 in edd5af7).
These kinds of performance cliffs are difficult for users to debug. I don't think we should do that conversion on behalf of the user. If something isn't implemented yet, then I think we should raise with a message saying they should convert to `string[python]` dtype first.

If we don't want to raise, we could emit a `PerformanceWarning`, similar to what we do for `SparseArray` when converting to dense.
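The three behaviours discussed in this issue (silent cast, raising with a conversion hint, emitting a `PerformanceWarning`) can be compared side by side. A minimal sketch, assuming a hypothetical `policy` argument and a hypothetical `NATIVE_KERNELS` table standing in for what `pyarrow.compute` provides; neither is pandas API, the object-dtype ndarray is modelled as a plain Python list, and `PerformanceWarning` is a local stand-in for `pandas.errors.PerformanceWarning`.

```python
from __future__ import annotations

import unicodedata
import warnings
from typing import Callable


class PerformanceWarning(Warning):
    """Local stand-in for pandas.errors.PerformanceWarning."""


# Hypothetical table of "native" (vectorised) kernels; methods missing
# here play the role of functions absent from pyarrow.compute.
NATIVE_KERNELS: dict[str, Callable[[list[str]], list[str]]] = {
    "upper": lambda vals: [v.upper() for v in vals],
}

# Per-element Python fallbacks, i.e. what the object-dtype path runs.
PYTHON_FALLBACKS: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "normalize": lambda v: unicodedata.normalize("NFC", v),
}


def apply_str_method(values: list[str], name: str, policy: str = "warn") -> list[str]:
    """Apply a string method under one of three hypothetical fallback policies.

    policy: "silent" (cast quietly), "warn" (cast and emit a
    PerformanceWarning) or "raise" (refuse and tell the user to convert).
    """
    if name in NATIVE_KERNELS:
        # Fast path: no fallback needed, so the policy is irrelevant.
        return NATIVE_KERNELS[name](values)
    if policy == "raise":
        raise NotImplementedError(
            f"str.{name} is not implemented for string[pyarrow]; "
            "convert with .astype('string[python]') first"
        )
    if policy == "warn":
        warnings.warn(
            f"str.{name} falls back to element-wise Python for string[pyarrow]",
            PerformanceWarning,
            stacklevel=2,
        )
    elif policy != "silent":
        raise ValueError(f"unknown policy: {policy!r}")
    # Fallback path: the equivalent of casting to object dtype.
    func = PYTHON_FALLBACKS[name]
    return [func(v) for v in values]
```

Under `policy="raise"`, only methods with a native kernel succeed, which makes the performance cliff explicit; `"warn"` keeps code working while still surfacing the cliff, and `"silent"` corresponds to the current behaviour described above.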