Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
API: allow nan-likes in StringArray constructor #41412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: allow nan-likes in StringArray constructor #41412
Changes from all commits
3e1784d
96ff1da
418e1d2
25a6c4d
47d68f7
8257dbd
922436a
2f28086
3ee2198
9426a52
3ee55f2
fe4981a
a66948a
e852719
91b73bb
42ec3e4
41f49d2
62cc5be
29909f3
153b6b4
b27a839
ed5b953
caa5705
2d75031
52a00d1
8dc0b66
1bacaed
66be087
7b058cd
1be1bdf
3c57094
03738a9
12351de
889829a
358000f
c649b1d
5e5aa9c
eb7d8f2
2426319
20817a7
33d8f9a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this going through ensure_string_array instead of e.g. is_string_array? For the latter, the ravel would be unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_string_array will not convert the other nans(None, np.nan, etc.) to the correct na_value of pd.NA. FWIW, switching to ensure_string_array will also align us with _from_sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does any of this become simpler if done in conjunction with the edits in #45057
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to split this one up.
I'm thinking of maybe sticking to is_string_array and then doing another pass over the data to convert the not pd.NA nans, as a quick short term fix to unblock you. This will probably give a perf regression, but since is_string_array got sped up in #44495 on master(not 1.3.x), all I have to do is make the perf regression less than the perf improvement there, so that the regression is not user visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this is broken for 2D EAs even with the ravel.
@jbrockmendel Do you know how I can create a 2D StringArray to test this? Would the correct way to iterate over a 2D EA be using the
PyArray_GETITEM
andPyArray_ITER_DATA
combo like theValidator
in libs does? Then I would replace values byPyArray_SETITEM
ing on the array?(Not familiar at all with Numpy C API, so going to need some help here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pd.array(["foo", "bar"] * 5).reshape(5, 2)
Not sure I understand the question. The 2D EA has a working
__iter__
method, but you shouldn't be passing any EA to any of the cython functions.I'm still getting the hang of this (intending to make array_to_timedelta64 handle 2D directly to avoid ravels), would advise NOT trying to handle it in the cython code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is
coerce: bool
enough here?this is like
errors='coerce'
forcoerce=True
anderrors='raise'
forcoerce=False
, i guess 'ignore' would be meaningless.but I still think the
errors=
keyword is better for flexiblity.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will take a look soon-ish. I'm wary of adding keywords here