-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG/TST: fix arrow roundtrip / parquet tests for recent pyarrow #30077
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/TST: fix arrow roundtrip / parquet tests for recent pyarrow #30077
Conversation
results = [] | ||
for arr in chunks: | ||
# TODO should optimize this without going through object array | ||
bool_arr = BooleanArray._from_sequence(np.array(arr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bool_arr = BooleanArray._from_sequence(np.array(arr)) | |
bool_arr = BooleanArray._from_sequence(np.asarray(arr)) |
No?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no way that a conversion from a pyarrow boolean array (which uses a bitmask) to a numpy array can be without a copy, so it shouldn't matter I think
chunks = array.chunks | ||
|
||
results = [] | ||
for arr in chunks: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u write this method generically and put on the base class or Arrow mxin class
as it already looks like it would work for any extension type (except the final use of BooleanArray)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IntegerDtype implementation is different, though. The implementation here is indeed more or less the same as for the StringArray, but once we we fix the mentioned TODO (to avoid going through object dtype), the Boolean one will also be custom.
The StringDtype one could still be put in a base mixin (it needs to be a mixin, and not directly in the base ExtensionDtype class, as for pyarrow the presence or absence of this method is relevant), but no one else would be using it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, yeah we want avoid repeating this code as much as possible, so having ageneric (but working) impl would be good, and using helper / properties to ease the burden on each dtype would also be great. sure this could be done later as well.
@@ -101,6 +101,24 @@ def __repr__(self) -> str: | |||
def _is_boolean(self) -> bool: | |||
return True | |||
|
|||
def __from_arrow__(self, array): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be annotated? or at least have types in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type is mentioned in the docstring (just not in a parameters section, but can turn it into a more fully fledged docstring). No one except for pyarrow should be calling this method though.
Question for annotating: how does it work to annotate it with pyarrow objects that cannot necessarily be imported? (since it's an optional dependency)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea about the annotation?
couple of questions, none blockers, LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. add types if you can; ideally followup to move this code generically to base class of EA
Going to merge this, as it fixes the tests. Will revisit extracting base functionality / utility function in the period/interval PR. |
Closes #29976