ENH: Categorical.empty #40602
@jorisvandenbossche looks like |
But it's not failing in CI on master? (but it's a bit strange how this PR would impact that test ..) |
IIUC that file isn't enabled for the ArrayManager tests, so it isn't such a problem that the test fails locally on master. Mainly a mystery as to why it was run on this particular CI build? |
Ah, I think that's the same pytest mystery bug that I saw with the pytables tests. It's sometimes not skipping the first test of a file when using a marker for a full directory (in the |
#39612 will also solve that pytest-mark issue |
On the actual PR: I very much would like to see such functionality. But:
|
I agree, but don't have any bright ideas for a general-case implementation. If it has obj._can_hold_na we can use
I thought about this, but that leaves out DatetimeArray[naive] and TimedeltaArray. The workarounds for that aren't that bad though, so I'd be open to this. (In general I think many EA methods would make more sense as EADtype methods, xref #40574) |
Tried this and got failures with ArrowBoolArray and StringArray |
On making this a general-purpose method (in this PR): is this possible? I think this is OK on the array itself, since this follows numpy convention. |
Numpy doesn't have an
Ah, yes. Now, since those are not proper EAs, they IMO shouldn't direct the EA design, so if the workarounds are doable, I would personally still prefer it as a method on the dtype. |
You can override the base one with custom implementations for those cases where it doesn't work. |
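The "generic base method plus per-class override" idea above can be sketched roughly as follows. This is a toy illustration only, not the pandas implementation: `ToyDtype`, `ToyArray`, and `ToyNoNAArray` are hypothetical stand-ins, and building the result through `take` with fill positions is an assumption about how a general-case fallback might work.

```python
from typing import Any, List, Sequence


class ToyDtype:
    """Hypothetical stand-in for an ExtensionDtype (not the pandas class)."""

    def __init__(self, name: str, na_value: Any = None) -> None:
        self.name = name
        self.na_value = na_value


class ToyArray:
    """Hypothetical 1-D array type with a generic `_empty` fallback."""

    def __init__(self, values: List[Any], dtype: ToyDtype) -> None:
        self._values = values
        self.dtype = dtype

    def __len__(self) -> int:
        return len(self._values)

    @classmethod
    def _from_sequence(cls, scalars: Sequence[Any], dtype: ToyDtype) -> "ToyArray":
        return cls(list(scalars), dtype)

    def take(self, indices: Sequence[int], allow_fill: bool = False) -> "ToyArray":
        # With allow_fill=True, an index of -1 means "use the dtype's NA value".
        out = [
            self.dtype.na_value if (allow_fill and i == -1) else self._values[i]
            for i in indices
        ]
        return type(self)(out, self.dtype)

    @classmethod
    def _empty(cls, length: int, dtype: ToyDtype) -> "ToyArray":
        # Generic route (an assumption): start from a zero-length array of the
        # requested dtype, then take `length` fill positions from it.
        base = cls._from_sequence([], dtype=dtype)
        return base.take([-1] * length, allow_fill=True)


class ToyNoNAArray(ToyArray):
    """Hypothetical array that cannot hold NA, so it overrides `_empty`."""

    @classmethod
    def _empty(cls, length: int, dtype: ToyDtype) -> "ToyNoNAArray":
        # Custom route: fill with an arbitrary valid value instead of NA.
        return cls([False] * length, dtype)


generic = ToyArray._empty(3, ToyDtype("toy"))
custom = ToyNoNAArray._empty(2, ToyDtype("toy_bool"))
```

The generic path depends on the array being able to represent a missing value, which is why a type without NA support would need its own override.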
Fair enough. I'll give it a go. |
@jorisvandenbossche getting the ArrowExtensionArray test working is going to require making its _from_sequence not ignore the dtype arg. Can I get your help with that? (I'm fine xfailing that for now) |
Will take a look tomorrow |
I might be missing something (I didn't run actual code / the tests), but you can just pass through the dtype, and handle it (cast if specified)? The only thing that's missing is a mapping of the ExtensionDtype to the arrow type:
--- a/pandas/tests/extension/arrow/arrays.py
+++ b/pandas/tests/extension/arrow/arrays.py
@@ -35,6 +35,7 @@ class ArrowBoolDtype(ExtensionDtype):
     kind = "b"
     name = "arrow_bool"
     na_value = pa.NULL
+    arrow_type = pa.bool_()
     @classmethod
     def construct_array_type(cls) -> type_t[ArrowBoolArray]:
@@ -59,6 +60,7 @@ class ArrowStringDtype(ExtensionDtype):
     kind = "U"
     name = "arrow_string"
     na_value = pa.NULL
+    arrow_type = pa.string()
     @classmethod
     def construct_array_type(cls) -> type_t[ArrowStringArray]:
@@ -76,8 +78,10 @@ class ArrowExtensionArray(OpsMixin, ExtensionArray):
     _data: pa.ChunkedArray
     @classmethod
-    def from_scalars(cls, values):
+    def from_scalars(cls, values, dtype=None):
         arr = pa.chunked_array([pa.array(np.asarray(values))])
+        if dtype is not None:
+            arr = arr.cast(dtype.arrow_type)
         return cls(arr)
     @classmethod
@@ -87,7 +91,7 @@ class ArrowExtensionArray(OpsMixin, ExtensionArray):
     @classmethod
     def _from_sequence(cls, scalars, dtype=None, copy=False):
-        return cls.from_scalars(scalars)
+        return cls.from_scalars(scalars, dtype=dtype)
     def __repr__(self):
         return f"{type(self).__name__}({repr(self._data)})" |
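Why the dtype pass-through matters for an empty array can be shown with a small pure-Python sketch. It does not use pyarrow or pandas. `ToyBoolDtype`, `ToyTypedArray`, and the `python_type` attribute are hypothetical names that mirror the role `arrow_type` plays in the diff above: an empty input carries no type information, so the result type can only come from the dtype argument.

```python
from typing import Any, List, Optional, Sequence


class ToyBoolDtype:
    """Hypothetical dtype; `python_type` plays the role of `arrow_type`."""

    python_type = bool


class ToyTypedArray:
    """Hypothetical array that records the element type it was built with."""

    def __init__(self, values: List[Any], kind: type) -> None:
        self.values = values
        self.kind = kind

    @classmethod
    def from_scalars(
        cls, values: Sequence[Any], dtype: Optional[Any] = None
    ) -> "ToyTypedArray":
        values = list(values)
        # Infer the element type from the data; empty input gives NoneType.
        kind = type(values[0]) if values else type(None)
        if dtype is not None:
            # Cast only when a dtype was specified, as in the suggested diff.
            values = [dtype.python_type(v) for v in values]
            kind = dtype.python_type
        return cls(values, kind)

    @classmethod
    def _from_sequence(
        cls, scalars: Sequence[Any], dtype: Optional[Any] = None, copy: bool = False
    ) -> "ToyTypedArray":
        # Forward the dtype instead of silently dropping it.
        return cls.from_scalars(scalars, dtype=dtype)


untyped = ToyTypedArray._from_sequence([])
typed = ToyTypedArray._from_sequence([], dtype=ToyBoolDtype())
cast = ToyTypedArray._from_sequence([1, 0], dtype=ToyBoolDtype())
```

Here `untyped.kind` falls back to `NoneType`, while `typed.kind` is `bool` even though both arrays have zero elements.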
To recap from above:
|
@jorisvandenbossche implemented the arrow suggestions, got:
|
updated to change .empty -> ._empty |
how does this help when this is a private method? |
it doesn't until we decide on somewhere we want to expose it. I'm thinking we put an |
ok so this doesn't address the original issue (e.g. categorical is not using this yet)? |
correct |
thanks @jbrockmendel |
xref #39776
xref dask/fastparquet#576 (comment)