ENH: allow EA to register types for is_scalar #27462

Open
jbrockmendel opened this issue Jul 19, 2019 · 7 comments
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.

Comments

@jbrockmendel (Member)

#27461 (comment)

I think we need a way for an EA to hook into this for an EA scalar;
e.g. an IPAddress from cyberpandas could register as a scalar, I think.

Before we move on this, I think we need to clarify in which situations we care about lib.is_scalar(x) vs the simpler np.ndim(x) == 0

@TomAugspurger (Contributor)

One example is for nested data. In this case we need something like scalar_for_dtype(value, dtype), since the ndim of a "scalar" for a nested data type would be > 0.
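To make the nested-data case concrete, here is a small sketch. The function scalar_for_dtype is hypothetical (it does not exist in pandas); it only illustrates why a dtype-aware check is needed where np.ndim(x) == 0 fails:

```python
import numpy as np

# For a list-of-ints dtype, a single "scalar" element is itself a list,
# so an ndim check would wrongly classify it as non-scalar.
value = [1, 2, 3]      # one element of a nested (list) dtype
print(np.ndim(value))  # 1, not 0

def scalar_for_dtype(value, dtype):
    # Hypothetical sketch: the dtype decides what counts as its scalar.
    if dtype == "list":
        return isinstance(value, list)
    return np.ndim(value) == 0

print(scalar_for_dtype(value, "list"))   # True
print(scalar_for_dtype(value, "int64"))  # False
```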

@jbrockmendel jbrockmendel added API Design ExtensionArray Extending pandas with custom dtypes or arrays. labels Jul 23, 2019
@jorisvandenbossche (Member)

An alternative to registering could be a method on the dtype/array that checks whether a value is a valid scalar?
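A minimal sketch of that alternative. Both classes are toy stand-ins (not the real cyberpandas types), and the method name _is_valid_scalar is invented; only the idea, that the dtype itself vouches for its scalars instead of a global registry, comes from the comment above:

```python
class IPAddress:
    """Toy EA scalar (stand-in for cyberpandas' IP address scalar)."""
    def __init__(self, address):
        self.address = address

class IPDtype:
    """Toy stand-in for an ExtensionDtype; _is_valid_scalar is hypothetical."""
    na_value = None

    def _is_valid_scalar(self, value):
        # The dtype decides what counts as one of its scalars,
        # so pandas would not need a global type registry.
        return isinstance(value, IPAddress) or value is self.na_value

dtype = IPDtype()
print(dtype._is_valid_scalar(IPAddress("192.168.0.1")))  # True
print(dtype._is_valid_scalar([1, 2, 3]))                 # False
```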

@sterlinm

Hi! I think I've run into this issue in my own attempt at building an ExtensionArray, and I was curious whether there had been any progress on this, or whether it was something I could potentially contribute to.

I've been working on an extension array where the na_value I want to return for the ExtensionDtype is not recognized as a scalar by is_scalar. That seems to cause issues, which I can't figure out how to fix, with some methods that aren't part of the ExtensionArray interface (e.g. Series.where).

Is there another workaround for this that I haven't found yet? Thanks!

@jbrockmendel (Member, Author)

Is there another workaround for this that I haven't found yet?

The only thought that comes to mind is trying to replace is_scalar checks with not is_list_like checks. Last time I checked (worth double-checking since this was a while ago), is_list_like was faster than is_scalar anyway, and it should be more robust to this problem.
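To show why the inverted check is more robust: is_scalar and is_list_like are real functions in pandas.api.types, while the IPAddress class below is a hypothetical EA scalar that pandas knows nothing about:

```python
from pandas.api.types import is_list_like, is_scalar

class IPAddress:
    """Hypothetical EA scalar type, unknown to pandas."""
    def __init__(self, address):
        self.address = address

ip = IPAddress("192.168.0.1")

# is_scalar only recognizes a fixed set of known types,
# so a custom EA scalar fails the check:
print(is_scalar(ip))         # False

# But the object is not iterable either, so the inverted
# list-like check classifies it correctly:
print(not is_list_like(ip))  # True
```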

@sterlinm

Thanks very much! It looks like that change has already been made in a number of places in the most recent versions of pandas (I was testing on 1.3).

Thanks for your help and sorry to bother you!

@andrewgsavage

andrewgsavage commented Oct 2, 2022

Now that is_list_like interprets scalars correctly (#44626), this is the main issue holding back pint-pandas.

A few different approaches have been suggested in this issue since it was created. What's the recommended way to fix this at the moment?

edit: I was able to get all tests in pint-pandas passing without this, so it may not be needed.

@jbrockmendel (Member, Author)

I looked at this in April, but writing up my conclusions fell through the cracks.

Many of the places where we use is_scalar (and also is_list_like) are either

  1. a preliminary check of whether we can use the value as a scalar in __setitem__, or
  2. a check of whether to treat the value as a single label vs. a sequence of labels for indexing.

In the latter case, is_scalar is behaving like a faster is_hashable (58 ns vs. 506 ns when called on []).

In the former case, we should be able to use an EA-specific method to check whether the item is a scalar that is valid for the specific array at hand. We already have something like this for most of our internal EAs (DatetimeArray, TimedeltaArray, PeriodArray, Categorical, PandasArray, IntervalArray, and MaskedArray all have _validate_setitem_value; ArrowExtensionArray has _maybe_convert_setitem_value).
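A minimal sketch of that pattern outside pandas internals. The class and its values are invented for illustration; only the method name _validate_setitem_value matches the internal convention mentioned above:

```python
class ParityArray:
    """Toy EA-like array that only holds the strings "even" and "odd"."""

    _valid_scalars = frozenset({"even", "odd"})

    def __init__(self, data):
        self._data = list(data)

    def _validate_setitem_value(self, value):
        # Instead of consulting a global is_scalar registry, the array
        # itself decides whether `value` is a scalar it can store.
        if value not in self._valid_scalars:
            raise TypeError(f"invalid scalar for ParityArray: {value!r}")
        return value

    def __setitem__(self, key, value):
        self._data[key] = self._validate_setitem_value(value)

arr = ParityArray(["even", "odd"])
arr[0] = "odd"     # accepted by the array's own validator
try:
    arr[1] = 3     # not a valid scalar for this array
except TypeError as exc:
    print(exc)     # prints the rejection message
```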

6 participants