-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: partial string indexing with scalar #27712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexing.py
Outdated
@@ -1704,6 +1704,11 @@ def _is_scalar_access(self, key: Tuple): | |||
if isinstance(ax, MultiIndex): | |||
return False | |||
|
|||
if isinstance(k, str) and ax.is_all_dates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is is_all_dates
the appropriate way to check whether partial string indexing is supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that it's potentially expensive. Why not just isinstance(ax, DatetimeIndex)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could alias is_all_dates
to _partial_str_indexing_supported
(or something less verbose) to make this more obvious?
For non-object Index is_all_dates is pretty cheap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the right place for this, rather you just call:
self._get_partial_string_timestamp_match_key(key, labels)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would you do that? The problem with the current structure is that _LocIndexer.__getitem__
if self._is_scalar_access(key)
returns True, we go straight to return self._getitem_scalar(key)
, which is incorrect for partial string indexing.
It doesn't look like _get_partial_string_timestamp_match_key
will work, at least not directly. That doesn't convert the partial string key to a slice or array indexer.
(Pdb) pp key
('2000', 'A')
(Pdb) pp self._get_partial_string_timestamp_match_key(key[0], self.obj.axes[0])
'2000'
Stepping back, I think that _is_scalar_access
should return False for partial string indexing, since it's not actually a scalar access.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could alias is_all_dates to _partial_str_indexing_supported (or something less verbose) to make this more obvious?
For non-object Index is_all_dates is pretty cheap
It's the object case that I'm worried about, as I think this always hit.
Rather than inferring, why don't we define whether an index type supports partial string indexing as a class attribute? That skips the (potentially) expensive inference step, which IIUC would only be useful if (object-type) Index
objects supported partial string indexing, which I don't believe they do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0fb9bbe has that, as a POC. All the indexing tests pass locally.
pandas/core/indexing.py
Outdated
@@ -1704,6 +1704,11 @@ def _is_scalar_access(self, key: Tuple): | |||
if isinstance(ax, MultiIndex): | |||
return False | |||
|
|||
if isinstance(k, str) and ax.is_all_dates: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not the right place for this, rather you just call:
self._get_partial_string_timestamp_match_key(key, labels)
thanks @TomAugspurger |
@jbrockmendel can you create an issue to see if we can remove .is_all_dates entirely? |
Closes #27516
cc @jreback if you have thoughts on this approach.