Skip to content

BUG: partial string indexing with scalar #27712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 4, 2019

Conversation

TomAugspurger
Copy link
Contributor

Closes #27516

cc @jreback if you have thoughts on this approach.

@TomAugspurger TomAugspurger added Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves labels Aug 2, 2019
@TomAugspurger TomAugspurger added this to the 0.25.1 milestone Aug 2, 2019
@@ -1704,6 +1704,11 @@ def _is_scalar_access(self, key: Tuple):
if isinstance(ax, MultiIndex):
return False

if isinstance(k, str) and ax.is_all_dates:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is_all_dates the appropriate way to check whether partial string indexing is supported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that it's potentially expensive. Why not just isinstance(ax, DatetimeIndex)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could alias is_all_dates to _partial_str_indexing_supported (or something less verbose) to make this more obvious?

For non-object Index is_all_dates is pretty cheap

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the right place for this, rather you just call:

self._get_partial_string_timestamp_match_key(key, labels)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would you do that? The problem with the current structure is that _LocIndexer.__getitem__ if self._is_scalar_access(key) returns True, we go straight to return self._getitem_scalar(key), which is incorrect for partial string indexing.

It doesn't look like _get_partial_string_timestamp_match_key will work, at least not directly. That doesn't convert the partial string key to a slice or array indexer.

(Pdb) pp key
('2000', 'A')
(Pdb) pp self._get_partial_string_timestamp_match_key(key[0], self.obj.axes[0])
'2000'

Stepping back, I think that _is_scalar_access should return False for partial string indexing, since it's not actually a scalar access.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could alias is_all_dates to _partial_str_indexing_supported (or something less verbose) to make this more obvious?

For non-object Index is_all_dates is pretty cheap

It's the object case that I'm worried about, as I think this always hit.

Rather than inferring, why don't we define whether an index type supports partial string indexing as a class attribute? That skips the (potentially) expensive inference step, which IIUC would only be useful if (object-type) Index objects supported partial string indexing, which I don't believe they do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0fb9bbe has that, as a POC. All the indexing tests pass locally.

@@ -1704,6 +1704,11 @@ def _is_scalar_access(self, key: Tuple):
if isinstance(ax, MultiIndex):
return False

if isinstance(k, str) and ax.is_all_dates:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the right place for this, rather you just call:

self._get_partial_string_timestamp_match_key(key, labels)

@jreback jreback merged commit 2263982 into pandas-dev:master Aug 4, 2019
@jreback
Copy link
Contributor

jreback commented Aug 4, 2019

thanks @TomAugspurger

@jreback
Copy link
Contributor

jreback commented Aug 4, 2019

@jbrockmendel can you create an issue to see if we can remove .is_all_dates entirely?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Partial string indexing returns ndarray rather than Series.
3 participants