API: support multiple indexers for .iloc with a MultiIndex #7490
You need to use a tuple: `(0, 0)`.
As I'm sure you know, Python's In any case,
Indexing interpretation is amazingly complex: think about what `df.loc[0, 0]` could possibly be. So I guess this is technically a bug, in that `iloc` for a Series should also try to index into a MultiIndex when passed multiple indexers. Why don't you give fixing it a shot?
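One way to sidestep the ambiguity of `df.loc[0, 0]` on a DataFrame with a MultiIndex on the rows is to spell the row key as an explicit tuple and pass the column indexer separately. A minimal sketch, assuming a recent pandas release; the frame and its column names `a`/`b` are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical frame: two-level integer MultiIndex on the rows, named columns.
df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([[0, 1], [0, 1]]),
)

# The tuple is a single row key (level 0 == 0, level 1 == 0); ':' keeps all columns.
row = df.loc[(0, 0), :]

# Row key as a tuple plus a column label: a single cell.
cell = df.loc[(0, 0), "a"]

# A list of tuples selects several rows.
rows = df.loc[[(0, 0), (1, 1)]]
```

Writing the row key as a tuple and always supplying the column indexer leaves no question about which tuple element belongs to which axis.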
It doesn't matter whether you use a tuple or not (for a Series!). With `loc`, this works. But more importantly, I think: what should it mean? You could say that if you are thinking of integer locations based on the whole DataFrame, there is only one first row?
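The tuple makes no difference because of Python's subscription syntax: `obj[a, b]` hands the single tuple `(a, b)` to `__getitem__`, so `s.loc[0, 0]` and `s.loc[(0, 0)]` are literally the same call. A small sketch; the `ShowKey` class is a hypothetical helper used only to expose the key Python passes:

```python
import numpy as np
import pandas as pd


class ShowKey:
    """Hypothetical helper: returns whatever key Python hands to __getitem__."""

    def __getitem__(self, key):
        return key


k = ShowKey()
with_commas = k[0, 0]   # Python packs comma-separated indexers into one tuple
with_tuple = k[(0, 0)]  # an explicit tuple produces the identical key

# Consequently both spellings perform the same lookup on a MultiIndexed Series.
s = pd.Series(np.arange(4), index=pd.MultiIndex.from_product([[0, 1], [0, 1]]))
same_lookup = s.loc[0, 0] == s.loc[(0, 0)]
```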
By the way, if you do this with a DataFrame with
So for a DataFrame
So I wonder then, should `iloc` as the ordering of a MultiIndex is only guaranteed when sorted
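Whether a MultiIndex is in sorted order can be checked, and fixed, directly. A minimal sketch, assuming a recent pandas release where `is_monotonic_increasing` is the supported check (the older `MultiIndex.is_lexsorted()` was deprecated in later releases); it reuses the non-lexsorted `idx` from the prototype posted further down in this thread:

```python
import numpy as np
import pandas as pd

# Same labels as `idx` in the prototype: (0, 0), (0, 1), (1, 1), (1, 0).
idx = pd.MultiIndex([[0, 1], [0, 1]], [[0, 0, 1, 1], [0, 1, 1, 0]])
s = pd.Series(np.arange(4), idx)

# (1, 1) precedes (1, 0), so the index is not in lexicographic order.
was_sorted = s.index.is_monotonic_increasing

# sort_index reorders the rows into lexicographic order of the tuples.
s_sorted = s.sort_index()
now_sorted = s_sorted.index.is_monotonic_increasing
```

After sorting, "the n-th entry" has one well-defined meaning; before sorting, it depends on how the data happened to be constructed.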
But with a DataFrame, a tuple is interpreted as a list (as in However, that is a minor point; the main thing is that I think multi-indexing with
Hmm, this seems to report the correct error. @shoyer, are you seeing something different?
@jreback, I definitely do see the same error. @jorisvandenbossche, this is an interesting point about nested tuple indexing on a DataFrame invoking fancy indexing. That is indeed consistent with how numpy does things. So it looks like we could not add this without breaking some user code, although I do think it is rather unusual to use tuples (instead of lists or arrays) as indexers along a dimension, given that this doesn't work for 1D. I would be OK with breaking the current nested tuple indexing, but that is definitely a design trade-off. Let me try to reproduce your pathological case (in a series, for simplicity):
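For reference, the numpy behavior in question: a bare tuple *is* the whole index, but a tuple nested inside the index is converted to an integer array, i.e. fancy (advanced) indexing along that axis. A small sketch:

```python
import numpy as np

a = np.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]

# A bare tuple is the index itself: a[(0, 1)] means a[0, 1], a single element.
single = a[(0, 1)]

# A tuple nested inside the index is treated as a sequence of positions,
# so this selects rows 0 and 2, exactly like the list version.
nested = a[(0, 2), :]
as_list = a[[0, 2], :]
```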
@shoyer, I don't think I follow your example. Can you explain why you think
Related: #5420
OK, here's a prototype of my proposed functionality:
```python
import numpy as np
import pandas as pd


def get_iloc(index, indexer):
    # Build a MultiIndex whose levels are the positions 0..n-1, reusing the
    # original codes, then look up the tuple of level positions in it.
    # (`index.codes` was called `index.labels` before pandas 0.24.)
    int_levels = [np.arange(len(level)) for level in index.levels]
    return pd.MultiIndex(levels=int_levels, codes=index.codes).get_loc(indexer)


def iloc(series, indexer):
    return series.iloc[get_iloc(series.index, indexer)]
```
And some code to delve into these issues:
```python
idx = pd.MultiIndex([[0, 1], [0, 1]], [[0, 0, 1, 1], [0, 1, 1, 0]])
s = pd.Series(np.arange(4), idx, name='s')
idx2 = pd.MultiIndex([[1, 0], [1, 0]], [[1, 1, 0, 0], [1, 0, 0, 1]])
s2 = pd.Series(np.arange(4), idx2, name='s2')

# Compare label-based and prototype position-based lookups for every (i, j).
data = [(i, j, s.loc[(i, j)], s2.loc[(i, j)],
         iloc(s, (i, j)), iloc(s2, (i, j)))
        for i in range(2) for j in range(2)]
results = pd.DataFrame.from_records(
    data, columns=['i', 'j', 'loc', 'loc2', 'iloc', 'iloc2']
).set_index(['i', 'j'])
print(results)
```
So yes, as you can see, this proposal for `iloc` gives inconsistent results if the MultiIndex is not lexsorted, but otherwise gives results that are fully consistent with
I'm not sure it's possible to define this sort of indexing unambiguously without lexsorting, but again, that is a mostly standard constraint of MultiIndex.
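The inconsistency can be pinned down with assertions. A minimal sketch, assuming a recent pandas (hence `codes` rather than `labels`); it redefines the prototype's `get_iloc`/`iloc` so it runs on its own. `s` and `s2` hold the same labels in the same order with the same values, and differ only in the order in which their levels are stored:

```python
import numpy as np
import pandas as pd


def get_iloc(index, indexer):
    # Prototype idea: treat each level's codes as 0..n-1 positions.
    int_levels = [np.arange(len(level)) for level in index.levels]
    return pd.MultiIndex(levels=int_levels, codes=index.codes).get_loc(indexer)


def iloc(series, indexer):
    return series.iloc[get_iloc(series.index, indexer)]


# Both indexes spell (0, 0), (0, 1), (1, 1), (1, 0); only the level order differs.
s = pd.Series(np.arange(4),
              pd.MultiIndex([[0, 1], [0, 1]], [[0, 0, 1, 1], [0, 1, 1, 0]]))
s2 = pd.Series(np.arange(4),
               pd.MultiIndex([[1, 0], [1, 0]], [[1, 1, 0, 0], [1, 0, 0, 1]]))

same_labels = list(s.index) == list(s2.index)
label_lookup = (s.loc[(0, 0)], s2.loc[(0, 0)])
positional_lookup = (iloc(s, (0, 0)), iloc(s2, (0, 0)))
```

The two Series look identical when printed, so the different positional answers come purely from internal level order, which users never see.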
@shoyer, how is this useful? We already have many types of indexing, and it is a struggle to keep everything consistent as it is.
Now that we've thought through the full implications of how this could work, I'm no longer convinced this is a good idea. Reasoning about non-lexsorted indexes is pretty convoluted, and I support
MultiIndexing with multiple indexers (#6301) via `.loc` is great. It would be nice to mirror this functionality with `.iloc`.
To my understanding, until this change, `loc` and `iloc` had a mirror syntax, where if you replaced all of your index labels with arrays of 0-indexed integers, they were equivalent, e.g., for the following series:
Now they lack this symmetry, because indexing like `s.iloc[0, 0]` doesn't work like `s.loc[0, 0]`. I found this surprising. Thoughts?
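A minimal sketch of the asymmetry, assuming a recent pandas release (the exact exception type has varied between versions, so the sketch catches broadly). The labels are already 0-based integers, so label-based and position-based lookups would be expected to coincide:

```python
import numpy as np
import pandas as pd

# MultiIndex whose labels equal their positions within each level.
s = pd.Series(np.arange(4), index=pd.MultiIndex.from_product([[0, 1], [0, 1]]))

by_label = s.loc[0, 0]        # one label per level: works
by_flat_position = s.iloc[0]  # one flat position: works

try:
    s.iloc[0, 0]              # one position per level
    iloc_multi_ok = True
except Exception as exc:      # recent pandas raises IndexingError("Too many indexers")
    iloc_multi_ok = False
    print(type(exc).__name__, exc)
```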