API: support multiple indexers for .iloc with a MultiIndex #7490

shoyer · 2014-06-18T05:46:47Z

MultiIndexing with multiple indexers (#6301) via .loc is great.

It would be nice to mirror this functionality with .iloc.

To my understanding, until this change, loc and iloc had a mirror syntax, where if you replaced all of your index labels with arrays of 0-indexed integers, they were equivalent, e.g., for the following series:

import pandas as pd
midx = pd.MultiIndex.from_product([range(3), range(5)])
s = pd.Series(range(15), midx)

Now they lack this symmetry, because indexing like s.iloc[0, 0] doesn't work like s.loc[0, 0]. I found this surprising. Thoughts?
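To make the "mirror syntax" claim concrete, here is a minimal sketch, assuming a recent pandas release and objects whose labels are the default 0-based integers. The names `flat` and `df` are only for illustration and are not part of the original report.

```python
import pandas as pd

# With default integer labels 0..n-1, labels and positions coincide,
# so .loc and .iloc select the same rows for list indexers.
flat = pd.Series([10, 20, 30, 40])
same_rows = flat.loc[[0, 2]].equals(flat.iloc[[0, 2]])

# The same holds per axis on a DataFrame with default row/column labels.
df = pd.DataFrame([[1, 2], [3, 4]])
same_cell = df.loc[0, 1] == df.iloc[0, 1]
```

The symmetry in question is exactly this one: it holds for flat indexes, and the report is about it not carrying over to a MultiIndex.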


jreback · 2014-06-18T09:36:10Z

You need to use a tuple, (0,0).
In fact, although NOT using a tuple works, it is a bit wonky to support.

shoyer · 2014-06-18T09:46:46Z

As I'm sure you know, Python's __getitem__ syntax makes no distinction between x[(0, 0)] and x[0, 0], though I agree that the former makes the intent here clearer.
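A small sketch of that point, using a hypothetical `Probe` class (not part of pandas) whose `__getitem__` simply returns whatever key it receives:

```python
class Probe:
    """Hypothetical helper: echoes back the key passed to __getitem__."""

    def __getitem__(self, key):
        return key


x = Probe()
bare = x[0, 0]        # comma-separated subscripts are packed into a tuple
explicit = x[(0, 0)]  # an explicit tuple arrives as the very same object type
```

Both calls hand `__getitem__` the tuple `(0, 0)`, so an indexer has no way to tell the two spellings apart.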

In any case, s.iloc[(0, 0)] should work.

jreback · 2014-06-18T09:51:32Z

Indexing interpretation is amazingly complex.

Think about what

df.loc[0,0] could possibly be.

So I guess this is technically a bug, in that iloc for a Series should also try to index into a MultiIndex when passed multiple indexers.

Why don't you give it a shot to fix?

jorisvandenbossche · 2014-06-18T10:00:05Z

It doesn't matter if you use a tuple or not (for a series!). With loc this works: s.loc[0,0] behaves just as expected. (But of course, for a dataframe this is something else; there you would need df.loc[(0,0),:].)

But I think the more important question is: what should it mean? You could say that, if you are thinking of integer locations based on the whole dataframe, there is only one first row.
What would s.iloc[(0, 1)] mean? First location in level 0, but second location in level 1? That is a bit contradictory, no? I think you are actually thinking "the first location of the values where the label in level 0 is equal to the first occurring label in that level" (something like s.groupby(level=0).get_group(0).iloc[0]). But what if the index labels are not sorted? What should it mean then?
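A rough sketch of the two readings contrasted above, assuming a recent pandas release and the 3 x 5 product index from the original report. The variable names are illustrative only.

```python
import pandas as pd

midx = pd.MultiIndex.from_product([range(3), range(5)])
s = pd.Series(range(15), index=midx)

# "Flat" reading: position 5 counts rows over the whole series.
flat_pick = s.iloc[5]

# "Per-level" reading: take the group whose level-0 label is 1,
# then the first row inside that group.
group_pick = s.groupby(level=0).get_group(1).iloc[0]
```

On this sorted product index both readings land on the same row, which is part of why the ambiguity only becomes visible once the labels are unsorted.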

jorisvandenbossche · 2014-06-18T10:04:13Z

By the way, if you do this with a dataframe with iloc, it interprets the tuple as a list of two integer locations:

In [109]: s.to_frame().iloc[(0,0),:]
Out[109]: 
     0
0 0  0
  0  0

So for a dataframe, df.iloc[(0,0),:] and df.iloc[[0,0],:] are equivalent, and you could argue that s.iloc[(0,0)] (and thus s.iloc[0,0]) should do the same as s.iloc[[0,0]] and return two rows.

jreback · 2014-06-18T10:07:27Z

So I wonder, then, whether iloc with a MultiIndex and a tuple should be an error, as you probably mean a list (and if so, it should be specified as a list rather than interpreted as one),

since the ordering of a MultiIndex is only guaranteed when sorted.
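As an illustration of the ordering point, here is a minimal sketch, assuming a recent pandas release; the four-row index is made up for the example. It shows that the row found at a given position depends on whether the MultiIndex has been sorted.

```python
import pandas as pd

# Hypothetical unsorted index, purely for illustration.
idx = pd.MultiIndex.from_tuples([("b", 2), ("a", 1), ("b", 1), ("a", 2)])
s = pd.Series(range(4), index=idx)

was_sorted = s.index.is_monotonic_increasing         # rows are not in lexical order
s_sorted = s.sort_index()
now_sorted = s_sorted.index.is_monotonic_increasing  # sort_index() restores the order

first_before = s.iloc[0]        # row ("b", 2)
first_after = s_sorted.iloc[0]  # row ("a", 1)
```

Position 0 refers to a different label before and after sorting, so any level-aware positional scheme would have to say which ordering it means.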

jorisvandenbossche · 2014-06-18T11:09:30Z

But with a dataframe, a tuple is interpreted as a list (as in df.iloc[(0,1),:]), shouldn't this be the same with a series?

However, that is a minor point, the main thing is that I think multi-indexing with iloc does not make sense (seeing the location as 'flat' even if you have multiple levels of labels). Or does it?

jreback · 2014-06-18T12:15:50Z

Hmm, this seems like it reports a correct error, @shoyer are you seeing something different?

In [1]: midx = pd.MultiIndex.from_product([range(3), range(5)])

In [2]: s = pd.Series(range(15), midx)

In [3]: s
Out[3]: 
0  0     0
   1     1
   2     2
   3     3
   4     4
1  0     5
   1     6
   2     7
   3     8
   4     9
2  0    10
   1    11
   2    12
   3    13
   4    14
dtype: int64

In [6]: s.loc[0,1]
Out[6]: 1

In [7]: s.loc[(0,1)]
Out[7]: 1

In [8]: s.iloc[(0,0)]
IndexingError: Too many indexers

In [9]: s.iloc[0,0]
IndexingError: Too many indexers

shoyer · 2014-06-18T17:24:32Z

@jreback I definitely do see the same error, and IndexingError: Too many indexers seems like the right error (if this is actually prohibited).

@jorisvandenbossche This is an interesting point about nested tuple indexing on a DataFrame invoking fancy indexing. That is indeed consistent with how numpy does things.

So it looks like we could not add this without breaking some user code, although I do think it is rather unusual to use tuples (instead of lists or arrays) for indexers along a dimension, given how it doesn't work for 1D. I would be OK breaking the current nested tuple indexing, but that is definitely a design trade-off. (Note that .loc is already different from numpy indexing in some cases, for example if you do fancy indexing in multiple dimensions at once.)
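For readers unfamiliar with that last difference, here is a small sketch (assuming recent numpy and pandas releases, with a made-up 3 x 3 array): numpy pairs up list indexers element by element, while .loc selects the full block of rows and columns.

```python
import numpy as np
import pandas as pd

a = np.arange(9).reshape(3, 3)
df = pd.DataFrame(a)

# numpy fancy indexing in two dimensions pairs the indexers:
# it picks elements (0, 0) and (1, 1).
pointwise = a[[0, 1], [0, 1]]

# .loc with two lists selects rows 0-1 crossed with columns 0-1.
block = df.loc[[0, 1], [0, 1]]
```

Here `pointwise` has two elements while `block` is a 2 x 2 frame, so .loc is not a drop-in mirror of ndarray indexing to begin with.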

Let me try to reproduce your pathological case (in a series, for simplicity):

>>> idx = pd.MultiIndex([['a', 'b'], [2, 1]], [[0, 0, 1, 1], [0, 1, 1, 0]])
>>> idx
MultiIndex(levels=[[u'a', u'b'], [2, 1]],
           labels=[[0, 0, 1, 1], [0, 1, 1, 0]])
>>> s = pd.Series(np.arange(4), idx)
>>> s
a  2    0
   1    1
b  1    2
   2    3
dtype: int64

.iloc should use the MultiIndex labels, which would mean s.iloc[(1, 0)] == 3 (not 2). I do agree this is somewhat counter-intuitive if the levels aren't sorted, but this is an unsupported corner case: .iloc already contains a warning about sorted labels:
"Warning: You will need to make sure that the selection axes are fully lexsorted!"

jorisvandenbossche · 2014-06-18T18:49:05Z

@shoyer I don't follow your example I think. Can you explain why you think s.iloc[(1,0)] should be 3 and not 2?
And when do you get that warning when using iloc?

jreback · 2014-06-18T22:46:25Z

related is #5420

shoyer · 2014-06-19T07:24:18Z

OK, here's a prototype of my proposed functionality:

import numpy as np
import pandas as pd

def get_iloc(index, indexer):
    int_levels = [np.arange(len(level)) for level in index.levels]
    return pd.MultiIndex(int_levels, index.labels).get_loc(indexer)

def iloc(series, indexer):
    return series.iloc[get_iloc(series.index, indexer)]

And some code to delve into these issues:

idx = pd.MultiIndex([[0, 1], [0, 1]], [[0, 0, 1, 1], [0, 1, 1, 0]])
s = pd.Series(np.arange(4), idx, name='s')

idx2 = pd.MultiIndex([[1, 0], [1, 0]], [[1, 1, 0, 0], [1, 0, 0, 1]])
s2 = pd.Series(np.arange(4), idx2, name='s2')

data = [(i, j, s.loc[(i, j)], s2.loc[(i, j)],
         iloc(s, (i, j)), iloc(s2, (i, j)))
        for i in range(2) for j in range(2)]
results = pd.DataFrame.from_records(
    data, columns=['i', 'j', 'loc', 'loc2', 'iloc', 'iloc2']
    ).set_index(['i', 'j'])

>>> print s
0  0    0
   1    1
1  1    2
   0    3
Name: s, dtype: int64
>>> print s2
0  0    0
   1    1
1  1    2
   0    3
Name: s2, dtype: int64
>>> print results
     loc  loc2  iloc  iloc2
i j                        
0 0    0     0     0      2
  1    1     1     1      3
1 0    3     3     3      1
  1    2     2     2      0

So yes, as you can see, this proposal for iloc gives inconsistent results if the multi-index is not lexsorted -- but otherwise gives results that are fully consistent with loc for integer multi-indexes.

I'm not sure it's possible to define this sort of indexing unambiguously without lexsorting, but again, that is a mostly standard constraint of MultiIndex.
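For anyone rerunning this today: later pandas releases renamed the MultiIndex `labels` to `codes`, so the prototype needs small changes. The following is a sketch of the same idea under that assumption (keyword `levels=`/`codes=` and the `index.codes` attribute), not a proposal for how pandas should implement it.

```python
import numpy as np
import pandas as pd


def get_iloc(index, indexer):
    # Swap each level's values for positions 0..n-1 while keeping the codes,
    # then do an ordinary label lookup on that integer-valued index.
    int_levels = [np.arange(len(level)) for level in index.levels]
    int_index = pd.MultiIndex(levels=int_levels, codes=index.codes)
    return int_index.get_loc(indexer)


def iloc(series, indexer):
    return series.iloc[get_iloc(series.index, indexer)]


idx = pd.MultiIndex(levels=[[0, 1], [0, 1]], codes=[[0, 0, 1, 1], [0, 1, 1, 0]])
s = pd.Series(np.arange(4), index=idx, name="s")

# Same row labels as `idx`, but the levels are stored in reverse order.
idx2 = pd.MultiIndex(levels=[[1, 0], [1, 0]], codes=[[1, 1, 0, 0], [1, 0, 0, 1]])
s2 = pd.Series(np.arange(4), index=idx2, name="s2")

agrees = iloc(s, (1, 0)) == s.loc[(1, 0)]       # levels sorted: matches .loc
differs = iloc(s2, (1, 0)) != s2.loc[(1, 0)]    # levels unsorted: diverges
```

The two series print identically, yet the level-aware lookup only agrees with .loc for the one whose levels are stored in sorted order, which is the inconsistency in the `iloc2` column above.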

jreback · 2014-06-19T09:51:53Z

@shoyer how is this useful? we already have many types of indexing, and it is a struggle to keep everything consistent now.

shoyer · 2014-06-23T06:16:03Z

Now that we've thought through the full implications of how this could work, I'm no longer convinced this is a good idea. Reasoning for non-lexsorted indexes is pretty convoluted, and I support .iloc being as ndarray-like as possible.

jreback closed this as completed Jun 23, 2014
