Allow level wildcard via slice(None) in df.ix[] with MultiIndex #2425

Closed
timcera opened this issue Dec 4, 2012 · 5 comments
Labels
Enhancement Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex

@timcera
Contributor

timcera commented Dec 4, 2012

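import pandas as pd
from numpy.random import randn
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))  # list() so the tuples are materialized on Python 3 as well
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index)
df = df.T  # transpose so the MultiIndex labels the rows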

The following works:

df.A.ix[slice(None), 'two']

first
bar      0.233442
baz     -1.551486
foo      0.175177
qux      1.524440
Name: A

It is very possible that this is something about a MultiIndex DataFrame that I don't understand, but I think the following should work:

df.ix[slice(None), 'two']

Instead of the expected:

first
bar      0.233442  0.359638  2.061794
baz     -1.551486 -1.888951  0.172784
foo      0.175177 -1.085777 -0.988670
qux      1.524440  0.070535  1.006851

I get a KeyError:

...
/sjr/beodata/local/python_linux/lib/python2.6/site-packages/pandas/core/internals.pyc in _check_have(self, item)
    949     def _check_have(self, item):
    950         if item not in self.items:
--> 951             raise KeyError('no item named %s' % com.pprint_thing(item))
    952
    953     def reindex_axis(self, new_axis, method=None, axis=0, copy=True):

KeyError: u'no item named two'

@ghost

ghost commented Dec 6, 2012

AFAICT from the code, the args of ix correspond to the axes, so you're asking for all rows
and column "two", which fails.
Case in point:

In [4]: df.ix[:]
Out[4]: 
                     A         B         C
first second                              
bar   one    -0.720545 -0.382630  0.573031
      two    -0.263034  0.462324 -0.126281
baz   one     1.676899 -0.660316  1.216486
      two    -0.343970  0.234571  0.347938
foo   one    -0.563490 -1.136923 -0.450143
      two    -1.209016  0.044605  0.879672
qux   one    -0.276785  0.563070 -0.133299
      two    -0.449211  0.545187 -0.869852

In [6]: df.ix[:,('A','B')]
Out[6]: 
                     A         B
first second                    
bar   one    -0.720545 -0.382630
      two    -0.263034  0.462324
baz   one     1.676899 -0.660316
      two    -0.343970  0.234571
foo   one    -0.563490 -1.136923
      two    -1.209016  0.044605
qux   one    -0.276785  0.563070
      two    -0.449211  0.545187

This also works

In [10]: df.ix[('foo','two'),:]
Out[10]: 
A   -1.209016
B    0.044605
C    0.879672
Name: (foo, two)

This however doesn't, which is disappointing:

In [7]: df.ix[(slice(None),'two')]
/home/user1/src/pandas/pandas/core/internals.pyc in _check_have(self, item)
   1002     def _check_have(self, item):
   1003         if item not in self.items:
-> 1004             raise KeyError('no item named %s' % com.pprint_thing(item))
   1005 
   1006     def reindex_axis(self, new_axis, method=None, axis=0, copy=True):

KeyError: u'no item named two'

and maybe that is a bug.

If your question comes from a real need,

In [7]: df.xs('two',level=1)
Out[7]: 
              A         B         C
first                              
bar   -0.677739 -0.740875  0.072675
baz    0.061356 -1.522032  1.084492
foo   -0.124634 -2.342294 -0.625460
qux   -0.647809 -0.051477  0.724003

will get you there, until this is otherwise addressed.
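For readers on a current pandas, here is a minimal sketch of the forms discussed above. It assumes `.loc` in place of `.ix` (which was removed in pandas 1.0) and builds the frame directly with the MultiIndex on the rows; the seeded random generator is only there to make the sketch reproducible and is not part of the original example.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
rng = np.random.default_rng(0)  # assumption: seeded only for reproducibility
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index, columns=['A', 'B', 'C'])

# The first indexer addresses rows, the second addresses columns.
cols = df.loc[:, ['A', 'B']]

# A full tuple of level values selects a single row.
row = df.loc[('foo', 'two'), :]

# Cross-section: every row whose 'second' level equals 'two'.
by_xs = df.xs('two', level='second')
print(by_xs)
```

By default `xs` drops the level that was selected on, so the result is indexed by `first` alone.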

p.s. I think you may have mixed up columns and index in your example.

@timcera
Contributor Author

timcera commented Dec 6, 2012

Fixed my example about the column/row confusion - forgot a command during copy/paste...

@timcera
Contributor Author

timcera commented Dec 6, 2012

I have a 5-level MultiIndex DataFrame and want to select rows by wildcarding one or more of the levels. I figured out a solution yesterday, outside of the pandas framework, by using 'get_level_values' for each level to build a pseudo database. It works, and is actually fast enough.
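A rough sketch of what a `get_level_values` approach can look like, written as a boolean mask rather than a separate lookup structure. This is not the code referred to above: the level names and every index value other than 'PERLND', 112, 'PWATER' and 2 are made up for illustration.

```python
import pandas as pd

# Hypothetical 5-level index; level names and most values are invented.
index = pd.MultiIndex.from_tuples(
    [('PERLND', 112, 'PWATER', 'SURO', 2),
     ('PERLND', 112, 'PWATER', 'IFWO', 2),
     ('PERLND', 112, 'PWATER', 'SURO', 3),
     ('IMPLND', 112, 'IWATER', 'SURO', 2)],
    names=['operation', 'segment', 'group', 'variable', 'code'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0]}, index=index)

# Constrain every level except 'variable', which acts as the wildcard.
mask = ((index.get_level_values('operation') == 'PERLND')
        & (index.get_level_values('segment') == 112)
        & (index.get_level_values('group') == 'PWATER')
        & (index.get_level_values('code') == 2))
selected = df[mask]
print(selected)
```

Leaving a level out of the mask is what makes it a wildcard, so the same pattern extends to wildcarding several levels at once.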

When trying to use native pandas I was hoping that if 'df.ix' supported 'slice(None)' I could use it for my DataFrame as something like:

df.ix[('PERLND', 112, 'PWATER', slice(None), 2)]

Which would wildcard the fourth level - right? If that makes sense, that is what I think I want.
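For comparison, later pandas releases (0.14 and up, as far as I know) accept this kind of per-level wildcard through `.loc`. The sketch below assumes a current pandas and a hypothetical 5-level index; the level names and the 'SURO'/'IFWO' values are invented. Note that both axes are spelled out so the tuple is read as a row indexer only.

```python
import pandas as pd

# Hypothetical 5-level index; level names and most values are invented.
index = pd.MultiIndex.from_tuples(
    [('PERLND', 112, 'PWATER', 'SURO', 2),
     ('PERLND', 112, 'PWATER', 'IFWO', 2),
     ('PERLND', 112, 'PWATER', 'SURO', 3),
     ('IMPLND', 112, 'IWATER', 'SURO', 2)],
    names=['operation', 'segment', 'group', 'variable', 'code'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0]}, index=index)

# Sorting first keeps MultiIndex slicing well defined.
df = df.sort_index()

# slice(None) wildcards the fourth level; the trailing ':' selects all columns.
wild = df.loc[('PERLND', 112, 'PWATER', slice(None), 2), :]

# pd.IndexSlice allows the same selection with ':' in place of slice(None).
idx = pd.IndexSlice
same = df.loc[idx['PERLND', 112, 'PWATER', :, 2], :]
print(wild)
```

Both spellings should select the same rows; `pd.IndexSlice` is only a convenience for writing slices inside the tuple.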

@ghost

ghost commented Dec 12, 2012

related #1766

@jreback jreback mentioned this issue Sep 21, 2013
@jreback
Contributor

jreback commented Dec 18, 2013

closing in favor of #4036

@jreback jreback closed this as completed Dec 18, 2013
2 participants