diff --git a/doc/source/10min.rst b/doc/source/10min.rst index 6b1bfdf7b241d..3a3b3d5e36977 100644 --- a/doc/source/10min.rst +++ b/doc/source/10min.rst @@ -166,7 +166,7 @@ Selection recommend the optimized pandas data access methods, ``.at``, ``.iat``, ``.loc``, ``.iloc`` and ``.ix``. -See the :ref:`Indexing section ` and below. +See the indexing documentation :ref:`Indexing and Selecting Data ` and :ref:`MultiIndex / Advanced Indexing ` Getting ~~~~~~~ @@ -529,7 +529,7 @@ the function. Reshaping --------- -See the sections on :ref:`Hierarchical Indexing ` and +See the sections on :ref:`Hierarchical Indexing ` and :ref:`Reshaping `. Stack diff --git a/doc/source/advanced.rst b/doc/source/advanced.rst new file mode 100644 index 0000000000000..1749409c863df --- /dev/null +++ b/doc/source/advanced.rst @@ -0,0 +1,709 @@ +.. _advanced: + +.. currentmodule:: pandas + +.. ipython:: python + :suppress: + + import numpy as np + import random + np.random.seed(123456) + from pandas import * + options.display.max_rows=15 + import pandas as pd + randn = np.random.randn + randint = np.random.randint + np.set_printoptions(precision=4, suppress=True) + from pandas.compat import range, zip + +****************************** +MultiIndex / Advanced Indexing +****************************** + +This section covers indexing with a ``MultiIndex`` and more advanced indexing features. + +See the :ref:`Indexing and Selecting Data ` for general indexing documentation. + +.. warning:: + + Whether a copy or a reference is returned for a setting operation, may + depend on the context. This is sometimes called ``chained assignment`` and + should be avoided. See :ref:`Returning a View versus Copy + ` + +.. warning:: + + In 0.15.0 ``Index`` has internally been refactored to no longer sub-class ``ndarray`` + but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. 
This should be + a transparent change with only very limited API implications (See the :ref:`Internal Refactoring `) + +See the :ref:`cookbook` for some advanced strategies + +.. _advanced.hierarchical: + +Hierarchical indexing (MultiIndex) +---------------------------------- + +Hierarchical / Multi-level indexing is very exciting as it opens the door to some +quite sophisticated data analysis and manipulation, especially for working with +higher dimensional data. In essence, it enables you to store and manipulate +data with an arbitrary number of dimensions in lower dimensional data +structures like Series (1d) and DataFrame (2d). + +In this section, we will show what exactly we mean by "hierarchical" indexing +and how it integrates with all of the pandas indexing functionality +described above and in prior sections. Later, when discussing :ref:`group by +` and :ref:`pivoting and reshaping data `, we'll show +non-trivial applications to illustrate how it aids in structuring data for +analysis. + +See the :ref:`cookbook` for some advanced strategies + +Creating a MultiIndex (hierarchical index) object +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``MultiIndex`` object is the hierarchical analogue of the standard +``Index`` object which typically stores the axis labels in pandas objects. You +can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A +``MultiIndex`` can be created from a list of arrays (using +``MultiIndex.from_arrays``), an array of tuples (using +``MultiIndex.from_tuples``), or a crossed set of iterables (using +``MultiIndex.from_product``). The ``Index`` constructor will attempt to return +a ``MultiIndex`` when it is passed a list of tuples. The following examples +demo different ways to initialize MultiIndexes. + + +.. 
ipython:: python + + arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], + ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] + tuples = list(zip(*arrays)) + tuples + + index = MultiIndex.from_tuples(tuples, names=['first', 'second']) + index + + s = Series(randn(8), index=index) + s + +When you want every pairing of the elements in two iterables, it can be easier +to use the ``MultiIndex.from_product`` function: + +.. ipython:: python + + iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']] + MultiIndex.from_product(iterables, names=['first', 'second']) + +As a convenience, you can pass a list of arrays directly into Series or +DataFrame to construct a MultiIndex automatically: + +.. ipython:: python + + arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), + np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])] + s = Series(randn(8), index=arrays) + s + df = DataFrame(randn(8, 4), index=arrays) + df + +All of the ``MultiIndex`` constructors accept a ``names`` argument which stores +string names for the levels themselves. If no names are provided, ``None`` will +be assigned: + +.. ipython:: python + + df.index.names + +This index can back any axis of a pandas object, and the number of **levels** +of the index is up to you: + +.. ipython:: python + + df = DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index) + df + DataFrame(randn(6, 6), index=index[:6], columns=index[:6]) + +We've "sparsified" the higher levels of the indexes to make the console output a +bit easier on the eyes. + +It's worth keeping in mind that there's nothing preventing you from using +tuples as atomic labels on an axis: + +.. ipython:: python + + Series(randn(8), index=tuples) + +The reason that the ``MultiIndex`` matters is that it can allow you to do +grouping, selection, and reshaping operations as we will describe below and in +subsequent areas of the documentation. 
As you will see in later sections, you +can find yourself working with hierarchically-indexed data without creating a +``MultiIndex`` explicitly yourself. However, when loading data from a file, you +may wish to generate your own ``MultiIndex`` when preparing the data set. + +Note that how the index is displayed can be controlled using the +``multi_sparse`` option in ``pandas.set_option``: + +.. ipython:: python + + pd.set_option('display.multi_sparse', False) + df + pd.set_option('display.multi_sparse', True) + +.. _advanced.get_level_values: + +Reconstructing the level labels +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The method ``get_level_values`` will return a vector of the labels for each +location at a particular level: + +.. ipython:: python + + index.get_level_values(0) + index.get_level_values('second') + + +Basic indexing on axis with MultiIndex +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +One of the important features of hierarchical indexing is that you can select +data by a "partial" label identifying a subgroup in the data. **Partial** +selection "drops" levels of the hierarchical index in the result in a +completely analogous way to selecting a column in a regular DataFrame: + +.. ipython:: python + + df['bar'] + df['bar', 'one'] + df['bar']['one'] + s['qux'] + +See :ref:`Cross-section with hierarchical index ` for how to select +on a deeper level. + + +Data alignment and using ``reindex`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Operations between differently-indexed objects having ``MultiIndex`` on the +axes will work as you expect; data alignment will work the same as an Index of +tuples: + +.. ipython:: python + + s + s[:-2] + s + s[::2] + +``reindex`` can be called with another ``MultiIndex`` or even a list or array +of tuples: + +.. ipython:: python + + s.reindex(index[:3]) + s.reindex([('foo', 'two'), ('bar', 'one'), ('qux', 'one'), ('baz', 'one')]) + +.. 
_advanced.advanced_hierarchical: + +Advanced indexing with hierarchical index +----------------------------------------- + +Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a +bit challenging, but we've made every effort to do so. For example, the +following works as you would expect: + +.. ipython:: python + + df = df.T + df + df.loc['bar'] + df.loc['bar', 'two'] + +"Partial" slicing also works quite nicely. + +.. ipython:: python + + df.loc['baz':'foo'] + +You can slice with a 'range' of values, by providing a slice of tuples. + +.. ipython:: python + + df.loc[('baz', 'two'):('qux', 'one')] + df.loc[('baz', 'two'):'foo'] + +Passing a list of labels or tuples works similarly to reindexing: + +.. ipython:: python + + df.ix[[('bar', 'two'), ('qux', 'one')]] + +.. _advanced.mi_slicers: + +Using slicers +~~~~~~~~~~~~~ + +.. versionadded:: 0.14.0 + +In 0.14.0 we added a new way to slice multi-indexed objects. +You can slice a multi-index by providing multiple indexers. + +You can provide any of the selectors as if you are indexing by label, see :ref:`Selection by Label `, +including slices, lists of labels, labels, and boolean indexers. + +You can use ``slice(None)`` to select all the contents of *that* level. You do not need to specify all the +*deeper* levels, they will be implied as ``slice(None)``. + +As usual, **both sides** of the slicers are included as this is label indexing. + +.. warning:: + + You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and + for the **columns**. There are some ambiguous cases where the passed indexer could be mis-interpreted + as indexing *both* axes, rather than into say the MultiIndex for the rows. + + You should do this: + + .. code-block:: python + + df.loc[(slice('A1','A3'),.....),:] + + rather than this: + + .. code-block:: python + + df.loc[(slice('A1','A3'),.....)] + +.. warning:: + + You will need to make sure that the selection axes are fully lexsorted! 
+ +.. ipython:: python + + def mklbl(prefix,n): + return ["%s%s" % (prefix,i) for i in range(n)] + + miindex = MultiIndex.from_product([mklbl('A',4), + mklbl('B',2), + mklbl('C',4), + mklbl('D',2)]) + micolumns = MultiIndex.from_tuples([('a','foo'),('a','bar'), + ('b','foo'),('b','bah')], + names=['lvl0', 'lvl1']) + dfmi = DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))), + index=miindex, + columns=micolumns).sortlevel().sortlevel(axis=1) + dfmi + +Basic multi-index slicing using slices, lists, and labels. + +.. ipython:: python + + dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:] + +You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)`` + +.. ipython:: python + + idx = pd.IndexSlice + dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] + +It is possible to perform quite complicated selections using this method on multiple +axes at the same time. + +.. ipython:: python + + dfmi.loc['A1',(slice(None),'foo')] + dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] + +Using a boolean indexer you can provide selection related to the *values*. + +.. ipython:: python + + mask = dfmi[('a','foo')]>200 + dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']] + +You can also specify the ``axis`` argument to ``.loc`` to interpret the passed +slicers on a single axis. + +.. ipython:: python + + dfmi.loc(axis=0)[:,:,['C1','C3']] + +Furthermore you can *set* the values using these methods + +.. ipython:: python + + df2 = dfmi.copy() + df2.loc(axis=0)[:,:,['C1','C3']] = -10 + df2 + +You can use a right-hand-side of an alignable object as well. + +.. ipython:: python + + df2 = dfmi.copy() + df2.loc[idx[:,:,['C1','C3']],:] = df2*1000 + df2 + +.. _advanced.xs: + +Cross-section +~~~~~~~~~~~~~ + +The ``xs`` method of ``DataFrame`` additionally takes a level argument to make +selecting data at a particular level of a MultiIndex easier. + +.. ipython:: python + + df + df.xs('one', level='second') + +.. 
ipython:: python + + # using the slicers (new in 0.14.0) + df.loc[(slice(None),'one'),:] + +You can also select on the columns with :meth:`~pandas.MultiIndex.xs`, by +providing the axis argument + +.. ipython:: python + + df = df.T + df.xs('one', level='second', axis=1) + +.. ipython:: python + + # using the slicers (new in 0.14.0) + df.loc[:,(slice(None),'one')] + +:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys + +.. ipython:: python + + df.xs(('one', 'bar'), level=('second', 'first'), axis=1) + +.. ipython:: python + + # using the slicers (new in 0.14.0) + df.loc[:,('bar','one')] + +.. versionadded:: 0.13.0 + +You can pass ``drop_level=False`` to :meth:`~pandas.MultiIndex.xs` to retain +the level that was selected + +.. ipython:: python + + df.xs('one', level='second', axis=1, drop_level=False) + +versus the result with ``drop_level=True`` (the default value) + +.. ipython:: python + + df.xs('one', level='second', axis=1, drop_level=True) + +.. ipython:: python + :suppress: + + df = df.T + +.. _advanced.advanced_reindex: + +Advanced reindexing and alignment +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The parameter ``level`` has been added to the ``reindex`` and ``align`` methods +of pandas objects. This is useful to broadcast values across a level. For +instance: + +.. ipython:: python + + midx = MultiIndex(levels=[['zero', 'one'], ['x','y']], + labels=[[1,1,0,0],[1,0,1,0]]) + df = DataFrame(randn(4,2), index=midx) + df + df2 = df.mean(level=0) + df2 + df2.reindex(df.index, level=0) + + # aligning + df_aligned, df2_aligned = df.align(df2, level=0) + df_aligned + df2_aligned + + +Swapping levels with :meth:`~pandas.MultiIndex.swaplevel` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``swaplevel`` function can switch the order of two levels: + +.. ipython:: python + + df[:5] + df[:5].swaplevel(0, 1, axis=0) + +.. 
_advanced.reorderlevels: + +Reordering levels with :meth:`~pandas.MultiIndex.reorder_levels` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``reorder_levels`` function generalizes the ``swaplevel`` function, +allowing you to permute the hierarchical index levels in one step: + +.. ipython:: python + + df[:5].reorder_levels([1,0], axis=0) + +The need for sortedness with :class:`~pandas.MultiIndex` +-------------------------------------------------------- + +**Caveat emptor**: the present implementation of ``MultiIndex`` requires that +the labels be sorted for some of the slicing / indexing routines to work +correctly. You can think about breaking the axis into unique groups, where at +the hierarchical level of interest, each distinct group shares a label, but no +two have the same label. However, the ``MultiIndex`` does not enforce this: +**you are responsible for ensuring that things are properly sorted**. There is +an important new method ``sortlevel`` to sort an axis within a ``MultiIndex`` +so that its labels are grouped and sorted by the original ordering of the +associated factor at that level. Note that this does not necessarily mean the +labels will be sorted lexicographically! + +.. ipython:: python + + import random; random.shuffle(tuples) + s = Series(randn(8), index=MultiIndex.from_tuples(tuples)) + s + s.sortlevel(0) + s.sortlevel(1) + +.. _advanced.sortlevel_byname: + +Note, you may also pass a level name to ``sortlevel`` if the MultiIndex levels +are named. + +.. ipython:: python + + s.index.set_names(['L1', 'L2'], inplace=True) + s.sortlevel(level='L1') + s.sortlevel(level='L2') + +Some indexing will work even if the data are not sorted, but will be rather +inefficient and will also return a copy of the data rather than a view: + +.. ipython:: python + + s['qux'] + s.sortlevel(1)['qux'] + +On higher dimensional objects, you can sort any of the other axes by level if +they have a MultiIndex: + +.. 
ipython:: python + + df.T.sortlevel(1, axis=1) + +The ``MultiIndex`` object has code to **explicitly check the sort depth**. Thus, +if you try to index at a depth at which the index is not sorted, it will raise +an exception. Here is a concrete example to illustrate this: + +.. ipython:: python + + tuples = [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')] + idx = MultiIndex.from_tuples(tuples) + idx.lexsort_depth + + reordered = idx[[1, 0, 3, 2]] + reordered.lexsort_depth + + s = Series(randn(4), index=reordered) + s.ix['a':'a'] + +However: + +:: + + >>> s.ix[('a', 'b'):('b', 'a')] + Traceback (most recent call last) + ... + KeyError: Key length (3) was greater than MultiIndex lexsort depth (2) + + +Take Methods +------------ + +.. _advanced.take: + +Similar to numpy ndarrays, pandas Index, Series, and DataFrame also provide +the ``take`` method that retrieves elements along a given axis at the given +indices. The given indices must be either a list or an ndarray of integer +index positions. ``take`` will also accept negative integers as relative positions to the end of the object. + +.. ipython:: python + + index = Index(randint(0, 1000, 10)) + index + + positions = [0, 9, 3] + + index[positions] + index.take(positions) + + ser = Series(randn(10)) + + ser.iloc[positions] + ser.take(positions) + +For DataFrames, the given indices should be a 1d list or ndarray that specifies +row or column positions. + +.. ipython:: python + + frm = DataFrame(randn(5, 3)) + + frm.take([1, 4, 3]) + + frm.take([0, 2], axis=1) + +It is important to note that the ``take`` method on pandas objects is not +intended to work on boolean indices and may return unexpected results. + +.. 
ipython:: python + + arr = randn(10) + arr.take([False, False, True, True]) + arr[[0, 1]] + + ser = Series(randn(10)) + ser.take([False, False, True, True]) + ser.ix[[0, 1]] + +Finally, as a small note on performance, because the ``take`` method handles +a narrower range of inputs, it can offer performance that is a good deal +faster than fancy indexing. + +.. ipython:: + + arr = randn(10000, 5) + indexer = np.arange(10000) + random.shuffle(indexer) + + timeit arr[indexer] + timeit arr.take(indexer, axis=0) + + ser = Series(arr[:, 0]) + timeit ser.ix[indexer] + timeit ser.take(indexer) + +.. _indexing.float64index: + +Float64Index +------------ + +.. note:: + + As of 0.14.0, ``Float64Index`` is backed by a native ``float64`` dtype + array. Prior to 0.14.0, ``Float64Index`` was backed by an ``object`` dtype + array. Using a ``float64`` dtype in the backend speeds up arithmetic + operations by about 30x and boolean indexing operations on the + ``Float64Index`` itself are about 2x as fast. + + +.. versionadded:: 0.13.0 + +By default a ``Float64Index`` will be automatically created when passing floating, or mixed-integer-floating values in index creation. +This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the +same. + +.. ipython:: python + + indexf = Index([1.5, 2, 3, 4.5, 5]) + indexf + sf = Series(range(5),index=indexf) + sf + +Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``) + +.. ipython:: python + + sf[3] + sf[3.0] + sf.ix[3] + sf.ix[3.0] + sf.loc[3] + sf.loc[3.0] + +The only positional indexing is via ``iloc`` + +.. ipython:: python + + sf.iloc[3] + +A scalar index that is not found will raise ``KeyError`` + +Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc`` + +.. 
ipython:: python + + sf[2:4] + sf.ix[2:4] + sf.loc[2:4] + sf.iloc[2:4] + +In float indexes, slicing using floats is allowed + +.. ipython:: python + + sf[2.1:4.6] + sf.loc[2.1:4.6] + +In non-float indexes, slicing using floats will raise a ``TypeError`` + +.. code-block:: python + + In [1]: Series(range(5))[3.5] + TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index) + + In [1]: Series(range(5))[3.5:4.5] + TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index) + +Using a scalar float indexer will be deprecated in a future version, but is allowed for now. + +.. code-block:: python + + In [3]: Series(range(5))[3.0] + Out[3]: 3 + +Here is a typical use-case for using this type of indexing. Imagine that you have a somewhat +irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for +example be millisecond offsets. + +.. ipython:: python + + dfir = concat([DataFrame(randn(5,2), + index=np.arange(5) * 250.0, + columns=list('AB')), + DataFrame(randn(6,2), + index=np.arange(4,10) * 250.1, + columns=list('AB'))]) + dfir + +Selection operations then will always work on a value basis, for all selection operators. + +.. ipython:: python + + dfir[0:1000.4] + dfir.loc[0:1001,'A'] + dfir.loc[1000.4] + +You could then easily pick out the first 1 second (1000 ms) of data then. + +.. ipython:: python + + dfir[0:1000] + +Of course if you need integer based selection, then use ``iloc`` + +.. ipython:: python + + dfir.iloc[0:5] + diff --git a/doc/source/basics.rst b/doc/source/basics.rst index 81c2dfd4311f9..884976b55d6d1 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -410,7 +410,7 @@ values: Here is a quick reference summary table of common functions. Each also takes an optional ``level`` parameter which applies only if the object has a -:ref:`hierarchical index`. +:ref:`hierarchical index`. .. 
csv-table:: :header: "Function", "Description" @@ -822,7 +822,7 @@ DataFrame's index. .. seealso:: - :ref:`Advanced indexing ` is an even more concise way of + :ref:`MultiIndex / Advanced Indexing ` is an even more concise way of doing reindexing. .. note:: diff --git a/doc/source/cookbook.rst b/doc/source/cookbook.rst index 805316d199fc6..243d1c02d1a65 100644 --- a/doc/source/cookbook.rst +++ b/doc/source/cookbook.rst @@ -86,7 +86,7 @@ The :ref:`indexing ` docs. MultiIndexing ------------- -The :ref:`multindexing ` docs. +The :ref:`multindexing ` docs. `Creating a multi-index from a labeled frame `__ diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst index 928de285982cf..44321375d31a2 100644 --- a/doc/source/dsintro.rst +++ b/doc/source/dsintro.rst @@ -828,7 +828,7 @@ Conversion to DataFrame ~~~~~~~~~~~~~~~~~~~~~~~ A Panel can be represented in 2D form as a hierarchically indexed -DataFrame. See the section :ref:`hierarchical indexing ` +DataFrame. See the section :ref:`hierarchical indexing ` for more on this. To convert a Panel to a DataFrame, use the ``to_frame`` method: diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst index fb1004edca785..1b21c5d7291e5 100644 --- a/doc/source/groupby.rst +++ b/doc/source/groupby.rst @@ -233,7 +233,7 @@ however pass ``sort=False`` for potential speedups: GroupBy with MultiIndex ~~~~~~~~~~~~~~~~~~~~~~~ -With :ref:`hierarchically-indexed data `, it's quite +With :ref:`hierarchically-indexed data `, it's quite natural to group by one of the levels of the hierarchy. .. ipython:: python @@ -358,7 +358,7 @@ An obvious one is aggregation via the ``aggregate`` or equivalently ``agg`` meth As you can see, the result of the aggregation will have the group names as the new index along the grouped axis. In the case of multiple keys, the result is a -:ref:`MultiIndex ` by default, though this can be +:ref:`MultiIndex ` by default, though this can be changed by using the ``as_index`` option: .. 
ipython:: python diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template index 4e1d2b471d1c0..a845e31d95e90 100644 --- a/doc/source/index.rst.template +++ b/doc/source/index.rst.template @@ -124,6 +124,7 @@ See the package overview for more detail about what's in the library. basics options indexing + advanced computation missing_data groupby @@ -148,5 +149,6 @@ See the package overview for more detail about what's in the library. {% endif -%} {%if not single -%} contributing + internals release {% endif -%} diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst index 4bde90a402456..c458dac22acca 100644 --- a/doc/source/indexing.rst +++ b/doc/source/indexing.rst @@ -58,10 +58,12 @@ indexing. but instead subclass ``PandasObject``, similarly to the rest of the pandas objects. This should be a transparent change with only very limited API implications (See the :ref:`Internal Refactoring `) +See the :ref:`MultiIndex / Advanced Indexing ` for ``MultiIndex`` and more advanced indexing documentation. + See the :ref:`cookbook` for some advanced strategies -Different Choices for Indexing (``loc``, ``iloc``, and ``ix``) --------------------------------------------------------------- +Different Choices for Indexing +------------------------------ .. versionadded:: 0.11.0 @@ -102,9 +104,9 @@ of multi-axis indexing. whether the slice is interpreted as position based or label based, it's usually better to be explicit and use ``.iloc`` or ``.loc``. 
- See more at :ref:`Advanced Indexing `, :ref:`Advanced - Hierarchical ` and :ref:`Fallback Indexing - ` + See more at :ref:`Advanced Indexing `, :ref:`Advanced + Hierarchical ` and :ref:`Fallback Indexing + ` Getting values from an object with multi-axes selection uses the following notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as @@ -579,7 +581,7 @@ more complex criteria: df2[criterion & (df2['b'] == 'x')] Note, with the choice methods :ref:`Selection by Label `, :ref:`Selection by Position `, -and :ref:`Advanced Indexing ` you may select along more than one axis using boolean vectors combined with other indexing expressions. +and :ref:`Advanced Indexing ` you may select along more than one axis using boolean vectors combined with other indexing expressions. .. ipython:: python @@ -1078,71 +1080,6 @@ floating point values generated using ``numpy.random.randn()``. df = DataFrame(randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) df2 = df.copy() -Take Methods ------------- - -.. _indexing.take: - -Similar to numpy ndarrays, pandas Index, Series, and DataFrame also provides -the ``take`` method that retrieves elements along a given axis at the given -indices. The given indices must be either a list or an ndarray of integer -index positions. ``take`` will also accept negative integers as relative positions to the end of the object. - -.. ipython:: python - - index = Index(randint(0, 1000, 10)) - index - - positions = [0, 9, 3] - - index[positions] - index.take(positions) - - ser = Series(randn(10)) - - ser.ix[positions] - ser.take(positions) - -For DataFrames, the given indices should be a 1d list or ndarray that specifies -row or column positions. - -.. ipython:: python - - frm = DataFrame(randn(5, 3)) - - frm.take([1, 4, 3]) - - frm.take([0, 2], axis=1) - -It is important to note that the ``take`` method on pandas objects are not -intended to work on boolean indices and may return unexpected results. - -.. 
ipython:: python - - arr = randn(10) - arr.take([False, False, True, True]) - arr[[0, 1]] - - ser = Series(randn(10)) - ser.take([False, False, True, True]) - ser.ix[[0, 1]] - -Finally, as a small note on performance, because the ``take`` method handles -a narrower range of inputs, it can offer performance that is a good deal -faster than fancy indexing. - -.. ipython:: - - arr = randn(10000, 5) - indexer = np.arange(10000) - random.shuffle(indexer) - - timeit arr[indexer] - timeit arr.take(indexer, axis=0) - - ser = Series(arr[:, 0]) - timeit ser.ix[indexer] - timeit ser.take(indexer) Duplicate Data -------------- @@ -1183,229 +1120,228 @@ default value. s.get('a') # equivalent to s['a'] s.get('x', default=-1) -.. _indexing.advanced: +The :meth:`~pandas.DataFrame.select` Method +------------------------------------------- -Advanced Indexing with ``.ix`` ------------------------------- +Another way to extract slices from an object is with the ``select`` method of +Series, DataFrame, and Panel. This method should be used only when there is no +more direct way. ``select`` takes a function which operates on labels along +``axis`` and returns a boolean. For instance: -.. note:: +.. ipython:: python + + df.select(lambda x: x == 'A', axis=1) - The recent addition of ``.loc`` and ``.iloc`` have enabled users to be quite - explicit about indexing choices. ``.ix`` allows a great flexibility to - specify indexing locations by *label* and/or *integer position*. pandas will - attempt to use any passed *integer* as *label* locations first (like what - ``.loc`` would do, then to fall back on *positional* indexing, like what - ``.iloc`` would do). See :ref:`Fallback Indexing ` for - an example. +The :meth:`~pandas.DataFrame.lookup` Method +------------------------------------------- -The syntax of using ``.ix`` is identical to ``.loc``, in :ref:`Selection by -Label `, and ``.iloc`` in :ref:`Selection by Position `. 
+Sometimes you want to extract a set of values given a sequence of row labels +and column labels, and the ``lookup`` method allows for this and returns a +numpy array. For instance, -The ``.ix`` attribute takes the following inputs: +.. ipython:: python -- An integer or single label, e.g. ``5`` or ``'a'`` -- A list or array of labels ``['a', 'b', 'c']`` or integers ``[4, 3, 0]`` -- A slice object with ints ``1:7`` or labels ``'a':'f'`` -- A boolean array + dflookup = DataFrame(np.random.rand(20,4), columns = ['A','B','C','D']) + dflookup.lookup(list(range(0,10,2)), ['B','C','A','B','D']) -We'll illustrate all of these methods. First, note that this provides a concise -way of reindexing on multiple axes at once: +.. _indexing.class: -.. ipython:: python +Index objects +------------- - subindex = dates[[3,4,5]] - df.reindex(index=subindex, columns=['C', 'B']) - df.ix[subindex, ['C', 'B']] +The pandas :class:`~pandas.Index` class and its subclasses can be viewed as +implementing an *ordered multiset*. Duplicates are allowed. However, if you try +to convert an :class:`~pandas.Index` object with duplicate entries into a +``set``, an exception will be raised. -Assignment / setting values is possible when using ``ix``: +:class:`~pandas.Index` also provides the infrastructure necessary for +lookups, data alignment, and reindexing. The easiest way to create an +:class:`~pandas.Index` directly is to pass a ``list`` or other sequence to +:class:`~pandas.Index`: .. ipython:: python - df2 = df.copy() - df2.ix[subindex, ['C', 'B']] = 0 - df2 + index = Index(['e', 'd', 'a', 'b']) + index + 'd' in index + +You can also pass a ``name`` to be stored in the index: -Indexing with an array of integers can also be done: .. ipython:: python - df.ix[[4,3,1]] - df.ix[dates[[4,3,1]]] + index = Index(['e', 'd', 'a', 'b'], name='something') + index.name -**Slicing** has standard Python semantics for integer slices: +The name, if set, will be shown in the console display: .. 
ipython:: python - df.ix[1:7, :2] + index = Index(list(range(5)), name='rows') + columns = Index(['A', 'B', 'C'], name='cols') + df = DataFrame(np.random.randn(5, 3), index=index, columns=columns) + df + df['A'] -Slicing with labels is semantically slightly different because the slice start -and stop are **inclusive** in the label-based case: -.. ipython:: python +Set operations on Index objects +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - date1, date2 = dates[[2, 4]] - print(date1, date2) - df.ix[date1:date2] - df['A'].ix[date1:date2] +.. _indexing.set_ops: -Getting and setting rows in a DataFrame, especially by their location, is much -easier: +The three main operations are ``union (|)``, ``intersection (&)``, and ``diff +(-)``. These can be directly called as instance methods or used via overloaded +operators: .. ipython:: python - df2 = df[:5].copy() - df2.ix[3] - df2.ix[3] = np.arange(len(df2.columns)) - df2 + a = Index(['c', 'b', 'a']) + b = Index(['c', 'e', 'd']) + a.union(b) + a | b + a & b + a - b -Column or row selection can be combined as you would expect with arrays of -labels or even boolean vectors: +Also available is the ``sym_diff (^)`` operation, which returns elements +that appear in either ``idx1`` or ``idx2`` but not both. This is +equivalent to the Index created by ``(idx1 - idx2) + (idx2 - idx1)``, +with duplicates dropped. .. ipython:: python - df.ix[df['A'] > 0, 'B'] - df.ix[date1:date2, 'B'] - df.ix[date1, 'B'] - -Slicing with labels is closely related to the ``truncate`` method which does -precisely ``.ix[start:stop]`` but returns a copy (for legacy reasons). 
+ idx1 = Index([1, 2, 3, 4]) + idx2 = Index([2, 3, 4, 5]) + idx1.sym_diff(idx2) + idx1 ^ idx2 -The :meth:`~pandas.DataFrame.select` Method -------------------------------------------- +Setting index metadata (``name(s)``, ``levels``, ``labels``) +------------------------------------------------------------ -Another way to extract slices from an object is with the ``select`` method of -Series, DataFrame, and Panel. This method should be used only when there is no -more direct way. ``select`` takes a function which operates on labels along -``axis`` and returns a boolean. For instance: +.. versionadded:: 0.13.0 -.. ipython:: python +.. _indexing.set_metadata: - df.select(lambda x: x == 'A', axis=1) +Indexes are "mostly immutable", but it is possible to set and change their +metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and +``labels``). -The :meth:`~pandas.DataFrame.lookup` Method -------------------------------------------- +You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels`` +methods to set these attributes directly. They default to returning a copy; however, +you can specify ``inplace=True`` to have the data change in place. -Sometimes you want to extract a set of values given a sequence of row labels -and column labels, and the ``lookup`` method allows for this and returns a -numpy array. For instance, +See :ref:`Advanced Indexing ` for usage of MultiIndexes. .. ipython:: python - dflookup = DataFrame(np.random.rand(20,4), columns = ['A','B','C','D']) - dflookup.lookup(list(range(0,10,2)), ['B','C','A','B','D']) + ind = Index([1, 2, 3]) + ind.rename("apple") + ind + ind.set_names(["apple"], inplace=True) + ind.name = "bob" + ind -.. _indexing.float64index: +.. versionadded:: 0.15.0 -Float64Index ------------- +``set_names``, ``set_levels``, and ``set_labels`` also take an optional +``level`` argument -.. note:: +.. ipython:: python - As of 0.14.0, ``Float64Index`` is backed by a native ``float64`` dtype - array. 
Prior to 0.14.0, ``Float64Index`` was backed by an ``object`` dtype - array. Using a ``float64`` dtype in the backend speeds up arithmetic - operations by about 30x and boolean indexing operations on the - ``Float64Index`` itself are about 2x as fast. + index = MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second']) + index + index.levels[1] + index.set_levels(["a", "b"], level=1) -.. versionadded:: 0.13.0 +Set / Reset Index +----------------- -By default a ``Float64Index`` will be automatically created when passing floating, or mixed-integer-floating values in index creation. -This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the -same. +Occasionally you will load or create a data set into a DataFrame and want to +add an index after you've already done so. There are a couple of different +ways. -.. ipython:: python +Set an index +~~~~~~~~~~~~ - indexf = Index([1.5, 2, 3, 4.5, 5]) - indexf - sf = Series(range(5),index=indexf) - sf +.. _indexing.set_index: -Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``) +DataFrame has a ``set_index`` method which takes a column name (for a regular +``Index``) or a list of column names (for a ``MultiIndex``), to create a new, +indexed DataFrame: .. ipython:: python + :suppress: - sf[3] - sf[3.0] - sf.ix[3] - sf.ix[3.0] - sf.loc[3] - sf.loc[3.0] - -The only positional indexing is via ``iloc`` + data = DataFrame({'a' : ['bar', 'bar', 'foo', 'foo'], + 'b' : ['one', 'two', 'one', 'two'], + 'c' : ['z', 'y', 'x', 'w'], + 'd' : [1., 2., 3, 4]}) .. 
ipython:: python - sf.iloc[3] - -A scalar index that is not found will raise ``KeyError`` + data + indexed1 = data.set_index('c') + indexed1 + indexed2 = data.set_index(['a', 'b']) + indexed2 -Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc`` +The ``append`` keyword option allows you to keep the existing index and append +the given columns to a MultiIndex: .. ipython:: python - sf[2:4] - sf.ix[2:4] - sf.loc[2:4] - sf.iloc[2:4] + frame = data.set_index('c', drop=False) + frame = frame.set_index(['a', 'b'], append=True) + frame -In float indexes, slicing using floats is allowed +Other options in ``set_index`` allow you not to drop the index columns or to add +the index in-place (without creating a new object): .. ipython:: python - sf[2.1:4.6] - sf.loc[2.1:4.6] - -In non-float indexes, slicing using floats will raise a ``TypeError`` - -.. code-block:: python + data.set_index('c', drop=False) + data.set_index(['a', 'b'], inplace=True) + data - In [1]: Series(range(5))[3.5] - TypeError: the label [3.5] is not a proper indexer for this index type (Int64Index) +Reset the index +~~~~~~~~~~~~~~~ - In [1]: Series(range(5))[3.5:4.5] - TypeError: the slice start [3.5] is not a proper indexer for this index type (Int64Index) +As a convenience, there is a new function on DataFrame called ``reset_index`` +which transfers the index values into the DataFrame's columns and sets a simple +integer index. This is the inverse operation to ``set_index``: -Using a scalar float indexer will be deprecated in a future version, but is allowed for now. +.. ipython:: python -.. code-block:: python + data + data.reset_index() - In [3]: Series(range(5))[3.0] - Out[3]: 3 +The output is more similar to a SQL table or a record array. The names for the +columns derived from the index are the ones stored in the ``names`` attribute. -Here is a typical use-case for using this type of indexing.
Imagine that you have a somewhat -irregular timedelta-like indexing scheme, but the data is recorded as floats. This could for -example be millisecond offsets. +You can use the ``level`` keyword to remove only a portion of the index: .. ipython:: python - dfir = concat([DataFrame(randn(5,2), - index=np.arange(5) * 250.0, - columns=list('AB')), - DataFrame(randn(6,2), - index=np.arange(4,10) * 250.1, - columns=list('AB'))]) - dfir - -Selection operations then will always work on a value basis, for all selection operators. + frame + frame.reset_index(level=1) -.. ipython:: python - dfir[0:1000.4] - dfir.loc[0:1001,'A'] - dfir.loc[1000.4] +``reset_index`` takes an optional parameter ``drop`` which if true simply +discards the index, instead of putting index values in the DataFrame's columns. -You could then easily pick out the first 1 second (1000 ms) of data then. +.. note:: -.. ipython:: python + The ``reset_index`` method used to be called ``delevel`` which is now + deprecated. - dfir[0:1000] +Adding an ad hoc index +~~~~~~~~~~~~~~~~~~~~~~ -Of course if you need integer based selection, then use ``iloc`` +If you create an index yourself, you can just assign it to the ``index`` field: -.. ipython:: python +.. code-block:: python - dfir.iloc[0:5] + data.index = index .. _indexing.view_versus_copy: @@ -1539,800 +1475,3 @@ This will **not** work at all, and so should be avoided reported. -Fallback indexing ------------------ - -.. _indexing.fallback: - -Float indexes should be used only with caution. If you have a float indexed -``DataFrame`` and try to select using an integer, the row that pandas returns -might not be what you expect. pandas first attempts to use the *integer* -as a *label* location, but fails to find a match (because the types -are not equal). pandas then falls back to back to positional indexing. - -.. 
ipython:: python - - df = pd.DataFrame(np.random.randn(4,4), - columns=list('ABCD'), index=[1.0, 2.0, 3.0, 4.0]) - df - df.ix[1] - -To select the row you do expect, instead use a float label or -use ``iloc``. - -.. ipython:: python - - df.ix[1.0] - df.iloc[0] - -Instead of using a float index, it is often better to -convert to an integer index: - -.. ipython:: python - - df_new = df.reset_index() - df_new[df_new['index'] == 1.0] - # now you can also do "float selection" - df_new[(df_new['index'] >= 1.0) & (df_new['index'] < 2)] - - -.. _indexing.class: - -Index objects -------------- - -The pandas :class:`~pandas.Index` class and its subclasses can be viewed as -implementing an *ordered multiset*. Duplicates are allowed. However, if you try -to convert an :class:`~pandas.Index` object with duplicate entries into a -``set``, an exception will be raised. - -:class:`~pandas.Index` also provides the infrastructure necessary for -lookups, data alignment, and reindexing. The easiest way to create an -:class:`~pandas.Index` directly is to pass a ``list`` or other sequence to -:class:`~pandas.Index`: - -.. ipython:: python - - index = Index(['e', 'd', 'a', 'b']) - index - 'd' in index - -You can also pass a ``name`` to be stored in the index: - - -.. ipython:: python - - index = Index(['e', 'd', 'a', 'b'], name='something') - index.name - -Starting with pandas 0.5, the name, if set, will be shown in the console -display: - -.. ipython:: python - - index = Index(list(range(5)), name='rows') - columns = Index(['A', 'B', 'C'], name='cols') - df = DataFrame(np.random.randn(5, 3), index=index, columns=columns) - df - df['A'] - -.. _indexing.setops: - -Set operations on Index objects -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. warning:: - - In 0.15.0. the set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain - index types. ``+`` can be replace by ``.union()`` or ``|``, and ``-`` by ``.difference()``. - -.. 
_indexing.set_ops: - -The two main operations are ``union (|)``, ``intersection (&)`` -These can be directly called as instance methods or used via overloaded -operators. Difference is provided via the ``.difference()`` method. - -.. ipython:: python - - a = Index(['c', 'b', 'a']) - b = Index(['c', 'e', 'd']) - a | b - a & b - a.difference(b) - -Also available is the ``sym_diff (^)`` operation, which returns elements -that appear in either ``idx1`` or ``idx2`` but not both. This is -equivalent to the Index created by ``(idx1.difference(idx2)).union(idx2.difference(idx1))``, -with duplicates dropped. - -.. ipython:: python - - idx1 = Index([1, 2, 3, 4]) - idx2 = Index([2, 3, 4, 5]) - idx1.sym_diff(idx2) - idx1 ^ idx2 - -.. _indexing.hierarchical: - -Hierarchical indexing (MultiIndex) ----------------------------------- - -Hierarchical indexing (also referred to as "multi-level" indexing) is brand new -in the pandas 0.4 release. It is very exciting as it opens the door to some -quite sophisticated data analysis and manipulation, especially for working with -higher dimensional data. In essence, it enables you to store and manipulate -data with an arbitrary number of dimensions in lower dimensional data -structures like Series (1d) and DataFrame (2d). - -In this section, we will show what exactly we mean by "hierarchical" indexing -and how it integrates with the all of the pandas indexing functionality -described above and in prior sections. Later, when discussing :ref:`group by -` and :ref:`pivoting and reshaping data `, we'll show -non-trivial applications to illustrate how it aids in structuring data for -analysis. - -See the :ref:`cookbook` for some advanced strategies - -Creating a MultiIndex (hierarchical index) object -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The ``MultiIndex`` object is the hierarchical analogue of the standard -``Index`` object which typically stores the axis labels in pandas objects. 
You -can think of ``MultiIndex`` an array of tuples where each tuple is unique. A -``MultiIndex`` can be created from a list of arrays (using -``MultiIndex.from_arrays``), an array of tuples (using -``MultiIndex.from_tuples``), or a crossed set of iterables (using -``MultiIndex.from_product``). The ``Index`` constructor will attempt to return -a ``MultiIndex`` when it is passed a list of tuples. The following examples -demo different ways to initialize MultiIndexes. - - -.. ipython:: python - - arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], - ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] - tuples = list(zip(*arrays)) - tuples - - index = MultiIndex.from_tuples(tuples, names=['first', 'second']) - index - - s = Series(randn(8), index=index) - s - -When you want every pairing of the elements in two iterables, it can be easier -to use the ``MultiIndex.from_product`` function: - -.. ipython:: python - - iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']] - MultiIndex.from_product(iterables, names=['first', 'second']) - -As a convenience, you can pass a list of arrays directly into Series or -DataFrame to construct a MultiIndex automatically: - -.. ipython:: python - - arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']) - , - np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']) - ] - s = Series(randn(8), index=arrays) - s - df = DataFrame(randn(8, 4), index=arrays) - df - -All of the ``MultiIndex`` constructors accept a ``names`` argument which stores -string names for the levels themselves. If no names are provided, ``None`` will -be assigned: - -.. ipython:: python - - df.index.names - -This index can back any axis of a pandas object, and the number of **levels** -of the index is up to you: - -.. 
ipython:: python - - df = DataFrame(randn(3, 8), index=['A', 'B', 'C'], columns=index) - df - DataFrame(randn(6, 6), index=index[:6], columns=index[:6]) - -We've "sparsified" the higher levels of the indexes to make the console output a -bit easier on the eyes. - -It's worth keeping in mind that there's nothing preventing you from using -tuples as atomic labels on an axis: - -.. ipython:: python - - Series(randn(8), index=tuples) - -The reason that the ``MultiIndex`` matters is that it can allow you to do -grouping, selection, and reshaping operations as we will describe below and in -subsequent areas of the documentation. As you will see in later sections, you -can find yourself working with hierarchically-indexed data without creating a -``MultiIndex`` explicitly yourself. However, when loading data from a file, you -may wish to generate your own ``MultiIndex`` when preparing the data set. - -Note that how the index is displayed by be controlled using the -``multi_sparse`` option in ``pandas.set_printoptions``: - -.. ipython:: python - - pd.set_option('display.multi_sparse', False) - df - pd.set_option('display.multi_sparse', True) - -Reconstructing the level labels -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. _indexing.get_level_values: - -The method ``get_level_values`` will return a vector of the labels for each -location at a particular level: - -.. ipython:: python - - index.get_level_values(0) - index.get_level_values('second') - - -Basic indexing on axis with MultiIndex -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -One of the important features of hierarchical indexing is that you can select -data by a "partial" label identifying a subgroup in the data. **Partial** -selection "drops" levels of the hierarchical index in the result in a -completely analogous way to selecting a column in a regular DataFrame: - -.. 
ipython:: python - - df['bar'] - df['bar', 'one'] - df['bar']['one'] - s['qux'] - -See :ref:`Cross-section with hierarchical index ` for how to select -on a deeper level. - - -Data alignment and using ``reindex`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Operations between differently-indexed objects having ``MultiIndex`` on the -axes will work as you expect; data alignment will work the same as an Index of -tuples: - -.. ipython:: python - - s + s[:-2] - s + s[::2] - -``reindex`` can be called with another ``MultiIndex`` or even a list or array -of tuples: - -.. ipython:: python - - s.reindex(index[:3]) - s.reindex([('foo', 'two'), ('bar', 'one'), ('qux', 'one'), ('baz', 'one')]) - -.. _indexing.advanced_hierarchical: - -Advanced indexing with hierarchical index -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc/.ix`` is a -bit challenging, but we've made every effort to do so. for example the -following works as you would expect: - -.. ipython:: python - - df = df.T - df - df.loc['bar'] - df.loc['bar', 'two'] - -"Partial" slicing also works quite nicely. - -.. ipython:: python - - df.loc['baz':'foo'] - -You can slice with a 'range' of values, by providing a slice of tuples. - -.. ipython:: python - - df.loc[('baz', 'two'):('qux', 'one')] - df.loc[('baz', 'two'):'foo'] - -Passing a list of labels or tuples works similar to reindexing: - -.. ipython:: python - - df.ix[[('bar', 'two'), ('qux', 'one')]] - -.. _indexing.mi_slicers: - -Multiindexing using slicers -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. versionadded:: 0.14.0 - -In 0.14.0 we added a new way to slice multi-indexed objects. -You can slice a multi-index by providing multiple indexers. - -You can provide any of the selectors as if you are indexing by label, see :ref:`Selection by Label `, -including slices, lists of labels, labels, and boolean indexers. - -You can use ``slice(None)`` to select all the contents of *that* level. 
You do not need to specify all the -*deeper* levels, they will be implied as ``slice(None)``. - -As usual, **both sides** of the slicers are included as this is label indexing. - -.. warning:: - - You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and - for the **columns**. Their are some ambiguous cases where the passed indexer could be mis-interpreted - as indexing *both* axes, rather than into say the MuliIndex for the rows. - - You should do this: - - .. code-block:: python - - df.loc[(slice('A1','A3'),.....),:] - - rather than this: - - .. code-block:: python - - df.loc[(slice('A1','A3'),.....)] - -.. warning:: - - You will need to make sure that the selection axes are fully lexsorted! - -.. ipython:: python - - def mklbl(prefix,n): - return ["%s%s" % (prefix,i) for i in range(n)] - - miindex = MultiIndex.from_product([mklbl('A',4), - mklbl('B',2), - mklbl('C',4), - mklbl('D',2)]) - micolumns = MultiIndex.from_tuples([('a','foo'),('a','bar'), - ('b','foo'),('b','bah')], - names=['lvl0', 'lvl1']) - dfmi = DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))), - index=miindex, - columns=micolumns).sortlevel().sortlevel(axis=1) - dfmi - -Basic multi-index slicing using slices, lists, and labels. - -.. ipython:: python - - dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:] - -You can use a ``pd.IndexSlice`` to shortcut the creation of these slices - -.. ipython:: python - - idx = pd.IndexSlice - dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] - -It is possible to perform quite complicated selections using this method on multiple -axes at the same time. - -.. ipython:: python - - dfmi.loc['A1',(slice(None),'foo')] - dfmi.loc[idx[:,:,['C1','C3']],idx[:,'foo']] - -Using a boolean indexer you can provide selection related to the *values*. - -.. 
ipython:: python - - mask = dfmi[('a','foo')]>200 - dfmi.loc[idx[mask,:,['C1','C3']],idx[:,'foo']] - -You can also specify the ``axis`` argument to ``.loc`` to interpret the passed -slicers on a single axis. - -.. ipython:: python - - dfmi.loc(axis=0)[:,:,['C1','C3']] - -Furthermore you can *set* the values using these methods - -.. ipython:: python - - df2 = dfmi.copy() - df2.loc(axis=0)[:,:,['C1','C3']] = -10 - df2 - -You can use a right-hand-side of an alignable object as well. - -.. ipython:: python - - df2 = dfmi.copy() - df2.loc[idx[:,:,['C1','C3']],:] = df2*1000 - df2 - -.. _indexing.xs: - -Cross-section with hierarchical index -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The ``xs`` method of ``DataFrame`` additionally takes a level argument to make -selecting data at a particular level of a MultiIndex easier. - -.. ipython:: python - - df.xs('one', level='second') - -.. ipython:: python - - # using the slicers (new in 0.14.0) - df.loc[(slice(None),'one'),:] - -You can also select on the columns with :meth:`~pandas.MultiIndex.xs`, by -providing the axis argument - -.. ipython:: python - - df = df.T - df.xs('one', level='second', axis=1) - -.. ipython:: python - - # using the slicers (new in 0.14.0) - df.loc[:,(slice(None),'one')] - -:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys - -.. ipython:: python - - df.xs(('one', 'bar'), level=('second', 'first'), axis=1) - -.. ipython:: python - - # using the slicers (new in 0.14.0) - df.loc[:,('bar','one')] - -.. versionadded:: 0.13.0 - -You can pass ``drop_level=False`` to :meth:`~pandas.MultiIndex.xs` to retain -the level that was selected - -.. ipython:: python - - df.xs('one', level='second', axis=1, drop_level=False) - -versus the result with ``drop_level=True`` (the default value) - -.. ipython:: python - - df.xs('one', level='second', axis=1, drop_level=True) - -.. ipython:: python - :suppress: - - df = df.T - -.. 
_indexing.advanced_reindex: - -Advanced reindexing and alignment with hierarchical index -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The parameter ``level`` has been added to the ``reindex`` and ``align`` methods -of pandas objects. This is useful to broadcast values across a level. For -instance: - -.. ipython:: python - - midx = MultiIndex(levels=[['zero', 'one'], ['x','y']], - labels=[[1,1,0,0],[1,0,1,0]]) - df = DataFrame(randn(4,2), index=midx) - print(df) - df2 = df.mean(level=0) - print(df2) - print(df2.reindex(df.index, level=0)) - df_aligned, df2_aligned = df.align(df2, level=0) - print(df_aligned) - print(df2_aligned) - - -The need for sortedness with :class:`~pandas.MultiIndex` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -**Caveat emptor**: the present implementation of ``MultiIndex`` requires that -the labels be sorted for some of the slicing / indexing routines to work -correctly. You can think about breaking the axis into unique groups, where at -the hierarchical level of interest, each distinct group shares a label, but no -two have the same label. However, the ``MultiIndex`` does not enforce this: -**you are responsible for ensuring that things are properly sorted**. There is -an important new method ``sortlevel`` to sort an axis within a ``MultiIndex`` -so that its labels are grouped and sorted by the original ordering of the -associated factor at that level. Note that this does not necessarily mean the -labels will be sorted lexicographically! - -.. ipython:: python - - import random; random.shuffle(tuples) - s = Series(randn(8), index=MultiIndex.from_tuples(tuples)) - s - s.sortlevel(0) - s.sortlevel(1) - -.. _indexing.sortlevel_byname: - -Note, you may also pass a level name to ``sortlevel`` if the MultiIndex levels -are named. - -.. 
ipython:: python - - s.index.set_names(['L1', 'L2'], inplace=True) - s.sortlevel(level='L1') - s.sortlevel(level='L2') - -Some indexing will work even if the data are not sorted, but will be rather -inefficient and will also return a copy of the data rather than a view: - -.. ipython:: python - - s['qux'] - s.sortlevel(1)['qux'] - -On higher dimensional objects, you can sort any of the other axes by level if -they have a MultiIndex: - -.. ipython:: python - - df.T.sortlevel(1, axis=1) - -The ``MultiIndex`` object has code to **explicity check the sort depth**. Thus, -if you try to index at a depth at which the index is not sorted, it will raise -an exception. Here is a concrete example to illustrate this: - -.. ipython:: python - - tuples = [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')] - idx = MultiIndex.from_tuples(tuples) - idx.lexsort_depth - - reordered = idx[[1, 0, 3, 2]] - reordered.lexsort_depth - - s = Series(randn(4), index=reordered) - s.ix['a':'a'] - -However: - -:: - - >>> s.ix[('a', 'b'):('b', 'a')] - Traceback (most recent call last) - ... - KeyError: Key length (3) was greater than MultiIndex lexsort depth (2) - -Swapping levels with :meth:`~pandas.MultiIndex.swaplevel` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The ``swaplevel`` function can switch the order of two levels: - -.. ipython:: python - - df[:5] - df[:5].swaplevel(0, 1, axis=0) - -.. _indexing.reorderlevels: - -Reordering levels with :meth:`~pandas.MultiIndex.reorder_levels` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The ``reorder_levels`` function generalizes the ``swaplevel`` function, -allowing you to permute the hierarchical index levels in one step: - -.. ipython:: python - - df[:5].reorder_levels([1,0], axis=0) - - -Some gory internal details -~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Internally, the ``MultiIndex`` consists of a few things: the **levels**, the -integer **labels**, and the level **names**: - -.. 
ipython:: python - - index - index.levels - index.labels - index.names - -You can probably guess that the labels determine which unique element is -identified with that location at each layer of the index. It's important to -note that sortedness is determined **solely** from the integer labels and does -not check (or care) whether the levels themselves are sorted. Fortunately, the -constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but -if you compute the levels and labels yourself, please be careful. - - -Setting index metadata (``name(s)``, ``levels``, ``labels``) ------------------------------------------------------------- - -.. versionadded:: 0.13.0 - -.. _indexing.set_metadata: - -Indexes are "mostly immutable", but it is possible to set and change their -metadata, like the index ``name`` (or, for ``MultiIndex``, ``levels`` and -``labels``). - -You can use the ``rename``, ``set_names``, ``set_levels``, and ``set_labels`` -to set these attributes directly. They default to returning a copy; however, -you can specify ``inplace=True`` to have the data change in place. - -.. ipython:: python - - ind = Index([1, 2, 3]) - ind.rename("apple") - ind - ind.set_names(["apple"], inplace=True) - ind.name = "bob" - ind - -.. versionadded:: 0.15.0 - -``set_names``, ``set_levels``, and ``set_labels`` also take an optional -`level`` argument - -.. ipython:: python - - index - index.levels[1] - index.set_levels(["a", "b"], level=1) - -Adding an index to an existing DataFrame ----------------------------------------- - -Occasionally you will load or create a data set into a DataFrame and want to -add an index after you've already done so. There are a couple of different -ways. - -Add an index using DataFrame columns -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. 
_indexing.set_index: - -DataFrame has a ``set_index`` method which takes a column name (for a regular -``Index``) or a list of column names (for a ``MultiIndex``), to create a new, -indexed DataFrame: - -.. ipython:: python - :suppress: - - data = DataFrame({'a' : ['bar', 'bar', 'foo', 'foo'], - 'b' : ['one', 'two', 'one', 'two'], - 'c' : ['z', 'y', 'x', 'w'], - 'd' : [1., 2., 3, 4]}) - -.. ipython:: python - - data - indexed1 = data.set_index('c') - indexed1 - indexed2 = data.set_index(['a', 'b']) - indexed2 - -The ``append`` keyword option allow you to keep the existing index and append -the given columns to a MultiIndex: - -.. ipython:: python - - frame = data.set_index('c', drop=False) - frame = frame.set_index(['a', 'b'], append=True) - frame - -Other options in ``set_index`` allow you not drop the index columns or to add -the index in-place (without creating a new object): - -.. ipython:: python - - data.set_index('c', drop=False) - data.set_index(['a', 'b'], inplace=True) - data - -Remove / reset the index, ``reset_index`` -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -As a convenience, there is a new function on DataFrame called ``reset_index`` -which transfers the index values into the DataFrame's columns and sets a simple -integer index. This is the inverse operation to ``set_index`` - -.. ipython:: python - - data - data.reset_index() - -The output is more similar to a SQL table or a record array. The names for the -columns derived from the index are the ones stored in the ``names`` attribute. - -You can use the ``level`` keyword to remove only a portion of the index: - -.. ipython:: python - - frame - frame.reset_index(level=1) - - -``reset_index`` takes an optional parameter ``drop`` which if true simply -discards the index, instead of putting index values in the DataFrame's columns. - -.. note:: - - The ``reset_index`` method used to be called ``delevel`` which is now - deprecated. 
- -Adding an ad hoc index -~~~~~~~~~~~~~~~~~~~~~~ - -If you create an index yourself, you can just assign it to the ``index`` field: - -.. code-block:: python - - data.index = index - -Indexing internal details -------------------------- - -.. note:: - - The following is largely relevant for those actually working on the pandas - codebase. The source code is still the best place to look at the specifics - of how things are implemented. - -In pandas there are a few objects implemented which can serve as valid -containers for the axis labels: - - - ``Index``: the generic "ordered set" object, an ndarray of object dtype - assuming nothing about its contents. The labels must be hashable (and - likely immutable) and unique. Populates a dict of label to location in - Cython to do :math:`O(1)` lookups. - - ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer - data, such as time stamps - - ``MultiIndex``: the standard hierarchical index object - - ``PeriodIndex``: An Index object with Period elements - - ``DatetimeIndex``: An Index object with Timestamp elements - - ``date_range``: fixed frequency date range generated from a time rule or - DateOffset. An ndarray of Python datetime objects - -The motivation for having an ``Index`` class in the first place was to enable -different implementations of indexing. This means that it's possible for you, -the user, to implement a custom ``Index`` subclass that may be better suited to -a particular application than the ones provided in pandas. 
- -From an internal implementation point of view, the relevant methods that an -``Index`` must define are one or more of the following (depending on how -incompatible the new object internals are with the ``Index`` functions): - - - ``get_loc``: returns an "indexer" (an integer, or in some cases a - slice object) for a label - - ``slice_locs``: returns the "range" to slice between two labels - - ``get_indexer``: Computes the indexing vector for reindexing / data - alignment purposes. See the source / docstrings for more on this - - ``get_indexer_non_unique``: Computes the indexing vector for reindexing / data - alignment purposes when the index is non-unique. See the source / docstrings - for more on this - - ``reindex``: Does any pre-conversion of the input index then calls - ``get_indexer`` - - ``union``, ``intersection``: computes the union or intersection of two - Index objects - - ``insert``: Inserts a new label into an Index, yielding a new object - - ``delete``: Delete a label, yielding a new object - - ``drop``: Deletes a set of labels - - ``take``: Analogous to ndarray.take diff --git a/doc/source/internals.rst b/doc/source/internals.rst new file mode 100644 index 0000000000000..e5d2b001c18f8 --- /dev/null +++ b/doc/source/internals.rst @@ -0,0 +1,96 @@ +.. _internals: + +.. currentmodule:: pandas + +.. ipython:: python + :suppress: + + import numpy as np + import random + np.random.seed(123456) + from pandas import * + options.display.max_rows=15 + import pandas as pd + randn = np.random.randn + randint = np.random.randint + np.set_printoptions(precision=4, suppress=True) + from pandas.compat import range, zip + +********* +Internals +********* + +This section will provide a look into some of pandas internals. + +Indexing +-------- + +In pandas there are a few objects implemented which can serve as valid +containers for the axis labels: + +- ``Index``: the generic "ordered set" object, an ndarray of object dtype + assuming nothing about its contents. 
The labels must be hashable (and + likely immutable) and unique. Populates a dict of label to location in + Cython to do ``O(1)`` lookups. +- ``Int64Index``: a version of ``Index`` highly optimized for 64-bit integer + data, such as time stamps +- ``Float64Index``: a version of ``Index`` highly optimized for 64-bit float data +- ``MultiIndex``: the standard hierarchical index object +- ``DatetimeIndex``: An Index object with Timestamp elements +- ``PeriodIndex``: An Index object with Period elements + +These are range generators that make the creation of a regular index easy: + +- ``date_range``: fixed frequency date range generated from a time rule or + DateOffset. An ndarray of Python datetime objects +- ``period_range``: fixed frequency date range generated from a time rule or + DateOffset. An ndarray of ``Period`` objects, representing Timespans + +The motivation for having an ``Index`` class in the first place was to enable +different implementations of indexing. This means that it's possible for you, +the user, to implement a custom ``Index`` subclass that may be better suited to +a particular application than the ones provided in pandas. + +From an internal implementation point of view, the relevant methods that an +``Index`` must define are one or more of the following (depending on how +incompatible the new object internals are with the ``Index`` functions): + +- ``get_loc``: returns an "indexer" (an integer, or in some cases a + slice object) for a label +- ``slice_locs``: returns the "range" to slice between two labels +- ``get_indexer``: Computes the indexing vector for reindexing / data + alignment purposes. See the source / docstrings for more on this +- ``get_indexer_non_unique``: Computes the indexing vector for reindexing / data + alignment purposes when the index is non-unique.
See the source / docstrings + for more on this +- ``reindex``: Does any pre-conversion of the input index then calls + ``get_indexer`` +- ``union``, ``intersection``: computes the union or intersection of two + Index objects +- ``insert``: Inserts a new label into an Index, yielding a new object +- ``delete``: Delete a label, yielding a new object +- ``drop``: Deletes a set of labels +- ``take``: Analogous to ndarray.take + +MultiIndex +~~~~~~~~~~ + +Internally, the ``MultiIndex`` consists of a few things: the **levels**, the +integer **labels**, and the level **names**: + +.. ipython:: python + + index = MultiIndex.from_product([range(3), ['one', 'two']], names=['first', 'second']) + index + index.levels + index.labels + index.names + +You can probably guess that the labels determine which unique element is +identified with that location at each layer of the index. It's important to +note that sortedness is determined **solely** from the integer labels and does +not check (or care) whether the levels themselves are sorted. Fortunately, the +constructors ``from_tuples`` and ``from_arrays`` ensure that this is true, but +if you compute the levels and labels yourself, please be careful. + + diff --git a/doc/source/merging.rst b/doc/source/merging.rst index 55bbf613b33cf..922fb84c57a56 100644 --- a/doc/source/merging.rst +++ b/doc/source/merging.rst @@ -90,7 +90,7 @@ this using the ``keys`` argument: concatenated As you can see (if you've read the rest of the documentation), the resulting -object's index has a :ref:`hierarchical index `. This +object's index has a :ref:`hierarchical index `. This means that we can now do stuff like select out each chunk by key: .. 
ipython:: python diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst index 60342f1b6cba5..ddbfc60a5dfe7 100644 --- a/doc/source/reshaping.rst +++ b/doc/source/reshaping.rst @@ -77,7 +77,7 @@ this form, use the ``pivot`` function: If the ``values`` argument is omitted, and the input DataFrame has more than one column of values which are not used as column or index inputs to ``pivot``, then the resulting "pivoted" DataFrame will have :ref:`hierarchical columns -` whose topmost level indicates the respective value +` whose topmost level indicates the respective value column: .. ipython:: python @@ -103,7 +103,7 @@ Reshaping by stacking and unstacking Closely related to the ``pivot`` function are the related ``stack`` and ``unstack`` functions currently available on Series and DataFrame. These functions are designed to work together with ``MultiIndex`` objects (see the -section on :ref:`hierarchical indexing `). Here are +section on :ref:`hierarchical indexing `). Here are essentially what these functions do: - ``stack``: "pivot" a level of the (possibly hierarchical) column labels, diff --git a/doc/source/v0.11.0.txt b/doc/source/v0.11.0.txt index 3a56794151b1e..befdf848ad23b 100644 --- a/doc/source/v0.11.0.txt +++ b/doc/source/v0.11.0.txt @@ -50,8 +50,7 @@ three types of multi-axis indexing. is interpreted as position based or label based, it's usually better to be explicit and use ``.iloc`` or ``.loc``. - See more at :ref:`Advanced Indexing `, :ref:`Advanced Hierarchical ` and - :ref:`Fallback Indexing ` + See more at :ref:`Advanced Indexing ` and :ref:`Advanced Hierarchical `. Selection Deprecations diff --git a/doc/source/v0.14.0.txt b/doc/source/v0.14.0.txt index 96ab3d1e58d5c..e2f96f204edab 100644 --- a/doc/source/v0.14.0.txt +++ b/doc/source/v0.14.0.txt @@ -470,7 +470,7 @@ You can use ``slice(None)`` to select all the contents of *that* level. You do n As usual, **both sides** of the slicers are included as this is label indexing. 
-See :ref:`the docs` +See :ref:`the docs` See also issues (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`, :issue:`7106`) .. warning:: diff --git a/doc/source/v0.15.0.txt b/doc/source/v0.15.0.txt index dd71ef1f63d54..5d514d71b30a5 100644 --- a/doc/source/v0.15.0.txt +++ b/doc/source/v0.15.0.txt @@ -20,6 +20,8 @@ users upgrade to this version. - New datetimelike properties accessor ``.dt`` for Series, see :ref:`Datetimelike Properties ` - dropping support for ``PyTables`` less than version 3.0.0, and ``numexpr`` less than version 2.1 (:issue:`7990`) - API change in using Indexes in set operations, see :ref:`here ` + - API change in using Indexes in set operations, see :ref:`here ` + - Split indexing documentation into :ref:`Indexing and Selecting Data ` and :ref:`MultiIndex / Advanced Indexing ` - :ref:`Other Enhancements ` diff --git a/doc/source/v0.4.x.txt b/doc/source/v0.4.x.txt index 5333bb9ffb157..4717b46a6bca8 100644 --- a/doc/source/v0.4.x.txt +++ b/doc/source/v0.4.x.txt @@ -13,7 +13,7 @@ New Features Series (:issue:`209`, :issue:`203`) - :ref:`Added ` ``Series.align`` method for aligning two series with choice of join method (ENH56_) -- :ref:`Added ` method ``get_level_values`` to +- :ref:`Added ` method ``get_level_values`` to ``MultiIndex`` (:issue:`188`) - Set values in mixed-type ``DataFrame`` objects via ``.ix`` indexing attribute (:issue:`135`) - Added new ``DataFrame`` :ref:`methods ` @@ -28,7 +28,7 @@ New Features - ``DataFrame.rename`` has a new ``copy`` parameter to :ref:`rename ` a DataFrame in place (ENHed_) - :ref:`Enable ` unstacking by name (:issue:`142`) -- :ref:`Enable ` ``sortlevel`` to work by level (:issue:`141`) +- :ref:`Enable ` ``sortlevel`` to work by level (:issue:`141`) Performance Enhancements ~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/v0.5.0.txt b/doc/source/v0.5.0.txt index d0550fd5ef8f3..8b7e4721d136f 100644 --- a/doc/source/v0.5.0.txt +++ b/doc/source/v0.5.0.txt @@ -21,7 +21,7 @@ New Features -
:ref:`Added` ``pivot_table`` convenience function to pandas namespace (:issue:`234`) - :ref:`Implemented ` ``Panel.rename_axis`` function (:issue:`243`) - DataFrame will show index level names in console output (:issue:`334`) -- :ref:`Implemented ` ``Panel.take`` +- :ref:`Implemented ` ``Panel.take`` - :ref:`Added` ``set_eng_float_format`` for alternate DataFrame floating point string formatting (ENH61_) - :ref:`Added ` convenience ``set_index`` function for creating a DataFrame index from its existing columns - :ref:`Implemented ` ``groupby`` hierarchical index level name (:issue:`223`) diff --git a/doc/source/v0.6.1.txt b/doc/source/v0.6.1.txt index 7e593d07f7f2b..a2dab738546f9 100644 --- a/doc/source/v0.6.1.txt +++ b/doc/source/v0.6.1.txt @@ -32,7 +32,7 @@ New features - Add ``Series.from_csv`` function (:issue:`482`) - :ref:`Can pass ` DataFrame/DataFrame and DataFrame/Series to rolling_corr/rolling_cov (GH #462) -- MultiIndex.get_level_values can :ref:`accept the level name ` +- MultiIndex.get_level_values can :ref:`accept the level name ` Performance improvements ~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/doc/source/v0.7.0.txt b/doc/source/v0.7.0.txt index bf7acd3820db0..cfba2ad3d05b6 100644 --- a/doc/source/v0.7.0.txt +++ b/doc/source/v0.7.0.txt @@ -33,7 +33,7 @@ New features df = DataFrame(randn(10, 4)) df.apply(lambda x: x.describe()) -- :ref:`Add` ``reorder_levels`` method to Series and +- :ref:`Add` ``reorder_levels`` method to Series and DataFrame (:issue:`534`) - :ref:`Add` dict-like ``get`` function to DataFrame @@ -50,7 +50,7 @@ New features - :ref:`Add ` ``level`` option to binary arithmetic functions on ``DataFrame`` and ``Series`` -- :ref:`Add ` ``level`` option to the ``reindex`` +- :ref:`Add ` ``level`` option to the ``reindex`` and ``align`` methods on Series and DataFrame for broadcasting values across a level (:issue:`542`, :issue:`552`, others) @@ -103,7 +103,7 @@ New features - :ref:`Added ` ``isin`` method to index objects -- :ref:`Added ` 
``level`` argument to ``xs`` method of DataFrame. +- :ref:`Added ` ``level`` argument to ``xs`` method of DataFrame. API Changes to integer indexing diff --git a/pandas/core/generic.py b/pandas/core/generic.py index 3a75f145587c0..dc89bdd8c9130 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -946,7 +946,7 @@ def to_sql(self, name, con, flavor='sqlite', schema=None, if_exists='fail', `index` is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex. chunksize : int, default None - If not None, then rows will be written in batches of this size at a + If not None, then rows will be written in batches of this size at a time. If None, all rows will be written at once. """ @@ -1383,7 +1383,7 @@ def xs(self, key, axis=0, level=None, copy=None, drop_level=True): xs is only for getting, not setting values. MultiIndex Slicers is a generic way to get/set values on any level or levels - it is a superset of xs functionality, see :ref:`MultiIndex Slicers ` + it is a superset of xs functionality, see :ref:`MultiIndex Slicers ` """ if copy is not None: diff --git a/pandas/core/panel.py b/pandas/core/panel.py index 03de19afe0580..95d279add172c 100644 --- a/pandas/core/panel.py +++ b/pandas/core/panel.py @@ -711,7 +711,7 @@ def major_xs(self, key, copy=None): major_xs is only for getting, not setting values. MultiIndex Slicers is a generic way to get/set values on any level or levels - it is a superset of major_xs functionality, see :ref:`MultiIndex Slicers ` + it is a superset of major_xs functionality, see :ref:`MultiIndex Slicers ` """ if copy is not None: @@ -741,7 +741,7 @@ def minor_xs(self, key, copy=None): minor_xs is only for getting, not setting values. 
MultiIndex Slicers is a generic way to get/set values on any level or levels - it is a superset of minor_xs functionality, see :ref:`MultiIndex Slicers ` + it is a superset of minor_xs functionality, see :ref:`MultiIndex Slicers ` """ if copy is not None: @@ -771,7 +771,7 @@ def xs(self, key, axis=1, copy=None): xs is only for getting, not setting values. MultiIndex Slicers is a generic way to get/set values on any level or levels - it is a superset of xs functionality, see :ref:`MultiIndex Slicers ` + it is a superset of xs functionality, see :ref:`MultiIndex Slicers ` """ if copy is not None: