DOC: v0.15.0 / faq.rst updates

jreback · jreback · commit 65d716ce0ae9 · 2014-10-05T10:46:55.000-04:00
diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst
@@ -294,7 +294,7 @@ This is even true for strings and numeric data:
     s
     s.sort()
     s
-    print(s.min(), s.max())
+    s.min(), s.max()
 
 Reordering the categories is possible via the :func:`Categorical.reorder_categories` and
 the :func:`Categorical.set_categories` methods. For :func:`Categorical.reorder_categories`, all
@@ -307,7 +307,7 @@ old categories must be included in the new categories and no new categories are
     s
     s.sort()
     s
-    print(s.min(), s.max())
+    s.min(), s.max()
 
 .. note::
 
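The categorical hunks above show the minimum and maximum of an ordered categorical, which follow the category order rather than the lexical order of the values. Inside an ``.. ipython:: python`` directive a bare expression is echoed as an ``Out[n]`` line, so ``s.min(), s.max()`` displays the pair without ``print``. The following is a minimal sketch of the same behaviour, assuming a recent pandas release rather than 0.15.0. It uses ``sort_values()`` because recent releases no longer provide the in-place ``Series.sort()`` shown in the diff, and the example values are hypothetical.

```python
import pandas as pd

# An ordered categorical whose category order deliberately differs from the
# lexical order of the values: "c" < "b" < "a".
s = pd.Series(pd.Categorical(["a", "b", "c", "a"],
                             categories=["c", "b", "a"], ordered=True))

# Sorting and min/max both follow the category order, not the string order.
s_sorted = s.sort_values()
smallest, largest = s.min(), s.max()
print(list(s_sorted), smallest, largest)

# reorder_categories changes the order, and min/max change with it.
s2 = s.cat.reorder_categories(["a", "b", "c"], ordered=True)
print(s2.min(), s2.max())
```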
diff --git a/doc/source/faq.rst b/doc/source/faq.rst
@@ -33,26 +33,25 @@ As of pandas version 0.15.0, the memory usage of a dataframe (including
 the index) is shown when accessing the ``info`` method of a dataframe. A
 configuration option, ``display.memory_usage`` (see :ref:`options`),
 specifies if the dataframe's memory usage will be displayed when
-invoking the df.info() method.
+invoking the ``df.info()`` method.
 
 For example, the memory usage of the dataframe below is shown
-when calling df.info():
+when calling ``df.info()``:
 
 .. ipython:: python
 
     dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
-                'complex128', 'object', 'bool']
+               'complex128', 'object', 'bool']
     n = 5000
     data = dict([ (t, np.random.randint(100, size=n).astype(t))
                     for t in dtypes])
     df = DataFrame(data)
+    df['categorical'] = df['object'].astype('category')
 
     df.info()
 
-By default the display option is set to True but can be explicitly
-overridden by passing the memory_usage argument when invoking df.info().
-Note that ``memory_usage=None`` is the default value for the  df.info()
-method and follows the setting specified by display.memory_usage.
+By default the display option is set to ``True`` but can be explicitly
+overridden by passing the ``memory_usage`` argument when invoking ``df.info()``.
 
 The memory usage of each column can be found by calling the ``memory_usage``
 method. This returns a Series with an index represented by column names
@@ -80,24 +79,7 @@ The memory usage displayed by the ``info`` method utilizes the
 while also formatting the output in human-readable units (base-2
 representation; i.e., 1KB = 1024 bytes).
 
-Pandas version 0.15.0 introduces a new categorical data type (see
-:ref:`categorical`), which can be used in Series and DataFrames.
-Significant memory savings can be achieved when using the category
-datatype. This is demonstrated below:
-
-.. ipython:: python
-
-  df['bases_object'] = Series(np.array(['adenine', 'cytosine', 'guanine', 'thymine']).take(np.random.randint(0,4,size=len(df))))
-
-  df['bases_categorical'] = df['bases_object'].astype('category')
-
-  df.memory_usage()
-
-While the *base_object* and *bases_categorical* appear as identical
-columns in the dataframe, the memory savings of the categorical
-datatype, versus the object datatype, is revealed by ``memory_usage``.
-
-
+See also :ref:`Categorical Memory Usage <categorical.memory>`.
 
 .. _ref-monkey-patching:
 
diff --git a/doc/source/v0.15.0.txt b/doc/source/v0.15.0.txt
@@ -17,6 +17,7 @@ users upgrade to this version.
 
   - The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here <whatsnew_0150.cat>`
   - New scalar type ``Timedelta``, and a new index type ``TimedeltaIndex``, see :ref:`here <whatsnew_0150.timedeltaindex>`
+  - ``df.info()`` now includes the DataFrame's memory usage in its default display, see :ref:`Memory Usage <whatsnew_0150.memory>`
   - New datetimelike properties accessor ``.dt`` for Series, see :ref:`Datetimelike Properties <whatsnew_0150.dt>`
   - Split indexing documentation into :ref:`Indexing and Selecting Data <indexing>` and :ref:`MultiIndex / Advanced Indexing <advanced>`
   - Split out string methods documentation into :ref:`Working with Text Data <text>`
@@ -57,7 +58,7 @@ users upgrade to this version.
 API changes
 ~~~~~~~~~~~
 
-- Passing multiple levels to `DataFrame.stack()` will now work when multiple level
+- Passing multiple levels to :meth:`~pandas.DataFrame.stack()` will now work when multiple level
   numbers are passed (:issue:`7660`), and will raise a ``ValueError`` when the
   levels aren't all level names or all level numbers. See
   :ref:`Reshaping by stacking and unstacking <reshaping.stack_multiple>`.
@@ -134,7 +135,7 @@ API changes
 
   .. code-block:: python
 
-     In [1]: idx = pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']])
+     In [1]: idx = MultiIndex.from_product([[0, 1], ['a', 'b', 'c']])
 
      In [2]: idx.values
      Out[2]: array([(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), (1, 'b'), (1, 'c')], dtype=object)
@@ -199,7 +200,10 @@ API changes
                                               names=['one','two'])
                ).sortlevel()
      s
-     s.loc[['D']]
+     try:
+        s.loc[['D']]
+     except KeyError as e:
+        print("KeyError: " + str(e))
 
 - ``Index`` now supports ``duplicated`` and ``drop_duplicates``. (:issue:`4060`)
 
@@ -239,7 +243,8 @@ API changes
 
   .. ipython:: python
 
-     df = DataFrame([[True, 1],[False, 2]], columns = ["female","fitness"])
+     df = DataFrame([[True, 1], [False, 2]],
+                    columns=["female", "fitness"])
      df
      df.dtypes
 
@@ -259,15 +264,32 @@ API changes
 
 - ``DataFrame.plot`` and ``Series.plot`` keywords now have a consistent order (:issue:`8037`)
 
-- Implements methods to find memory usage of a DataFrame (:issue:`6852`). A new display option ``display.memory_usage`` (see :ref:`options`) sets the default behavior of the ``memory_usage`` argument in the ``df.info()`` method; by default ``display.memory_usage`` is True but this can be overridden by explicitly passing the memory_usage argument to the df.info() method, as shown below. Additionally `memory_usage` is an available method for a dataframe object which returns the memory usage of each column (for more information see :ref:`df-memory-usage`):
+.. _whatsnew_0150.memory:
 
-  .. ipython:: python
+Memory Usage
+~~~~~~~~~~~~~
+
+Implemented methods to find the memory usage of a DataFrame (:issue:`6852`). See the :ref:`FAQ <df-memory-usage>` for more.
 
-     df = DataFrame({ 'float' : np.random.randn(1000), 'int' : np.random.randint(0,5,size=1000)})
-     df.memory_usage()
+A new display option ``display.memory_usage`` (see :ref:`options`) sets the default behavior of the ``memory_usage`` argument in the ``df.info()`` method. By default ``display.memory_usage`` is ``True``; this can be overridden by passing the ``memory_usage`` argument explicitly to ``df.info()``.
+
+.. ipython:: python
 
-     df.info(memory_usage=True)
+    dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
+              'complex128', 'object', 'bool']
+    n = 5000
+    data = dict([ (t, np.random.randint(100, size=n).astype(t))
+                    for t in dtypes])
+    df = DataFrame(data)
+    df['categorical'] = df['object'].astype('category')
 
+    df.info()
+
+Additionally, :meth:`~pandas.DataFrame.memory_usage` is a new DataFrame method that returns the memory usage of each column.
+
+.. ipython:: python
+
+    df.memory_usage(index=True)
 
 .. _whatsnew_0150.dt:
 
@@ -582,7 +604,7 @@ For full docs, see the :ref:`categorical introduction <categorical>` and the
 
 .. ipython:: python
 
-    df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
+    df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
 
     df["grade"] = df["raw_grade"].astype("category")
     df["grade"]
@@ -719,7 +741,8 @@ Finally, the combination of ``TimedeltaIndex`` with ``DatetimeIndex`` allow cert
 Prior Version Deprecations/Changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-There are no prior version deprecations that are taking effect as of 0.15.0.
+- Remove ``DataFrame.delevel`` method in favor of ``DataFrame.reset_index``
+  (:issue:`420`)
 
 .. _whatsnew_0150.deprecations:
 
@@ -732,8 +755,7 @@ Deprecations
   ``ambiguous`` to allow for more flexibility in dealing with DST transitions.
   Replace ``infer_dst=True`` with ``ambiguous='infer'`` for the same behavior (:issue:`7943`).
   See :ref:`the docs<timeseries.timezone_ambiguous>` for more details.
-- Remove ``DataFrame.delevel`` method in favor of ``DataFrame.reset_index``
-  (:issue:`420`)
+
 .. _whatsnew_0150.index_set_ops:
 
 - The ``Index`` set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain index types. ``+`` can be replaced by ``.union()`` or ``|``, and ``-`` by ``.difference()``. Further, the method name ``Index.diff()`` is deprecated and can be replaced by ``Index.difference()`` (:issue:`8226`)
@@ -816,8 +838,8 @@ Enhancements
 
   .. ipython:: python
 
-    df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
-                       'C': [1, 2, 3]})
+    df = DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
+                    'C': [1, 2, 3]})
     pd.get_dummies(df)
 
 
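Several API changes in the v0.15.0 notes above concern reshaping and indexing: ``stack()`` with multiple levels given as numbers or names, ``MultiIndex.values`` returning tuples, ``.loc`` raising ``KeyError`` for missing list labels, ``reset_index`` replacing the removed ``DataFrame.delevel``, and ``get_dummies`` on a whole DataFrame. This is a minimal sketch of each, assuming a recent pandas release; the frames and labels are hypothetical. Some recent releases may emit a ``FutureWarning`` from ``stack()`` about its reimplementation, and recent releases raise the ``KeyError`` even when only some of the requested labels are missing.

```python
import numpy as np
import pandas as pd

# stack() with several column levels, given either as numbers or as names.
cols = pd.MultiIndex.from_product([["x", "y"], ["a", "b"]],
                                  names=["outer", "inner"])
wide = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)
by_number = wide.stack([0, 1])
by_name = wide.stack(["outer", "inner"])

# MultiIndex.values is an object array of tuples.
idx = pd.MultiIndex.from_product([[0, 1], ["a", "b", "c"]])
tuples = list(idx.values)

# Selecting a list of labels that are not in the index raises KeyError.
s = pd.Series(range(4),
              index=pd.MultiIndex.from_product([["A", "B"], ["one", "two"]]))
try:
    s.loc[["D"]]
    raised = False
except KeyError as e:
    print("KeyError: " + str(e))
    raised = True

# reset_index moves the index levels into ordinary columns.
flat = s.reset_index()

# get_dummies on a DataFrame encodes the string columns and keeps "C" as is.
df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"],
                   "C": [1, 2, 3]})
dummies = pd.get_dummies(df)
print(dummies)
```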
diff --git a/doc/source/whatsnew.rst b/doc/source/whatsnew.rst
@@ -7,6 +7,7 @@
 
    import numpy as np
    from pandas import *
+   import pandas as pd
    randn = np.random.randn
    np.set_printoptions(precision=4, suppress=True)
    options.display.max_rows = 15
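The v0.15.0 deprecation notes in this commit point from ``Index`` ``+`` / ``-`` to ``.union()`` / ``.difference()`` and from ``infer_dst=True`` to ``ambiguous='infer'``, and the API notes add ``Index.duplicated`` and ``drop_duplicates``. This is a minimal sketch of the replacements, assuming a recent pandas release. It avoids ``|`` because recent releases treat it as an element-wise operator on ``Index``. The time zone and timestamps are chosen for illustration: 2011-11-06 is the end of US daylight saving time, so ``01:00`` occurs twice.

```python
import pandas as pd

# Index set operations via named methods rather than + and -.
left = pd.Index([1, 2, 3])
right = pd.Index([3, 4])
union = left.union(right)
diff = left.difference(right)

# ambiguous="infer" resolves the repeated 01:00 wall time from the ordering
# of the timestamps (first occurrence in DST, second in standard time).
wall_times = pd.DatetimeIndex(["2011-11-06 00:00", "2011-11-06 01:00",
                               "2011-11-06 01:00", "2011-11-06 02:00",
                               "2011-11-06 03:00"])
localized = wall_times.tz_localize("America/New_York", ambiguous="infer")
print(localized.tz_convert("UTC"))

# Index.duplicated and Index.drop_duplicates.
labels = pd.Index(["a", "b", "a", "c"])
dup_mask = labels.duplicated()
unique_labels = labels.drop_duplicates()
```

After conversion to UTC the five timestamps are one hour apart, which is how the inference keeps the sequence monotonic.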
