Commit 65d716c

DOC: v0.15.0 / faq.rst updates
1 parent d29d4c6 commit 65d716c

4 files changed

+47
-42
lines changed

doc/source/categorical.rst

Lines changed: 2 additions & 2 deletions
@@ -294,7 +294,7 @@ This is even true for strings and numeric data:
294294
s
295295
s.sort()
296296
s
297-
print(s.min(), s.max())
297+
s.min(), s.max()
298298
299299
Reordering the categories is possible via the :func:`Categorical.reorder_categories` and
300300
the :func:`Categorical.set_categories` methods. For :func:`Categorical.reorder_categories`, all
@@ -307,7 +307,7 @@ old categories must be included in the new categories and no new categories are
307307
s
308308
s.sort()
309309
s
310-
print(s.min(), s.max())
310+
s.min(), s.max()
311311
312312
.. note::
313313

doc/source/faq.rst

Lines changed: 7 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -33,26 +33,25 @@ As of pandas version 0.15.0, the memory usage of a dataframe (including
3333
the index) is shown when accessing the ``info`` method of a dataframe. A
3434
configuration option, ``display.memory_usage`` (see :ref:`options`),
3535
specifies if the dataframe's memory usage will be displayed when
36-
invoking the df.info() method.
36+
invoking the ``df.info()`` method.
3737

3838
For example, the memory usage of the dataframe below is shown
39-
when calling df.info():
39+
when calling ``df.info()``:
4040

4141
.. ipython:: python
4242
4343
dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
44-
'complex128', 'object', 'bool']
44+
'complex128', 'object', 'bool']
4545
n = 5000
4646
data = dict([ (t, np.random.randint(100, size=n).astype(t))
4747
for t in dtypes])
4848
df = DataFrame(data)
49+
df['categorical'] = df['object'].astype('category')
4950
5051
df.info()
5152
52-
By default the display option is set to True but can be explicitly
53-
overridden by passing the memory_usage argument when invoking df.info().
54-
Note that ``memory_usage=None`` is the default value for the df.info()
55-
method and follows the setting specified by display.memory_usage.
53+
By default the display option is set to ``True`` but can be explicitly
54+
overridden by passing the ``memory_usage`` argument when invoking ``df.info()``.
5655

5756
The memory usage of each column can be found by calling the ``memory_usage``
5857
method. This returns a Series with an index represented by column names
@@ -80,24 +79,7 @@ The memory usage displayed by the ``info`` method utilizes the
8079
while also formatting the output in human-readable units (base-2
8180
representation; i.e., 1KB = 1024 bytes).
8281

83-
Pandas version 0.15.0 introduces a new categorical data type (see
84-
:ref:`categorical`), which can be used in Series and DataFrames.
85-
Significant memory savings can be achieved when using the category
86-
datatype. This is demonstrated below:
87-
88-
.. ipython:: python
89-
90-
df['bases_object'] = Series(np.array(['adenine', 'cytosine', 'guanine', 'thymine']).take(np.random.randint(0,4,size=len(df))))
91-
92-
df['bases_categorical'] = df['bases_object'].astype('category')
93-
94-
df.memory_usage()
95-
96-
While the *base_object* and *bases_categorical* appear as identical
97-
columns in the dataframe, the memory savings of the categorical
98-
datatype, versus the object datatype, is revealed by ``memory_usage``.
99-
100-
82+
See also :ref:`Categorical Memory Usage <categorical.memory>`.
10183

10284
.. _ref-monkey-patching:
10385

doc/source/v0.15.0.txt

Lines changed: 37 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,7 @@ users upgrade to this version.
1717

1818
- The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here <whatsnew_0150.cat>`
1919
- New scalar type ``Timedelta``, and a new index type ``TimedeltaIndex``, see :ref:`here <whatsnew_0150.timedeltaindex>`
20+
- New DataFrame default display for ``df.info()`` to include memory usage, see :ref:`Memory Usage <whatsnew_0150.memory>`
2021
- New datetimelike properties accessor ``.dt`` for Series, see :ref:`Datetimelike Properties <whatsnew_0150.dt>`
2122
- Split indexing documentation into :ref:`Indexing and Selecting Data <indexing>` and :ref:`MultiIndex / Advanced Indexing <advanced>`
2223
- Split out string methods documentation into :ref:`Working with Text Data <text>`
@@ -57,7 +58,7 @@ users upgrade to this version.
5758
API changes
5859
~~~~~~~~~~~
5960

60-
- Passing multiple levels to `DataFrame.stack()` will now work when multiple level
61+
- Passing multiple levels to :meth:`~pandas.DataFrame.stack()` will now work when multiple level
6162
numbers are passed (:issue:`7660`), and will raise a ``ValueError`` when the
6263
levels aren't all level names or all level numbers. See
6364
:ref:`Reshaping by stacking and unstacking <reshaping.stack_multiple>`.
@@ -134,7 +135,7 @@ API changes
134135

135136
.. code-block:: python
136137

137-
In [1]: idx = pd.MultiIndex.from_product([[0, 1], ['a', 'b', 'c']])
138+
In [1]: idx = MultiIndex.from_product([[0, 1], ['a', 'b', 'c']])
138139

139140
In [2]: idx.values
140141
Out[2]: array([(0, 'a'), (0, 'b'), (0, 'c'), (1, 'a'), (1, 'b'), (1, 'c')], dtype=object)
@@ -199,7 +200,10 @@ API changes
199200
names=['one','two'])
200201
).sortlevel()
201202
s
202-
s.loc[['D']]
203+
try:
204+
s.loc[['D']]
205+
except KeyError as e:
206+
print("KeyError: " + str(e))
203207

204208
- ``Index`` now supports ``duplicated`` and ``drop_duplicates``. (:issue:`4060`)
205209

@@ -239,7 +243,8 @@ API changes
239243

240244
.. ipython:: python
241245

242-
df = DataFrame([[True, 1],[False, 2]], columns = ["female","fitness"])
246+
df = DataFrame([[True, 1],[False, 2]],
247+
columns=["female","fitness"])
243248
df
244249
df.dtypes
245250

@@ -259,15 +264,32 @@ API changes
259264

260265
- ``DataFrame.plot`` and ``Series.plot`` keywords are now have consistent orders (:issue:`8037`)
261266

262-
- Implements methods to find memory usage of a DataFrame (:issue:`6852`). A new display option ``display.memory_usage`` (see :ref:`options`) sets the default behavior of the ``memory_usage`` argument in the ``df.info()`` method; by default ``display.memory_usage`` is True but this can be overridden by explicitly passing the memory_usage argument to the df.info() method, as shown below. Additionally `memory_usage` is an available method for a dataframe object which returns the memory usage of each column (for more information see :ref:`df-memory-usage`):
267+
.. _whatsnew_0150.memory:
263268

264-
.. ipython:: python
269+
Memory Usage
270+
~~~~~~~~~~~~~
271+
272+
Implemented methods to find memory usage of a DataFrame. See the :ref:`FAQ <df-memory-usage>` for more. (:issue:`6852`).
265273

266-
df = DataFrame({ 'float' : np.random.randn(1000), 'int' : np.random.randint(0,5,size=1000)})
267-
df.memory_usage()
274+
A new display option ``display.memory_usage`` (see :ref:`options`) sets the default behavior of the ``memory_usage`` argument in the ``df.info()`` method. By default ``display.memory_usage`` is ``True``.
275+
276+
.. ipython:: python
268277

269-
df.info(memory_usage=True)
278+
dtypes = ['int64', 'float64', 'datetime64[ns]', 'timedelta64[ns]',
279+
'complex128', 'object', 'bool']
280+
n = 5000
281+
data = dict([ (t, np.random.randint(100, size=n).astype(t))
282+
for t in dtypes])
283+
df = DataFrame(data)
284+
df['categorical'] = df['object'].astype('category')
270285

286+
df.info()
287+
288+
Additionally :meth:`~pandas.DataFrame.memory_usage` is an available method for a dataframe object which returns the memory usage of each column.
289+
290+
.. ipython:: python
291+
292+
df.memory_usage(index=True)
271293

272294
.. _whatsnew_0150.dt:
273295

@@ -582,7 +604,7 @@ For full docs, see the :ref:`categorical introduction <categorical>` and the
582604

583605
.. ipython:: python
584606

585-
df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
607+
df = DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
586608

587609
df["grade"] = df["raw_grade"].astype("category")
588610
df["grade"]
@@ -719,7 +741,8 @@ Finally, the combination of ``TimedeltaIndex`` with ``DatetimeIndex`` allow cert
719741
Prior Version Deprecations/Changes
720742
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
721743

722-
There are no prior version deprecations that are taking effect as of 0.15.0.
744+
- Remove ``DataFrame.delevel`` method in favor of ``DataFrame.reset_index``
745+
(:issue:`420`)
723746

724747
.. _whatsnew_0150.deprecations:
725748

@@ -732,8 +755,7 @@ Deprecations
732755
``ambiguous`` to allow for more flexibility in dealing with DST transitions.
733756
Replace ``infer_dst=True`` with ``ambiguous='infer'`` for the same behavior (:issue:`7943`).
734757
See :ref:`the docs<timeseries.timezone_ambiguous>` for more details.
735-
- Remove ``DataFrame.delevel`` method in favor of ``DataFrame.reset_index``
736-
(:issue:`420`)
758+
737759
.. _whatsnew_0150.index_set_ops:
738760

739761
- The ``Index`` set operations ``+`` and ``-`` were deprecated in order to provide these for numeric type operations on certain index types. ``+`` can be replace by ``.union()`` or ``|``, and ``-`` by ``.difference()``. Further the method name ``Index.diff()`` is deprecated and can be replaced by ``Index.difference()`` (:issue:`8226`)
@@ -816,8 +838,8 @@ Enhancements
816838

817839
.. ipython:: python
818840

819-
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
820-
'C': [1, 2, 3]})
841+
df = DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
842+
'C': [1, 2, 3]})
821843
pd.get_dummies(df)
822844

823845

doc/source/whatsnew.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
88
import numpy as np
99
from pandas import *
10+
import pandas as pd
1011
randn = np.random.randn
1112
np.set_printoptions(precision=4, suppress=True)
1213
options.display.max_rows = 15
