From 174ecf84d16cfe34f400f11c7a1f2515cea51349 Mon Sep 17 00:00:00 2001
From: unutbu
Date: Fri, 24 Jan 2014 18:22:49 -0500
Subject: [PATCH] DOC: Explain the use of NDFrame.equals

---
 doc/source/basics.rst  | 57 +++++++++++++++++++++++++++++-------------
 doc/source/v0.13.1.txt |  2 +-
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/doc/source/basics.rst b/doc/source/basics.rst
index e9cc03c098d03..9521bae373060 100644
--- a/doc/source/basics.rst
+++ b/doc/source/basics.rst
@@ -215,14 +215,6 @@ These operations produce a pandas object the same type as the left-hand-side inp
 that if of dtype ``bool``. These ``boolean`` objects can be used in indexing
 operations, see :ref:`here`

-As of v0.13.1, Series, DataFrames and Panels have an equals method to compare if
-two such objects are equal.
-
-.. ipython:: python
-
-   df.equals(df)
-   df.equals(df2)
-
 .. _basics.reductions:

 Boolean Reductions
@@ -281,6 +273,35 @@ To evaluate single-element pandas objects in a boolean context, use the method `

 See :ref:`gotchas` for a more detailed discussion.

+.. _basics.equals:
+
+Often you may find there is more than one way to compute the same
+result. As a simple example, consider ``df+df`` and ``df*2``. To test
+that these two computations produce the same result, given the tools
+shown above, you might imagine using ``(df+df == df*2).all()``. But in
+fact, this expression is False:
+
+.. ipython:: python
+
+   df+df == df*2
+   (df+df == df*2).all()
+
+Notice that the boolean DataFrame ``df+df == df*2`` contains some False values!
+That is because NaNs do not compare as equal:
+
+.. ipython:: python
+
+   np.nan == np.nan
+
+So, as of v0.13.1, NDFrames (such as Series, DataFrames, and Panels)
+have an ``equals`` method for testing equality, with NaNs in corresponding
+locations treated as equal.
+
+.. ipython:: python
+
+   (df+df).equals(df*2)
+
+
 Combining overlapping data sets
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -497,7 +518,7 @@ of a 1D array of values. It can also be used as a function on regular arrays:

    s.value_counts()
    value_counts(data)

-Similarly, you can get the most frequently occuring value(s) (the mode) of the values in a Series or DataFrame:
+Similarly, you can get the most frequently occurring value(s) (the mode) of the values in a Series or DataFrame:

 .. ipython:: python
@@ -783,7 +804,7 @@ DataFrame's index.
    pre-aligned data**. Adding two unaligned DataFrames internally triggers a
    reindexing step. For exploratory analysis you will hardly notice the
    difference (because ``reindex`` has been heavily optimized), but when CPU
-   cycles matter sprinking a few explicit ``reindex`` calls here and there can
+   cycles matter sprinkling a few explicit ``reindex`` calls here and there can
    have an impact.

 .. _basics.reindex_like:
@@ -1013,7 +1034,7 @@ containing the data in each row:
    ...:    print('%s\n%s' % (row_index, row))
    ...:

-For instance, a contrived way to transpose the dataframe would be:
+For instance, a contrived way to transpose the DataFrame would be:

 .. ipython:: python
@@ -1160,12 +1181,12 @@ relies on strict ``re.match``, while ``contains`` relies on ``re.search``.

    This old, deprecated behavior of ``match`` is still the default. As
    demonstrated above, use the new behavior by setting ``as_indexer=True``.
-   In this mode, ``match`` is analagous to ``contains``, returning a boolean
+   In this mode, ``match`` is analogous to ``contains``, returning a boolean
    Series. The new behavior will become the default behavior in a future
    release.

 Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
-an extra ``na`` arguement so missing values can be considered True or False:
+an extra ``na`` argument so missing values can be considered True or False:

 .. ipython:: python
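The ``.. ipython:: python`` directive that closes the hunk above has its contents outside the
diff context, so the effect of the ``na`` argument is not visible here. As a minimal,
self-contained sketch of the documented behavior (not part of the patch; the Series ``s4``
and its values are invented for illustration, and only ``Series.str.contains`` with its
``na`` parameter is assumed)::

    import numpy as np
    import pandas as pd

    # a small string Series with one missing value
    s4 = pd.Series(['A', 'B', np.nan, 'CABA'])

    # by default the missing entry stays missing (NaN) in the result
    print(s4.str.contains('A'))

    # na=False (or na=True) fills the missing entry with a boolean instead
    print(s4.str.contains('A', na=False))
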
@@ -1189,7 +1210,7 @@ Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
     ``slice_replace``,Replace slice in each string with passed value
     ``count``,Count occurrences of pattern
     ``startswith``,Equivalent to ``str.startswith(pat)`` for each element
-    ``endswidth``,Equivalent to ``str.endswith(pat)`` for each element
+    ``endswith``,Equivalent to ``str.endswith(pat)`` for each element
     ``findall``,Compute list of all occurrences of pattern/regex for each string
     ``match``,"Call ``re.match`` on each element, returning matched groups as list"
     ``extract``,"Call ``re.match`` on each element, as ``match`` does, but return matched groups as strings for convenience."
@@ -1364,7 +1385,7 @@ from the current type (say ``int`` to ``float``)

    df3.dtypes

 The ``values`` attribute on a DataFrame return the *lower-common-denominator* of the dtypes, meaning
-the dtype that can accomodate **ALL** of the types in the resulting homogenous dtyped numpy array. This can
+the dtype that can accommodate **ALL** of the types in the resulting homogenous dtyped numpy array. This can
 force some *upcasting*.

 .. ipython:: python
@@ -1376,7 +1397,7 @@ astype
 ------

 .. _basics.cast:

-You can use the ``astype`` method to explicity convert dtypes from one to another. These will by default return a copy,
+You can use the ``astype`` method to explicitly convert dtypes from one to another. These will by default return a copy,
 even if the dtype was unchanged (pass ``copy=False`` to change this behavior). In addition, they will raise an
 exception if the astype operation is invalid.
@@ -1411,7 +1432,7 @@ they will be set to ``np.nan``.

    df3.dtypes

 To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
-This will convert any datetimelike object to dates, forcing other values to ``NaT``.
+This will convert any datetime-like object to dates, forcing other values to ``NaT``.
 This might be useful if you are reading in data which is mostly dates,
 but occasionally has non-dates intermixed and you want to represent as missing.
@@ -1598,7 +1619,7 @@ For instance:

 The ``set_printoptions`` function has a number of options for controlling how
-floating point numbers are formatted (using hte ``precision`` argument) in the
+floating point numbers are formatted (using the ``precision`` argument) in the
 console and . The ``max_rows`` and ``max_columns`` control how many rows and
 columns of DataFrame objects are shown by default. If ``max_columns`` is set
 to 0 (the default, in fact), the library will attempt to fit the DataFrame's
diff --git a/doc/source/v0.13.1.txt b/doc/source/v0.13.1.txt
index 55599bb47cd8e..ef9df31b9f99d 100644
--- a/doc/source/v0.13.1.txt
+++ b/doc/source/v0.13.1.txt
@@ -46,7 +46,7 @@ API changes
   equal have equal axes, dtypes, and values. Added the
   ``array_equivalent`` function to compare if two
   ndarrays are equal. NaNs in identical locations are treated as
-  equal. (:issue:`5283`)
+  equal. (:issue:`5283`) See also :ref:`the docs` for a motivating example.

 .. ipython:: python
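The ipython snippets added to basics.rst in this patch operate on a ``df`` constructed
earlier in that file, outside the diff context. A minimal, self-contained sketch of the
motivating example (not part of the patch; the frame below is invented for illustration,
and only ``DataFrame.equals``, available from pandas 0.13.1 onwards, is assumed)::

    import numpy as np
    import pandas as pd

    # a small frame with a NaN, standing in for the ``df`` used in basics.rst
    df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, 6.0]})

    # element-wise comparison is False wherever a NaN is involved ...
    print(df + df == df * 2)
    print((df + df == df * 2).all())

    # ... because NaN does not compare equal to itself
    print(np.nan == np.nan)

    # equals() treats NaNs in corresponding locations as equal
    print((df + df).equals(df * 2))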