From e2b9d924c8dccb35c3ba5fcbc23fa6c5f8950ad5 Mon Sep 17 00:00:00 2001 From: tommyod Date: Mon, 1 Jan 2018 22:14:54 +0100 Subject: [PATCH 1/5] Spellchecked merging.rst, and most of reshaping.rst2 --- doc/source/merging.rst | 218 +++++++++++++++++++++------------------ doc/source/reshaping.rst | 106 ++++++++++--------- 2 files changed, 174 insertions(+), 150 deletions(-) diff --git a/doc/source/merging.rst b/doc/source/merging.rst index 5f2e90e6ae4fe..ebade853313ab 100644 --- a/doc/source/merging.rst +++ b/doc/source/merging.rst @@ -31,11 +31,11 @@ operations. Concatenating objects --------------------- -The ``concat`` function (in the main pandas namespace) does all of the heavy -lifting of performing concatenation operations along an axis while performing -optional set logic (union or intersection) of the indexes (if any) on the other -axes. Note that I say "if any" because there is only a single possible axis of -concatenation for Series. +The :func:`~pandas.concat` function (in the main pandas namespace) does all of +the heavy lifting of performing concatenation operations along an axis while +performing optional set logic (union or intersection) of the indexes (if any) on +the other axes. Note that I say "if any" because there is only a single possible +axis of concatenation for Series. Before diving into all of the details of ``concat`` and what it can do, here is a simple example: @@ -109,10 +109,10 @@ some configurable handling of "what to do with the other axes": to the actual data concatenation. - ``copy`` : boolean, default True. If False, do not copy data unnecessarily. -Without a little bit of context and example many of these arguments don't make -much sense. Let's take the above example. Suppose we wanted to associate -specific keys with each of the pieces of the chopped up DataFrame. We can do -this using the ``keys`` argument: +Without a little bit of context many of these arguments don't make much sense. +Let's revisit the above example. Suppose we wanted to associate specific keys +with each of the pieces of the chopped up DataFrame. We can do this using the +``keys`` argument: .. ipython:: python @@ -128,7 +128,7 @@ this using the ``keys`` argument: As you can see (if you've read the rest of the documentation), the resulting object's index has a :ref:`hierarchical index `. This -means that we can now do stuff like select out each chunk by key: +means that we can now select out each chunk by key: .. ipython:: python @@ -138,10 +138,10 @@ It's not a stretch to see how this can be very useful. More detail on this functionality below. .. note:: - It is worth noting however, that ``concat`` (and therefore ``append``) makes - a full copy of the data, and that constantly reusing this function can - create a significant performance hit. If you need to use the operation over - several datasets, use a list comprehension. + It is worth noting that :func:`~pandas.concat` (and therefore + :func:`~pandas.append`) makes a full copy of the data, and that constantly + reusing this function can create a significant performance hit. If you need + to use the operation over several datasets, use a list comprehension. :: @@ -152,17 +152,16 @@ functionality below. Set logic on the other axes ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -When gluing together multiple DataFrames (or Panels or...), for example, you -have a choice of how to handle the other axes (other than the one being -concatenated). 
This can be done in three ways: +When gluing together multiple ``DataFrame``s, you have a choice of how to handle +the other axes (other than the one being concatenated). This can be done in +the following three ways: - Take the (sorted) union of them all, ``join='outer'``. This is the default option as it results in zero information loss. - Take the intersection, ``join='inner'``. -- Use a specific index (in the case of DataFrame) or indexes (in the case of - Panel or future higher dimensional objects), i.e. the ``join_axes`` argument +- Use a specific index, as passed to the ``join_axes`` argument. -Here is a example of each of these methods. First, the default ``join='outer'`` +Here is an example of each of these methods. First, the default ``join='outer'`` behavior: .. ipython:: python @@ -217,9 +216,9 @@ DataFrame: Concatenating using ``append`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A useful shortcut to ``concat`` are the ``append`` instance methods on Series -and DataFrame. These methods actually predated ``concat``. They concatenate -along ``axis=0``, namely the index: +A useful shortcut to :func:`~pandas.concat` are the :meth:`~DataFrame.append` +instance methods on ``Series`` and ``DataFrame``. These methods actually predated +``concat``. They concatenate along ``axis=0``, namely the index: .. ipython:: python @@ -233,7 +232,7 @@ along ``axis=0``, namely the index: labels=['df1', 'df2'], vertical=True); plt.close('all'); -In the case of DataFrame, the indexes must be disjoint but the columns do not +In the case of ``DataFrame``, the indexes must be disjoint but the columns do not need to be: .. ipython:: python @@ -264,18 +263,17 @@ need to be: .. note:: - Unlike `list.append` method, which appends to the original list and - returns nothing, ``append`` here **does not** modify ``df1`` and - returns its copy with ``df2`` appended. + Unlike the :py:meth:`~list.append` method, which appends to the original list + and returns ``None``, :meth:`~DataFrame.append` here **does not** modify + ``df1`` and returns its copy with ``df2`` appended. .. _merging.ignore_index: Ignoring indexes on the concatenation axis ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -For DataFrames which don't have a meaningful index, you may wish to append them -and ignore the fact that they may have overlapping indexes: - -To do this, use the ``ignore_index`` argument: +For ``DataFrame``s which don't have a meaningful index, you may wish to append +them and ignore the fact that they may have overlapping indexes. To do this, use +the ``ignore_index`` argument: .. ipython:: python @@ -289,7 +287,7 @@ To do this, use the ``ignore_index`` argument: labels=['df1', 'df4'], vertical=True); plt.close('all'); -This is also a valid argument to ``DataFrame.append``: +This is also a valid argument to :meth:`DataFrame.append`: .. ipython:: python @@ -308,9 +306,9 @@ This is also a valid argument to ``DataFrame.append``: Concatenating with mixed ndims ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can concatenate a mix of Series and DataFrames. The -Series will be transformed to DataFrames with the column name as -the name of the Series. +You can concatenate a mix of ``Series`` and ``DataFrame``s. The +``Series`` will be transformed to ``DataFrame`` with the column name as +the name of the ``Series``. .. ipython:: python @@ -325,7 +323,7 @@ the name of the Series. labels=['df1', 's1'], vertical=False); plt.close('all'); -If unnamed Series are passed they will be numbered consecutively. 
+If unnamed ``Series`` are passed they will be numbered consecutively. .. ipython:: python @@ -357,8 +355,10 @@ Passing ``ignore_index=True`` will drop all name references. More concatenating with group keys ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A fairly common use of the ``keys`` argument is to override the column names when creating a new DataFrame based on existing Series. -Notice how the default behaviour consists on letting the resulting DataFrame inherits the parent Series' name, when these existed. +A fairly common use of the ``keys`` argument is to override the column names +when creating a new ``DataFrame`` based on existing ``Series``. +Notice how the default behaviour consists on letting the resulting ``DataFrame`` +inherit the parent ``Series``' name, when these existed. .. ipython:: python @@ -374,7 +374,7 @@ Through the ``keys`` argument we can override the existing column names. pd.concat([s3, s4, s5], axis=1, keys=['red','blue','yellow']) -Let's consider now a variation on the very first example presented: +Let's consider a variation of the very first example presented: .. ipython:: python @@ -417,7 +417,7 @@ for the ``keys`` argument (unless other keys are specified): plt.close('all'); The MultiIndex created has levels that are constructed from the passed keys and -the index of the DataFrame pieces: +the index of the ``DataFrame`` pieces: .. ipython:: python @@ -444,7 +444,7 @@ do so using the ``levels`` argument: result.index.levels -Yes, this is fairly esoteric, but is actually necessary for implementing things +This is fairly esoteric, but it is actually necessary for implementing things like GroupBy where the order of a categorical variable is meaningful. .. _merging.append.row: @@ -453,8 +453,8 @@ Appending rows to a DataFrame ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ While not especially efficient (since a new object must be created), you can -append a single row to a DataFrame by passing a Series or dict to ``append``, -which returns a new DataFrame as above. +append a single row to a ``DataFrame`` by passing a ``Series`` or dict to +``append``, which returns a new ``DataFrame`` as above. .. ipython:: python @@ -498,16 +498,16 @@ pandas has full-featured, **high performance** in-memory join operations idiomatically very similar to relational databases like SQL. These methods perform significantly better (in some cases well over an order of magnitude better) than other open source implementations (like ``base::merge.data.frame`` -in R). The reason for this is careful algorithmic design and internal layout of -the data in DataFrame. +in R). The reason for this is careful algorithmic design and the internal layout +of the data in ``DataFrame``. See the :ref:`cookbook` for some advanced strategies. Users who are familiar with SQL but new to pandas might be interested in a :ref:`comparison with SQL`. -pandas provides a single function, ``merge``, as the entry point for all -standard database join operations between DataFrame objects: +pandas provides a single function, :func:`~pandas.merge`, as the entry point for +all standard database join operations between ``DataFrame`` objects: :: @@ -516,28 +516,28 @@ standard database join operations between DataFrame objects: suffixes=('_x', '_y'), copy=True, indicator=False, validate=None) -- ``left``: A DataFrame object -- ``right``: Another DataFrame object +- ``left``: A DataFrame object. +- ``right``: Another DataFrame object. - ``on``: Column or index level names to join on. Must be found in both the left and right DataFrame objects. 
If not passed and ``left_index`` and ``right_index`` are ``False``, the intersection of the columns in the - DataFrames will be inferred to be the join keys + DataFrames will be inferred to be the join keys. - ``left_on``: Columns or index levels from the left DataFrame to use as keys. Can either be column names, index level names, or arrays with length - equal to the length of the DataFrame + equal to the length of the DataFrame. - ``right_on``: Columns or index levels from the right DataFrame to use as keys. Can either be column names, index level names, or arrays with length - equal to the length of the DataFrame + equal to the length of the DataFrame. - ``left_index``: If ``True``, use the index (row labels) from the left DataFrame as its join key(s). In the case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys - from the right DataFrame + from the right DataFrame. - ``right_index``: Same usage as ``left_index`` for the right DataFrame - ``how``: One of ``'left'``, ``'right'``, ``'outer'``, ``'inner'``. Defaults - to ``inner``. See below for more detailed description of each method + to ``inner``. See below for more detailed description of each method. - ``sort``: Sort the result DataFrame by the join keys in lexicographical order. Defaults to ``True``, setting to ``False`` will improve performance - substantially in many cases + substantially in many cases. - ``suffixes``: A tuple of string suffixes to apply to overlapping columns. Defaults to ``('_x', '_y')``. - ``copy``: Always copy data (default ``True``) from the passed DataFrame @@ -575,10 +575,10 @@ and ``right`` is a subclass of DataFrame, the return type will still be ``DataFrame``. ``merge`` is a function in the pandas namespace, and it is also available as a -DataFrame instance method, with the calling DataFrame being implicitly -considered the left object in the join. +``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling +``DataFrame `` being implicitly considered the left object in the join. -The related ``DataFrame.join`` method, uses ``merge`` internally for the +The related :meth:`~DataFrame.join` method, uses ``merge`` internally for the index-on-index (by default) and column(s)-on-index join. If you are joining on index only, you may wish to use ``DataFrame.join`` to save yourself some typing. @@ -587,19 +587,19 @@ Brief primer on merge methods (relational algebra) Experienced users of relational databases like SQL will be familiar with the terminology used to describe join operations between two SQL-table like -structures (DataFrame objects). There are several cases to consider which are -very important to understand: +structures (``DataFrame`` objects). There are several cases to consider which +are very important to understand: -- **one-to-one** joins: for example when joining two DataFrame objects on - their indexes (which must contain unique values) +- **one-to-one** joins: for example when joining two ``DataFrame`` objects on + their indexes (which must contain unique values). - **many-to-one** joins: for example when joining an index (unique) to one or - more columns in a DataFrame + more columns in a different ``DataFrame``. - **many-to-many** joins: joining columns on columns. .. note:: When joining columns on columns (potentially a many-to-many join), any - indexes on the passed DataFrame objects **will be discarded**. + indexes on the passed ``DataFrame`` objects **will be discarded**. 
It is worth spending some time understanding the result of the **many-to-many** @@ -627,7 +627,9 @@ key combination: labels=['left', 'right'], vertical=False); plt.close('all'); -Here is a more complicated example with multiple join keys: +Here is a more complicated example with multiple join keys. Only the keys +appearing in ``left`` and ``right`` are present (the intersection), since +``how='inner'```by default. .. ipython:: python @@ -712,7 +714,7 @@ either the left or right tables, the values in the joined table will be labels=['left', 'right'], vertical=False); plt.close('all'); -Here is another example with duplicate join keys in DataFrames: +Here is another example with duplicate join keys in ``DataFrame``s: .. ipython:: python @@ -742,9 +744,14 @@ Checking for duplicate keys .. versionadded:: 0.21.0 -Users can use the ``validate`` argument to automatically check whether there are unexpected duplicates in their merge keys. Key uniqueness is checked before merge operations and so should protect against memory overflows. Checking key uniqueness is also a good way to ensure user data structures are as expected. +Users can use the ``validate`` argument to automatically check whether there +are unexpected duplicates in their merge keys. Key uniqueness is checked before +merge operations and so should protect against memory overflows. Checking key +uniqueness is also a good way to ensure user data structures are as expected. -In the following example, there are duplicate values of ``B`` in the right DataFrame. As this is not a one-to-one merge -- as specified in the ``validate`` argument -- an exception will be raised. +In the following example, there are duplicate values of ``B`` in the right +``DataFrame``. As this is not a one-to-one merge -- as specified in the +``validate`` argument -- an exception will be raised. .. ipython:: python @@ -758,7 +765,9 @@ In the following example, there are duplicate values of ``B`` in the right DataF ... MergeError: Merge keys are not unique in right dataset; not a one-to-one merge -If the user is aware of the duplicates in the right `DataFrame` but wants to ensure there are no duplicates in the left DataFrame, one can use the `validate='one_to_many'` argument instead, which will not raise an exception. +If the user is aware of the duplicates in the right ``DataFrame`` but wants to +ensure there are no duplicates in the left DataFrame, one can use the +``validate='one_to_many'`` argument instead, which will not raise an exception. .. ipython:: python @@ -770,7 +779,9 @@ If the user is aware of the duplicates in the right `DataFrame` but wants to ens The merge indicator ~~~~~~~~~~~~~~~~~~~ -``merge`` accepts the argument ``indicator``. If ``True``, a Categorical-type column called ``_merge`` will be added to the output object that takes on values: +:func:`~pandas.merge` accepts the argument ``indicator``. If ``True``, a +Categorical-type column called ``_merge`` will be added to the output object +that takes on values: =================================== ================ Observation Origin ``_merge`` value @@ -809,7 +820,7 @@ Merging will preserve the dtype of the join keys. right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]}) right -We are able to preserve the join keys +We are able to preserve the join keys: .. ipython:: python @@ -826,7 +837,7 @@ resulting dtype will be upcast. .. versionadded:: 0.20.0 -Merging will preserve ``category`` dtypes of the mergands. 
See also the section on :ref:`categoricals ` +Merging will preserve ``category`` dtypes of the mergands. See also the section on :ref:`categoricals `. The left frame. @@ -854,7 +865,7 @@ The right frame. right right.dtypes -The merged result +The merged result: .. ipython:: python @@ -876,9 +887,9 @@ The merged result Joining on index ~~~~~~~~~~~~~~~~ -``DataFrame.join`` is a convenient method for combining the columns of two -potentially differently-indexed DataFrames into a single result DataFrame. Here -is a very basic example: +:meth:`DataFrame.join` is a convenient method for combining the columns of two +potentially differently-indexed ``DataFrames`` into a single result +``DataFrame``. Here is a very basic example: .. ipython:: python @@ -912,6 +923,8 @@ is a very basic example: labels=['left', 'right'], vertical=False); plt.close('all'); +The same as above, but with ``how='inner'``. + .. ipython:: python result = left.join(right, how='inner') @@ -955,10 +968,10 @@ indexes: Joining key columns on an index ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``join`` takes an optional ``on`` argument which may be a column or multiple -column names, which specifies that the passed DataFrame is to be aligned on -that column in the DataFrame. These two function calls are completely -equivalent: +:meth:`~DataFrame.join` takes an optional ``on`` argument which may be a column +or multiple column names, which specifies that the passed ``DataFrame`` is to be +aligned on that column in the ``DataFrame``. These two function calls are +completely equivalent: :: @@ -967,8 +980,8 @@ equivalent: how='left', sort=False) Obviously you can choose whichever form you find more convenient. For -many-to-one joins (where one of the DataFrame's is already indexed by the join -key), using ``join`` may be more convenient. Here is a simple example: +many-to-one joins (where one of the ``DataFrame``'s is already indexed by the +join key), using ``join`` may be more convenient. Here is a simple example: .. ipython:: python @@ -1105,7 +1118,8 @@ This is equivalent but less verbose and more memory efficient / faster than this Joining with two multi-indexes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -This is not Implemented via ``join`` at-the-moment, however it can be done using the following. +This is not implemented via ``join`` at-the-moment, however it can be done using +the following code. .. ipython:: python @@ -1181,7 +1195,7 @@ Overlapping value columns ~~~~~~~~~~~~~~~~~~~~~~~~~ The merge ``suffixes`` argument takes a tuple of list of strings to append to -overlapping column names in the input DataFrames to disambiguate the result +overlapping column names in the input ``DataFrame``s to disambiguate the result columns: .. ipython:: python @@ -1211,7 +1225,7 @@ columns: labels=['left', 'right'], vertical=False); plt.close('all'); -``DataFrame.join`` has ``lsuffix`` and ``rsuffix`` arguments which behave +:meth:`DataFrame.join` has ``lsuffix`` and ``rsuffix`` arguments which behave similarly. .. ipython:: python @@ -1233,8 +1247,8 @@ similarly. Joining multiple DataFrame or Panel objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A list or tuple of DataFrames can also be passed to ``DataFrame.join`` to join -them together on their indexes. The same is true for ``Panel.join``. +A list or tuple of ``DataFrames`` can also be passed to :meth:`~DataFrame.join` +to join them together on their indexes. .. 
ipython:: python @@ -1255,8 +1269,8 @@ Merging together values within Series or DataFrame columns ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Another fairly common situation is to have two like-indexed (or similarly -indexed) Series or DataFrame objects and wanting to "patch" values in one -object from values for matching indices in the other. Here is an example: +indexed) ``Series`` or ``DataFrame`` objects and wanting to "patch" values in +one object from values for matching indices in the other. Here is an example: .. ipython:: python @@ -1265,7 +1279,7 @@ object from values for matching indices in the other. Here is an example: df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]], index=[1, 2]) -For this, use the ``combine_first`` method: +For this, use the :meth:`~DataFrame.combine_first` method: .. ipython:: python @@ -1279,9 +1293,9 @@ For this, use the ``combine_first`` method: labels=['df1', 'df2'], vertical=False); plt.close('all'); -Note that this method only takes values from the right DataFrame if they are -missing in the left DataFrame. A related method, ``update``, alters non-NA -values inplace: +Note that this method only takes values from the right ``DataFrame`` if they are +missing in the left ``DataFrame``. A related method, :meth:`~DataFrame.update`, +alters non-NA values inplace: .. ipython:: python :suppress: @@ -1332,12 +1346,16 @@ Merging AsOf .. versionadded:: 0.19.0 -A :func:`merge_asof` is similar to an ordered left-join except that we match on nearest key rather than equal keys. For each row in the ``left`` DataFrame, we select the last row in the ``right`` DataFrame whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key. +A :func:`merge_asof` is similar to an ordered left-join except that we match on +nearest key rather than equal keys. For each row in the ``left`` ``DataFrame``, +we select the last row in the ``right`` ``DataFrame`` whose ``on`` key is less +than the left's key. Both DataFrames must be sorted by the key. -Optionally an asof merge can perform a group-wise merge. This matches the ``by`` key equally, -in addition to the nearest match on the ``on`` key. +Optionally an asof merge can perform a group-wise merge. This matches the +``by`` key equally, in addition to the nearest match on the ``on`` key. -For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` merge them. +For example; we might have ``trades`` and ``quotes`` and we want to ``asof`` +merge them. .. ipython:: python @@ -1395,9 +1413,9 @@ We only asof within ``2ms`` between the quote time and the trade time. by='ticker', tolerance=pd.Timedelta('2ms')) -We only asof within ``10ms`` between the quote time and the trade time and we exclude exact matches on time. -Note that though we exclude the exact matches (of the quotes), prior quotes DO propagate to that point -in time. +We only asof within ``10ms`` between the quote time and the trade time and we +exclude exact matches on time. Note that though we exclude the exact matches +(of the quotes), prior quotes **do** propagate to that point in time. .. ipython:: python diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst index e2b7b0e586d70..096a36356b573 100644 --- a/doc/source/reshaping.rst +++ b/doc/source/reshaping.rst @@ -41,7 +41,7 @@ Data is often stored in CSV files or databases in so-called "stacked" or df -For the curious here is how the above DataFrame was created: +For the curious here is how the above ``DataFrame`` was created: .. 
code-block:: python @@ -63,15 +63,16 @@ To select out everything for variable ``A`` we could do: But suppose we wish to do time series operations with the variables. A better representation would be where the ``columns`` are the unique variables and an ``index`` of dates identifies individual observations. To reshape the data into -this form, use the ``pivot`` function: +this form, we use the :meth:`DataFrame.pivot` method (also implemented as a +top level function :func:`pandas.pivot`): .. ipython:: python df.pivot(index='date', columns='variable', values='value') -If the ``values`` argument is omitted, and the input DataFrame has more than +If the ``values`` argument is omitted, and the input ``DataFrame`` has more than one column of values which are not used as column or index inputs to ``pivot``, -then the resulting "pivoted" DataFrame will have :ref:`hierarchical columns +then the resulting "pivoted" ``DataFrame`` will have :ref:`hierarchical columns ` whose topmost level indicates the respective value column: @@ -81,7 +82,7 @@ column: pivoted = df.pivot('date', 'variable') pivoted -You of course can then select subsets from the pivoted DataFrame: +You can then select subsets from the pivoted ``DataFrame``: .. ipython:: python @@ -95,18 +96,18 @@ are homogeneously-typed. Reshaping by stacking and unstacking ------------------------------------ -Closely related to the ``pivot`` function are the related ``stack`` and -``unstack`` functions currently available on Series and DataFrame. These -functions are designed to work together with ``MultiIndex`` objects (see the -section on :ref:`hierarchical indexing `). Here are -essentially what these functions do: +Closely related to the :meth:`~DataFrame.pivot` method are the related +:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on +``Series`` and ``DataFrame``. These methods are designed to work together with +``MultiIndex`` objects (see the section on :ref:`hierarchical indexing +`). Here are essentially what these methods do: - ``stack``: "pivot" a level of the (possibly hierarchical) column labels, - returning a DataFrame with an index with a new inner-most level of row + returning a ``DataFrame`` with an index with a new inner-most level of row labels. - - ``unstack``: inverse operation from ``stack``: "pivot" a level of the + - ``unstack``: (inverse operation of ``stack``) "pivot" a level of the (possibly hierarchical) row index to the column axis, producing a reshaped - DataFrame with a new inner-most level of column labels. + ``DataFrame`` with a new inner-most level of column labels. The clearest way to explain is by example. Let's take a prior example data set from the hierarchical indexing section: @@ -122,11 +123,11 @@ from the hierarchical indexing section: df2 = df[:4] df2 -The ``stack`` function "compresses" a level in the DataFrame's columns to +The ``stack`` function "compresses" a level in the ``DataFrame``'s columns to produce either: - - A Series, in the case of a simple column Index - - A DataFrame, in the case of a ``MultiIndex`` in the columns + - A ``Series``, in the case of a simple column Index. + - A ``DataFrame``, in the case of a ``MultiIndex`` in the columns. If the columns have a ``MultiIndex``, you can choose which level to stack. 
The stacked level becomes the new lowest level in a ``MultiIndex`` on the columns: @@ -136,7 +137,7 @@ stacked level becomes the new lowest level in a ``MultiIndex`` on the columns: stacked = df2.stack() stacked -With a "stacked" DataFrame or Series (having a ``MultiIndex`` as the +With a "stacked" ``DataFrame`` or ``Series`` (having a ``MultiIndex`` as the ``index``), the inverse operation of ``stack`` is ``unstack``, which by default unstacks the **last level**: @@ -157,7 +158,7 @@ the level numbers: Notice that the ``stack`` and ``unstack`` methods implicitly sort the index levels involved. Hence a call to ``stack`` and then ``unstack``, or vice versa, -will result in a **sorted** copy of the original DataFrame or Series: +will result in a **sorted** copy of the original ``DataFrame`` or ``Series``: .. ipython:: python @@ -166,7 +167,7 @@ will result in a **sorted** copy of the original DataFrame or Series: df all(df.unstack().stack() == df.sort_index()) -while the above code will raise a ``TypeError`` if the call to ``sort_index`` is +The above code will raise a ``TypeError`` if the call to ``sort_index`` is removed. .. _reshaping.stack_multiple: @@ -265,12 +266,12 @@ the right thing: Reshaping by Melt ----------------- -The top-level :func:`melt` and :func:`~DataFrame.melt` functions are useful to -massage a DataFrame into a format where one or more columns are identifier variables, -while all other columns, considered measured variables, are "unpivoted" to the -row axis, leaving just two non-identifier columns, "variable" and "value". The -names of those columns can be customized by supplying the ``var_name`` and -``value_name`` parameters. +The top-level :func:`~pandas.melt` function and the corresponding :meth:`DataFrame.melt` +are useful to massage a ``DataFrame`` into a format where one or more columns +are *identifier variables*, while all other columns, considered *measured +variables*, are "unpivoted" to the row axis, leaving just two non-identifier +columns, "variable" and "value". The names of those columns can be customized +by supplying the ``var_name`` and ``value_name`` parameters. For instance, @@ -284,8 +285,9 @@ For instance, cheese.melt(id_vars=['first', 'last']) cheese.melt(id_vars=['first', 'last'], var_name='quantity') -Another way to transform is to use the ``wide_to_long`` panel data convenience -function. +Another way to transform is to use the :func:`~pandas.wide_to_long` panel data +convenience function. It is less flexible than :func:`~pandas.melt`, but more +user-friendly. .. ipython:: python @@ -324,22 +326,25 @@ Pivot tables .. _reshaping.pivot: -While ``pivot`` provides general purpose pivoting of DataFrames with various -data types (strings, numerics, etc.), Pandas also provides the ``pivot_table`` -function for pivoting with aggregation of numeric data. -The function ``pandas.pivot_table`` can be used to create spreadsheet-style pivot -tables. See the :ref:`cookbook` for some advanced strategies -It takes a number of arguments +While :meth:`~DataFrame.pivot` provides general purpose pivoting with various +data types (strings, numerics, etc.), pandas also provides :func:`~pandas.pivot_table` +for pivoting with aggregation of numeric data. + +The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style +pivot tables. See the :ref:`cookbook` for some advanced +strategies. + +It takes a number of arguments: -- ``data``: A DataFrame object -- ``values``: a column or a list of columns to aggregate +- ``data``: a DataFrame object. 
+- ``values``: a column or a list of columns to aggregate. - ``index``: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values. - ``columns``: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values. -- ``aggfunc``: function to use for aggregation, defaulting to ``numpy.mean`` +- ``aggfunc``: function to use for aggregation, defaulting to ``numpy.mean``. Consider a data set like this: @@ -363,7 +368,7 @@ We can produce pivot tables from this data very easily: pd.pivot_table(df, values='D', index=['B'], columns=['A', 'C'], aggfunc=np.sum) pd.pivot_table(df, values=['D','E'], index=['B'], columns=['A', 'C'], aggfunc=np.sum) -The result object is a DataFrame having potentially hierarchical indexes on the +The result object is a ``DataFrame`` having potentially hierarchical indexes on the rows and columns. If the ``values`` column name is not given, the pivot table will include all of the data that can be aggregated in an additional level of hierarchy in the columns: @@ -386,7 +391,8 @@ calling ``to_string`` if you wish: table = pd.pivot_table(df, index=['A', 'B'], columns=['C']) print(table.to_string(na_rep='')) -Note that ``pivot_table`` is also available as an instance method on DataFrame. +Note that ``pivot_table`` is also available as an instance method on DataFrame, + i.e. :meth:`DataFrame.pivot_table`. .. _reshaping.pivot.margins: @@ -406,27 +412,27 @@ rows and columns: Cross tabulations ----------------- -Use the ``crosstab`` function to compute a cross-tabulation of two (or more) +Use :func:`~pandas.crosstab` to compute a cross-tabulation of two (or more) factors. By default ``crosstab`` computes a frequency table of the factors unless an array of values and an aggregation function are passed. It takes a number of arguments -- ``index``: array-like, values to group by in the rows -- ``columns``: array-like, values to group by in the columns +- ``index``: array-like, values to group by in the rows. +- ``columns``: array-like, values to group by in the columns. - ``values``: array-like, optional, array of values to aggregate according to - the factors + the factors. - ``aggfunc``: function, optional, If no values array is passed, computes a - frequency table -- ``rownames``: sequence, default ``None``, must match number of row arrays passed + frequency table. +- ``rownames``: sequence, default ``None``, must match number of row arrays passed. - ``colnames``: sequence, default ``None``, if passed, must match number of column - arrays passed + arrays passed. - ``margins``: boolean, default ``False``, Add row/column margins (subtotals) - ``normalize``: boolean, {'all', 'index', 'columns'}, or {0,1}, default ``False``. Normalize by dividing all values by the sum of values. 
-Any Series passed will have their name attributes used unless row or column +Any ``Series`` passed will have their name attributes used unless row or column names for the cross-tabulation are specified For example: @@ -478,9 +484,9 @@ using the ``normalize`` argument: pd.crosstab(df.A, df.B, normalize='columns') -``crosstab`` can also be passed a third Series and an aggregation function -(``aggfunc``) that will be applied to the values of the third Series within each -group defined by the first two Series: +``crosstab`` can also be passed a third ``Series`` and an aggregation function +(``aggfunc``) that will be applied to the values of the third ``Series`` within +each group defined by the first two ``Series``: .. ipython:: python @@ -502,7 +508,7 @@ Finally, one can also add margins or normalize this output. Tiling ------ -The ``cut`` function computes groupings for the values of the input array and +The :func:`~pandas.cut` function computes groupings for the values of the input array and is often used to transform continuous variables to discrete or categorical variables: From 5859a4fcdcf20b3147f69dbbbe18be94048d209b Mon Sep 17 00:00:00 2001 From: tommyod Date: Tue, 2 Jan 2018 18:07:04 +0100 Subject: [PATCH 2/5] Finished spellchecking reshaping.rst --- doc/source/reshaping.rst | 43 ++++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst index 096a36356b573..98010cf8decd2 100644 --- a/doc/source/reshaping.rst +++ b/doc/source/reshaping.rst @@ -508,9 +508,9 @@ Finally, one can also add margins or normalize this output. Tiling ------ -The :func:`~pandas.cut` function computes groupings for the values of the input array and -is often used to transform continuous variables to discrete or categorical -variables: +The :func:`~pandas.cut` function computes groupings for the values of the input +array and is often used to transform continuous variables to discrete or +categorical variables: .. ipython:: python @@ -529,7 +529,7 @@ Alternatively we can specify custom bin-edges: .. versionadded:: 0.20.0 If the ``bins`` keyword is an ``IntervalIndex``, then these will be -used to bin the passed data. +used to bin the passed data.:: pd.cut([25, 20, 50], bins=c.categories) @@ -539,9 +539,10 @@ used to bin the passed data. Computing indicator / dummy variables ------------------------------------- -To convert a categorical variable into a "dummy" or "indicator" DataFrame, for example -a column in a DataFrame (a Series) which has ``k`` distinct values, can derive a DataFrame -containing ``k`` columns of 1s and 0s: +To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``, +for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct +values, can derive a ``DataFrame`` containing ``k`` columns of 1s and 0s using +:func:`~pandas.get_dummies`: .. ipython:: python @@ -550,7 +551,7 @@ containing ``k`` columns of 1s and 0s: pd.get_dummies(df['key']) Sometimes it's useful to prefix the column names, for example when merging the result -with the original DataFrame: +with the original ``DataFrame``: .. ipython:: python @@ -575,9 +576,9 @@ This function is often used along with discretization functions like ``cut``: See also :func:`Series.str.get_dummies `. -:func:`get_dummies` also accepts a DataFrame. By default all categorical -variables (categorical in the statistical sense, -those with `object` or `categorical` dtype) are encoded as dummy variables. 
+:func:`get_dummies` also accepts a ``DataFrame``. By default all categorical +variables (categorical in the statistical sense, those with `object` or +`categorical` dtype) are encoded as dummy variables. .. ipython:: python @@ -586,9 +587,8 @@ those with `object` or `categorical` dtype) are encoded as dummy variables. 'C': [1, 2, 3]}) pd.get_dummies(df) -All non-object columns are included untouched in the output. - -You can control the columns that are encoded with the ``columns`` keyword. +All non-object columns are included untouched in the output. You can control +the columns that are encoded with the ``columns`` keyword. .. ipython:: python @@ -598,14 +598,14 @@ Notice that the ``B`` column is still included in the output, it just hasn't been encoded. You can drop ``B`` before calling ``get_dummies`` if you don't want to include it in the output. -As with the Series version, you can pass values for the ``prefix`` and +As with the ``Series`` version, you can pass values for the ``prefix`` and ``prefix_sep``. By default the column name is used as the prefix, and '_' as -the prefix separator. You can specify ``prefix`` and ``prefix_sep`` in 3 ways +the prefix separator. You can specify ``prefix`` and ``prefix_sep`` in 3 ways: - string: Use the same value for ``prefix`` or ``prefix_sep`` for each column - to be encoded + to be encoded. - list: Must be the same length as the number of columns being encoded. -- dict: Mapping column name to prefix +- dict: Mapping column name to prefix. .. ipython:: python @@ -640,7 +640,8 @@ When a column contains only one level, it will be omitted in the result. pd.get_dummies(df, drop_first=True) -By default new columns will have ``np.uint8`` dtype. To choose another dtype use ``dtype`` argument: +By default new columns will have ``np.uint8`` dtype. +To choose another dtype, use the``dtype`` argument: .. ipython:: python @@ -656,7 +657,7 @@ By default new columns will have ``np.uint8`` dtype. To choose another dtype use Factorizing values ------------------ -To encode 1-d values as an enumerated type use ``factorize``: +To encode 1-d values as an enumerated type use :func:`~pandas.factorize`: .. ipython:: python @@ -672,7 +673,7 @@ handling of NaN: .. note:: The following ``numpy.unique`` will fail under Python 3 with a ``TypeError`` because of an ordering bug. See also - `Here `__ + `here `__. .. code-block:: ipython From 76d6f0b7d523a81bb2d145fb863fc53a4490797a Mon Sep 17 00:00:00 2001 From: tommyod Date: Wed, 3 Jan 2018 21:31:13 +0100 Subject: [PATCH 3/5] Read through first half of 'timeseries.rst', minor changes --- doc/source/timeseries.rst | 95 ++++++++++++++++++++++----------------- 1 file changed, 54 insertions(+), 41 deletions(-) diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst index fa21cc997d4f4..beb0a83f475de 100644 --- a/doc/source/timeseries.rst +++ b/doc/source/timeseries.rst @@ -60,7 +60,7 @@ Change frequency and fill gaps: converted = ts.asfreq('45Min', method='pad') converted.head() -Resample: +Resample the series to a daily frequency: .. ipython:: python @@ -73,7 +73,7 @@ Resample: Overview -------- -Following table shows the type of time-related classes pandas can handle and +The ollowing table shows the type of time-related classes pandas can handle and how to create them. ================= =============================== =================================================================== @@ -112,9 +112,9 @@ For example: pd.Period('2012-05', freq='D') -``Timestamp`` and ``Period`` can be the index. 
Lists of ``Timestamp`` and -``Period`` are automatically coerced to ``DatetimeIndex`` and ``PeriodIndex`` -respectively. +:class:`Timestamp` and :class:`Period` can serve as an index. Lists of +``Timestamp`` and ``Period`` are automatically coerced to :class:`DatetimeIndex` +and :class:`PeriodIndex` respectively. .. ipython:: python @@ -149,7 +149,7 @@ future releases. Converting to Timestamps ------------------------ -To convert a ``Series`` or list-like object of date-like objects e.g. strings, +To convert a :class:`Series` or list-like object of date-like objects e.g. strings, epochs, or a mixture, you can use the ``to_datetime`` function. When passed a ``Series``, this returns a ``Series`` (with the same index), while a list-like is converted to a ``DatetimeIndex``: @@ -197,7 +197,9 @@ This could also potentially speed up the conversion considerably. pd.to_datetime('12-11-2010 00:00', format='%d-%m-%Y %H:%M') -For more information on how to specify the ``format`` options, see https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior. +For more information on the choices available when specifying the ``format`` +option, see the Python `datetime documentation +` +Furthermore, if you have a ``Series`` with datetimelike values, then you can +access these properties via the ``.dt`` accessor, as detailed in the section +on :ref:`.dt accessors`. .. _timeseries.offsets: @@ -718,8 +724,8 @@ DateOffset Objects In the preceding examples, we created ``DatetimeIndex`` objects at various frequencies by passing in :ref:`frequency strings ` -like 'M', 'W', and 'BM to the ``freq`` keyword. Under the hood, these frequency -strings are being translated into an instance of pandas ``DateOffset``, +like 'M', 'W', and 'BM' to the ``freq`` keyword. Under the hood, these frequency +strings are being translated into an instance of :class:`DateOffset`, which represents a regular frequency increment. Specific offset logic like "month", "business day", or "one hour" is represented in its various subclasses. @@ -761,7 +767,7 @@ which represents a regular frequency increment. Specific offset logic like Nano, "one nanosecond" The basic ``DateOffset`` takes the same arguments as -``dateutil.relativedelta``, which works like: +``dateutil.relativedelta``, which works as follows: .. ipython:: python @@ -777,12 +783,13 @@ We could have done the same thing with ``DateOffset``: The key features of a ``DateOffset`` object are: -- it can be added / subtracted to/from a datetime object to obtain a - shifted date -- it can be multiplied by an integer (positive or negative) so that the - increment will be applied multiple times -- it has ``rollforward`` and ``rollback`` methods for moving a date forward - or backward to the next or previous "offset date" +- It can be added / subtracted to/from a datetime object to obtain a + shifted date. +- It can be multiplied by an integer (positive or negative) so that the + increment will be applied multiple times. +- It has :meth:`~pandas.DateOffset.rollforward` and + :meth:`~pandas.DateOffset.rollback` methods for moving a date forward or + backward to the next or previous "offset date". Subclasses of ``DateOffset`` define the ``apply`` function which dictates custom date increment logic, such as adding business days: @@ -811,7 +818,10 @@ The ``rollforward`` and ``rollback`` methods do exactly what you would expect: It's definitely worth exploring the ``pandas.tseries.offsets`` module and the various docstrings for the classes. 
-These operations (``apply``, ``rollforward`` and ``rollback``) preserves time (hour, minute, etc) information by default. To reset time, use ``normalize=True`` keyword when creating the offset instance. If ``normalize=True``, result is normalized after the function is applied. +These operations (``apply``, ``rollforward`` and ``rollback``) preserve time +(hour, minute, etc) information by default. To reset time, use ``normalize=True`` +when creating the offset instance. If ``normalize=True``, the result is +normalized after the function is applied. .. ipython:: python @@ -847,7 +857,7 @@ particular day of the week: d - Week() -``normalize`` option will be effective for addition and subtraction. +The ``normalize`` option will be effective for addition and subtraction. .. ipython:: python @@ -926,7 +936,7 @@ As an interesting example, let's look at Egypt where a Friday-Saturday weekend i dt = datetime(2013, 4, 30) dt + 2 * bday_egypt -Let's map to the weekday names +Let's map to the weekday names: .. ipython:: python @@ -982,9 +992,10 @@ The ``BusinessHour`` class provides a business hour representation on ``Business allowing to use specific start and end times. By default, ``BusinessHour`` uses 9:00 - 17:00 as business hours. -Adding ``BusinessHour`` will increment ``Timestamp`` by hourly. -If target ``Timestamp`` is out of business hours, move to the next business hour then increment it. -If the result exceeds the business hours end, remaining is added to the next business day. +Adding ``BusinessHour`` will increment ``Timestamp`` by hourly frequency. +If target ``Timestamp`` is out of business hours, move to the next business hour +then increment it. If the result exceeds the business hours end, the remaining +hours are added to the next business day. .. ipython:: python @@ -1010,9 +1021,10 @@ If the result exceeds the business hours end, remaining is added to the next bus # Subtracting 3 business hours pd.Timestamp('2014-08-01 10:00') + BusinessHour(-3) -Also, you can specify ``start`` and ``end`` time by keywords. -Argument must be ``str`` which has ``hour:minute`` representation or ``datetime.time`` instance. -Specifying seconds, microseconds and nanoseconds as business hour results in ``ValueError``. +You can also specify ``start`` and ``end`` time by keywords. The argument must +be a ``str`` with an ``hour:minute`` representation or a ``datetime.time`` +instance. Specifying seconds, microseconds and nanoseconds as business hour +results in ``ValueError``. .. ipython:: python @@ -1068,8 +1080,9 @@ under the default business hours (9:00 - 17:00), there is no gap (0 minutes) bet # The result is the same as rollworward because BusinessDay never overlap. BusinessHour().apply(pd.Timestamp('2014-08-02')) -``BusinessHour`` regards Saturday and Sunday as holidays. To use arbitrary holidays, -you can use ``CustomBusinessHour`` offset, see :ref:`Custom Business Hour `: +``BusinessHour`` regards Saturday and Sunday as holidays. To use arbitrary +holidays, you can use ``CustomBusinessHour`` offset, as explained in the +following subsection. .. 
_timeseries.custombusinesshour: From a7dc98e75ff528a27395400e95a4a5c5469da515 Mon Sep 17 00:00:00 2001 From: tommyod Date: Thu, 4 Jan 2018 18:45:00 +0100 Subject: [PATCH 4/5] Finished spellchecking timeseries.rst --- doc/source/timeseries.rst | 71 +++++++++++++++++++++------------------ 1 file changed, 38 insertions(+), 33 deletions(-) diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst index beb0a83f475de..ffbcf4b4da4e6 100644 --- a/doc/source/timeseries.rst +++ b/doc/source/timeseries.rst @@ -1225,7 +1225,7 @@ Anchored Offset Semantics ~~~~~~~~~~~~~~~~~~~~~~~~~ For those offsets that are anchored to the start or end of specific -frequency (``MonthEnd``, ``MonthBegin``, ``WeekEnd``, etc) the following +frequency (``MonthEnd``, ``MonthBegin``, ``WeekEnd``, etc), the following rules apply to rolling forward and backwards. When ``n`` is not 0, if the given date is not on an anchor point, it snapped to the next(previous) @@ -1276,7 +1276,7 @@ Holidays and calendars provide a simple way to define holiday rules to be used with ``CustomBusinessDay`` or in other analysis that requires a predefined set of holidays. The ``AbstractHolidayCalendar`` class provides all the necessary methods to return a list of holidays and only ``rules`` need to be defined -in a specific holiday calendar class. Further, ``start_date`` and ``end_date`` +in a specific holiday calendar class. Furthermore, the ``start_date`` and ``end_date`` class attributes determine over what date range holidays are generated. These should be overwritten on the ``AbstractHolidayCalendar`` class to have the range apply to all calendar subclasses. ``USFederalHolidayCalendar`` is the @@ -1331,7 +1331,7 @@ or ``Timestamp`` objects. datetime(2012, 7, 6) + offset Ranges are defined by the ``start_date`` and ``end_date`` class attributes -of ``AbstractHolidayCalendar``. The defaults are below. +of ``AbstractHolidayCalendar``. The defaults are shown below. .. ipython:: python @@ -1371,16 +1371,17 @@ Shifting / Lagging ~~~~~~~~~~~~~~~~~~ One may want to *shift* or *lag* the values in a time series back and forward in -time. The method for this is ``shift``, which is available on all of the pandas -objects. +time. The method for this is :meth:`~Series.shift`, which is available on all of +the pandas objects. .. ipython:: python ts = ts[:5] ts.shift(1) -The shift method accepts an ``freq`` argument which can accept a -``DateOffset`` class or other ``timedelta``-like object or also a :ref:`offset alias `: +The ``shift`` method accepts an ``freq`` argument which can accept a +``DateOffset`` class or other ``timedelta``-like object or also an +:ref:`offset alias `: .. ipython:: python @@ -1388,8 +1389,8 @@ The shift method accepts an ``freq`` argument which can accept a ts.shift(5, freq='BM') Rather than changing the alignment of the data and the index, ``DataFrame`` and -``Series`` objects also have a ``tshift`` convenience method that changes -all the dates in the index by a specified number of offsets: +``Series`` objects also have a :meth:`~Series.tshift` convenience method that +changes all the dates in the index by a specified number of offsets: .. ipython:: python @@ -1401,9 +1402,10 @@ is not being realigned. Frequency Conversion ~~~~~~~~~~~~~~~~~~~~ -The primary function for changing frequencies is the ``asfreq`` function. -For a ``DatetimeIndex``, this is basically just a thin, but convenient wrapper -around ``reindex`` which generates a ``date_range`` and calls ``reindex``. 
+The primary function for changing frequencies is the :meth:`~Series.asfreq` +method. For a ``DatetimeIndex``, this is basically just a thin, but convenient +wrapper around :meth:`~Series.reindex` which generates a ``date_range`` and +calls ``reindex``. .. ipython:: python @@ -1413,7 +1415,7 @@ around ``reindex`` which generates a ``date_range`` and calls ``reindex``. ts.asfreq(BDay()) ``asfreq`` provides a further convenience so you can specify an interpolation -method for any gaps that may appear after the frequency conversion +method for any gaps that may appear after the frequency conversion. .. ipython:: python @@ -1422,14 +1424,14 @@ method for any gaps that may appear after the frequency conversion Filling Forward / Backward ~~~~~~~~~~~~~~~~~~~~~~~~~~ -Related to ``asfreq`` and ``reindex`` is the ``fillna`` function documented in -the :ref:`missing data section `. +Related to ``asfreq`` and ``reindex`` is :meth:`~Series.fillna`, which is +documented in the :ref:`missing data section `. Converting to Python Datetimes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -``DatetimeIndex`` can be converted to an array of Python native datetime.datetime objects using the -``to_pydatetime`` method. +``DatetimeIndex`` can be converted to an array of Python native +:py:class:`datetime.datetime` objects using the ``to_pydatetime`` method. .. _timeseries.resampling: @@ -1441,20 +1443,22 @@ Resampling The interface to ``.resample`` has changed in 0.18.0 to be more groupby-like and hence more flexible. See the :ref:`whatsnew docs ` for a comparison with prior versions. -Pandas has a simple, powerful, and efficient functionality for -performing resampling operations during frequency conversion (e.g., converting -secondly data into 5-minutely data). This is extremely common in, but not -limited to, financial applications. +Pandas has a simple, powerful, and efficient functionality for performing +resampling operations during frequency conversion (e.g., converting secondly +data into 5-minutely data). This is extremely common in, but not limited to, +financial applications. -``.resample()`` is a time-based groupby, followed by a reduction method on each of its groups. -See some :ref:`cookbook examples ` for some advanced strategies +:meth:`~Series.resample` is a time-based groupby, followed by a reduction method +on each of its groups. See some :ref:`cookbook examples ` for +some advanced strategies. Starting in version 0.18.1, the ``resample()`` function can be used directly from ``DataFrameGroupBy`` objects, see the :ref:`groupby docs `. .. note:: - ``.resample()`` is similar to using a ``.rolling()`` operation with a time-based offset, see a discussion :ref:`here ` + ``.resample()`` is similar to using a :meth:`~Series.rolling` operation with + a time-based offset, see a discussion :ref:`here `. Basics ~~~~~~ @@ -1555,20 +1559,21 @@ For upsampling, you can specify a way to upsample and the ``limit`` parameter to Sparse Resampling ~~~~~~~~~~~~~~~~~ -Sparse timeseries are ones where you have a lot fewer points relative -to the amount of time you are looking to resample. Naively upsampling a sparse series can potentially -generate lots of intermediate values. When you don't want to use a method to fill these values, e.g. ``fill_method`` is ``None``, -then intermediate values will be filled with ``NaN``. +Sparse timeseries are the ones where you have a lot fewer points relative +to the amount of time you are looking to resample. Naively upsampling a sparse +series can potentially generate lots of intermediate values. 
When you don't want +to use a method to fill these values, e.g. ``fill_method`` is ``None``, then +intermediate values will be filled with ``NaN``. Since ``resample`` is a time-based groupby, the following is a method to efficiently -resample only the groups that are not all ``NaN`` +resample only the groups that are not all ``NaN``. .. ipython:: python rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s') ts = pd.Series(range(100), index=rng) -If we want to resample to the full range of the series +If we want to resample to the full range of the series: .. ipython:: python @@ -1637,7 +1642,7 @@ columns of a ``DataFrame``: 'B' : lambda x: np.std(x, ddof=1)}) The function names can also be strings. In order for a string to be valid it -must be implemented on the Resampled object +must be implemented on the resampled object: .. ipython:: python @@ -2013,7 +2018,7 @@ To convert from an ``int64`` based YYYYMMDD representation. s.apply(conv) s.apply(conv)[2] -These can easily be converted to a ``PeriodIndex`` +These can easily be converted to a ``PeriodIndex``: .. ipython:: python @@ -2291,7 +2296,7 @@ a convert on an aware stamp. pd.Series(s_aware.values) - However, these can be easily converted + However, these can be easily converted: .. ipython:: python From 4f1a8670b4c439f13a98423518b52cc63c8adeb1 Mon Sep 17 00:00:00 2001 From: tommyod Date: Thu, 4 Jan 2018 19:21:37 +0100 Subject: [PATCH 5/5] Replaced 'numpy' with 'NumPy', 'python' with 'Python' --- doc/source/basics.rst | 2 +- doc/source/enhancingperf.rst | 10 +++++----- doc/source/indexing.rst | 4 ++-- doc/source/io.rst | 2 +- doc/source/missing_data.rst | 4 ++-- doc/source/release.rst | 32 ++++++++++++++++---------------- doc/source/reshaping.rst | 2 +- doc/source/timeseries.rst | 2 +- doc/source/tutorials.rst | 2 +- 9 files changed, 30 insertions(+), 30 deletions(-) diff --git a/doc/source/basics.rst b/doc/source/basics.rst index bd49b5b7c9b32..55c26e2186344 100644 --- a/doc/source/basics.rst +++ b/doc/source/basics.rst @@ -2220,7 +2220,7 @@ For example, to select ``bool`` columns: df.select_dtypes(include=[bool]) -You can also pass the name of a dtype in the `numpy dtype hierarchy +You can also pass the name of a dtype in the `NumPy dtype hierarchy `__: .. ipython:: python diff --git a/doc/source/enhancingperf.rst b/doc/source/enhancingperf.rst index 57f07a41afbc3..7afa852262a38 100644 --- a/doc/source/enhancingperf.rst +++ b/doc/source/enhancingperf.rst @@ -28,14 +28,14 @@ For many use cases writing pandas in pure Python and NumPy is sufficient. In som computationally heavy applications however, it can be possible to achieve sizeable speed-ups by offloading work to `cython `__. -This tutorial assumes you have refactored as much as possible in python, for example +This tutorial assumes you have refactored as much as possible in Python, for example trying to remove for loops and making use of NumPy vectorization, it's always worth optimising in Python first. This tutorial walks through a "typical" process of cythonizing a slow computation. We use an `example from the cython documentation `__ but in the context of pandas. Our final cythonized solution is around 100 times -faster than the pure python. +faster than the pure Python. .. _enhancingperf.pure: @@ -52,7 +52,7 @@ We have a DataFrame to which we want to apply a function row-wise. 'x': 'x'}) df -Here's the function in pure python: +Here's the function in pure Python: .. ipython:: python @@ -173,7 +173,7 @@ Using ndarray It's calling series... a lot! 

@@ -173,7 +173,7 @@ Using ndarray

 It's calling series... a lot! It's creating a Series from each row, and getting from both the index and the series (three times for each row). Function calls are expensive
-in python, so maybe we could minimize these by cythonizing the apply part.
+in Python, so maybe we could minimize these by cythonizing the apply part.

 .. note::

@@ -231,7 +231,7 @@ the rows, applying our ``integrate_f_typed``, and putting this in the zeros arra
 .. note::

-   Loops like this would be *extremely* slow in python, but in Cython looping
+   Loops like this would be *extremely* slow in Python, but in Cython looping
    over NumPy arrays is *fast*.

 .. code-block:: ipython

diff --git a/doc/source/indexing.rst b/doc/source/indexing.rst
index 0467ac225585b..4ebc8b82aaa47 100644
--- a/doc/source/indexing.rst
+++ b/doc/source/indexing.rst
@@ -84,7 +84,7 @@ of multi-axis indexing.
   ``length-1`` of the axis), but may also be used with a boolean
   array. ``.iloc`` will raise ``IndexError`` if a requested indexer is
   out-of-bounds, except *slice* indexers which allow
-  out-of-bounds indexing. (this conforms with python/numpy *slice*
+  out-of-bounds indexing. (this conforms with Python/NumPy *slice*
   semantics). Allowed inputs are:

   - An integer e.g. ``5``.

@@ -1517,7 +1517,7 @@ The :meth:`~pandas.DataFrame.lookup` Method

 Sometimes you want to extract a set of values given a sequence of row labels
 and column labels, and the ``lookup`` method allows for this and returns a
-numpy array. For instance:
+NumPy array. For instance:

 .. ipython:: python

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 5878272a3da42..2ef7e6d3b64f4 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -775,7 +775,7 @@ The simplest case is to just pass in ``parse_dates=True``:

    df = pd.read_csv('foo.csv', index_col=0, parse_dates=True)
    df

-   # These are python datetime objects
+   # These are Python datetime objects
    df.index

 It is often the case that we may want to store date and time data separately,

diff --git a/doc/source/missing_data.rst b/doc/source/missing_data.rst
index d2250ae7b2116..f56378b533909 100644
--- a/doc/source/missing_data.rst
+++ b/doc/source/missing_data.rst
@@ -86,8 +86,8 @@ pandas provides the :func:`isna` and

 .. warning::

-   One has to be mindful that in Python (and numpy), the ``nan's`` don't compare equal, but ``None's`` **do**.
-   Note that Pandas/numpy uses the fact that ``np.nan != np.nan``, and treats ``None`` like ``np.nan``.
+   One has to be mindful that in Python (and NumPy), the ``nan's`` don't compare equal, but ``None's`` **do**.
+   Note that pandas/NumPy uses the fact that ``np.nan != np.nan``, and treats ``None`` like ``np.nan``.

 .. ipython:: python

diff --git a/doc/source/release.rst b/doc/source/release.rst
index de045c426cf7b..cd763de42d162 100644
--- a/doc/source/release.rst
+++ b/doc/source/release.rst
@@ -1635,7 +1635,7 @@ performance improvements along with a large number of bug fixes.
 Highlights include:

-- Drop support for numpy < 1.7.0 (:issue:`7711`)
+- Drop support for NumPy < 1.7.0 (:issue:`7711`)
 - The ``Categorical`` type was integrated as a first-class pandas type, see :ref:`here `
 - New scalar type ``Timedelta``, and a new index type ``TimedeltaIndex``, see :ref:`here `
 - New DataFrame default display for ``df.info()`` to include memory usage, see :ref:`Memory Usage `

@@ -2032,7 +2032,7 @@ Bug Fixes
 - Bug in Series.xs with a multi-index (:issue:`6018`)
 - Bug in Series construction of mixed type with datelike and an integer (which should result in object type and not automatic conversion) (:issue:`6028`)
-- Possible segfault when chained indexing with an object array under numpy 1.7.1 (:issue:`6026`, :issue:`6056`)
+- Possible segfault when chained indexing with an object array under NumPy 1.7.1 (:issue:`6026`, :issue:`6056`)
 - Bug in setting using fancy indexing a single element with a non-scalar (e.g. a list), (:issue:`6043`)
 - ``to_sql`` did not respect ``if_exists`` (:issue:`4110` :issue:`4304`)

@@ -2177,7 +2177,7 @@ Improvements to existing features
 - allow DataFrame constructor to accept more list-like objects, e.g. list of
   ``collections.Sequence`` and ``array.Array`` objects (:issue:`3783`,
   :issue:`4297`, :issue:`4851`), thanks @lgautier
-- DataFrame constructor now accepts a numpy masked record array
+- DataFrame constructor now accepts a NumPy masked record array
   (:issue:`3478`), thanks @jnothman
 - ``__getitem__`` with ``tuple`` key (e.g., ``[:, 2]``) on ``Series``
   without ``MultiIndex`` raises ``ValueError`` (:issue:`4759`, :issue:`4837`)

@@ -2397,8 +2397,8 @@ API Changes
   support ``pow`` or ``mod`` with non-scalars. (:issue:`3765`)
 - Arithmetic func factories are now passed real names (suitable for using
   with super) (:issue:`5240`)
-- Provide numpy compatibility with 1.7 for a calling convention like
-  ``np.prod(pandas_object)`` as numpy call with additional keyword args
+- Provide NumPy compatibility with 1.7 for a calling convention like
+  ``np.prod(pandas_object)`` as a NumPy call with additional keyword args
   (:issue:`4435`)
 - Provide __dir__ method (and local context) for tab completion / remove
   ipython completers code (:issue:`4501`)

@@ -2481,7 +2481,7 @@ See :ref:`Internal Refactoring`
 - Series now inherits from ``NDFrame`` rather than directly from ``ndarray``.
   There are several minor changes that affect the API.

-  - numpy functions that do not support the array interface will now return
+  - NumPy functions that do not support the array interface will now return
    ``ndarrays`` rather than series, e.g. ``np.diff``, ``np.ones_like``, ``np.where``

   - ``Series(0.5)`` would previously return the scalar ``0.5``, this is no

@@ -2650,7 +2650,7 @@ Bug Fixes
 - Fix bug in having a rhs of ``np.timedelta64`` or ``np.offsets.DateOffset``
   when operating with datetimes (:issue:`4532`)
 - Fix arithmetic with series/datetimeindex and ``np.timedelta64`` not working
-  the same (:issue:`4134`) and buggy timedelta in numpy 1.6 (:issue:`4135`)
+  the same (:issue:`4134`) and buggy timedelta in NumPy 1.6 (:issue:`4135`)
 - Fix bug in ``pd.read_clipboard`` on windows with PY3 (:issue:`4561`); not
   decoding properly
 - ``tslib.get_period_field()`` and ``tslib.get_period_field_arr()`` now raise

@@ -2691,7 +2691,7 @@ Bug Fixes
 - Bug with reindexing on the index with a non-unique index will now raise
   ``ValueError`` (:issue:`4746`)
 - Bug in setting with ``loc/ix`` a single indexer with a multi-index axis and
-  a numpy array, related to (:issue:`3777`)
+  a NumPy array, related to (:issue:`3777`)
 - Bug in concatenation with duplicate columns across dtypes not merging with
   axis=0 (:issue:`4771`, :issue:`4975`)
 - Bug in ``iloc`` with a slice index failing (:issue:`4771`)

@@ -2958,7 +2958,7 @@ API Changes
   to enable alternate encodings (:issue:`3750`)
 - enable support for ``iterator/chunksize`` with ``read_hdf``
 - The repr() for (Multi)Index now obeys display.max_seq_items rather
-  then numpy threshold print options. (:issue:`3426`, :issue:`3466`)
+  than NumPy threshold print options. (:issue:`3426`, :issue:`3466`)
 - Added mangle_dupe_cols option to read_table/csv, allowing users to control
   legacy behaviour re dupe cols (A, A.1, A.2 vs A, A ) (:issue:`3468`)
   Note: The default value will change in 0.12 to the "no mangle" behaviour,

@@ -3025,8 +3025,8 @@ API Changes
   as ``Index``, ``Categorical``, ``GroupBy``, ``SparseList``, and
   ``SparseArray`` (+ their base classes). Currently, ``PandasObject``
   provides string methods (from ``StringMixin``). (:issue:`4090`, :issue:`4092`)
-- New ``StringMixin`` that, given a ``__unicode__`` method, gets python 2 and
-  python 3 compatible string methods (``__str__``, ``__bytes__``, and
+- New ``StringMixin`` that, given a ``__unicode__`` method, gets Python 2 and
+  Python 3 compatible string methods (``__str__``, ``__bytes__``, and
   ``__repr__``). Plus string safety throughout. Now employed in many places
   throughout the pandas library. (:issue:`4090`, :issue:`4092`)

@@ -3139,7 +3139,7 @@ Bug Fixes
   two integer arrays with at least 10000 cells total (:issue:`3764`)
 - Indexing with a string with seconds resolution not selecting from a time index (:issue:`3925`)
 - csv parsers would loop infinitely if ``iterator=True`` but no ``chunksize`` was
-  specified (:issue:`3967`), python parser failing with ``chunksize=1``
+  specified (:issue:`3967`), Python parser failing with ``chunksize=1``
 - Fix index name not propagating when using ``shift``
 - Fixed dropna=False being ignored with multi-index stack (:issue:`3997`)
 - Fixed flattening of columns when renaming MultiIndex columns DataFrame (:issue:`4004`)

@@ -3301,7 +3301,7 @@ API Changes
 - all timedelta like objects will be correctly assigned to ``timedelta64``
   with mixed ``NaN`` and/or ``NaT`` allowed
-- arguments to DataFrame.clip were inconsistent to numpy and Series clipping
+- arguments to DataFrame.clip were inconsistent to NumPy and Series clipping
   (:issue:`2747`)
 - util.testing.assert_frame_equal now checks the column and index names (:issue:`2964`)
 - Constructors will now return a more informative ValueError on failures

@@ -3360,7 +3360,7 @@ Bug Fixes
 - Series ops with a Timestamp on the rhs was throwing an exception (:issue:`2898`)
   added tests for Series ops with datetimes,timedeltas,Timestamps, and datelike
   Series on both lhs and rhs
-  - Fixed subtle timedelta64 inference issue on py3 & numpy 1.7.0 (:issue:`3094`)
+  - Fixed subtle timedelta64 inference issue on py3 & NumPy 1.7.0 (:issue:`3094`)
   - Fixed some formatting issues on timedelta when negative
   - Support null checking on timedelta64, representing (and formatting) with NaT
   - Support setitem with np.nan value, converts to NaT

@@ -4574,7 +4574,7 @@ Bug Fixes
 - Add clearer error message in csv parser (:issue:`835`)
 - Fix loss of fractional seconds in HDFStore (:issue:`513`)
 - Fix DataFrame join where columns have datetimes (:issue:`787`)
-- Work around numpy performance issue in take (:issue:`817`)
+- Work around NumPy performance issue in take (:issue:`817`)
 - Improve comparison operations for NA-friendliness (:issue:`801`)
 - Fix indexing operation for floating point values (:issue:`780`, :issue:`798`)
 - Fix groupby case resulting in malformed dataframe (:issue:`814`)

@@ -5822,7 +5822,7 @@ API Changes
   `offset` argument for everything. So you can still pass a time rule string to
   `offset`
 - Added optional `encoding` argument to `read_csv`, `read_table`, `to_csv`,
-  `from_csv` to handle unicode in python 2.x
+  `from_csv` to handle unicode in Python 2.x

 Bug Fixes
 ~~~~~~~~~

diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
index 98010cf8decd2..71ddaa13fdd8a 100644
--- a/doc/source/reshaping.rst
+++ b/doc/source/reshaping.rst
@@ -64,7 +64,7 @@ But suppose we wish to do time series operations with the variables. A better
 representation would be where the ``columns`` are the unique variables and an
 ``index`` of dates identifies individual observations. To reshape the data into
 this form, we use the :meth:`DataFrame.pivot` method (also implemented as a
-top level function :func:`pandas.pivot`):
+top level function :func:`~pandas.pivot`):

 .. ipython:: python
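
   # A hypothetical sketch (assuming a long-format ``df`` with 'date',
   # 'variable' and 'value' columns, as the surrounding text describes):
   # dates index the rows and each unique variable becomes a column.
   df.pivot(index='date', columns='variable', values='value')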

diff --git a/doc/source/timeseries.rst b/doc/source/timeseries.rst
index ffbcf4b4da4e6..466c48b780861 100644
--- a/doc/source/timeseries.rst
+++ b/doc/source/timeseries.rst
@@ -73,7 +73,7 @@ Resample the series to a daily frequency:

 Overview
 --------

-The ollowing table shows the type of time-related classes pandas can handle and
+The following table shows the type of time-related classes pandas can handle and
 how to create them.

 ================= =============================== ===================================================================

diff --git a/doc/source/tutorials.rst b/doc/source/tutorials.rst
index 0b8a2cb89b45e..43ccd372d9d5b 100644
--- a/doc/source/tutorials.rst
+++ b/doc/source/tutorials.rst
@@ -174,7 +174,7 @@ Various Tutorials

 - `Wes McKinney's (pandas BDFL) blog `_
 - `Statistical analysis made easy in Python with SciPy and pandas DataFrames, by Randal Olson `_
 - `Statistical Data Analysis in Python, tutorial videos, by Christopher Fonnesbeck from SciPy 2013 `_
-- `Financial analysis in python, by Thomas Wiecki `_
+- `Financial analysis in Python, by Thomas Wiecki `_
 - `Intro to pandas data structures, by Greg Reda `_
 - `Pandas and Python: Top 10, by Manish Amde `_
 - `Pandas Tutorial, by Mikhail Semeniuk `_