DOC: update the DataFrame.reindex_like docstring #22775

Merged 19 commits on Nov 26, 2018
Changes from 10 commits
81 changes: 51 additions & 30 deletions pandas/core/frame.py
@@ -3909,43 +3909,59 @@ def shift(self, periods=1, freq=None, axis=0):
def set_index(self, keys, drop=True, append=False, inplace=False,
verify_integrity=False):
"""
An index is created with existing columns.

Review comment (Member):
I find this a bit confusing as the summary line (because it seems to suggest that an index is returned, since that is created).
I found the previous "Set the DataFrame index using existing columns" a bit clearer. @datapythonista thoughts?
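
A minimal sketch of the point behind this comment, reusing the data from the docstring example: `set_index` hands back a DataFrame whose index has been set, not an Index object. The variable name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# set_index returns a new DataFrame; the 'month' column becomes the index.
result = df.set_index('month')

print(type(result).__name__)   # the returned object is a DataFrame
print(result.index.name)       # the index is named after the column
print(list(df.columns))        # the original frame is left untouched
```

Because `inplace` defaults to False, `df` keeps all three columns and only `result` carries the new index.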


Set the DataFrame index (row labels) using one or more existing
columns. By default yields a new object.
columns. The index can replace the existing index or expand on it.

Parameters
----------
keys : column label or list of column labels / arrays
drop : boolean, default True
Delete columns to be used as the new index
append : boolean, default False
Whether to append columns to existing index
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
verify_integrity : boolean, default False
keys : str or list of str or array

Review comment (Member):
To be strict, the keys can be something other than a string (column names can also be integers, timestamps, ...).
That is also the reason 'label' was used before.

Column label or list of column labels / arrays that will
form the new index.

Review comment (Member):
Since this 'arrays' option is really something completely different (these are the actual index values passed as an array, not a reference to one of the existing columns), I would put that in a separate sentence.
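
The two kinds of keys discussed in these comments can be sketched as follows. This is an illustration under assumed data (the frame, its integer column labels, and the values `'x'`, `'y'`, `'z'` are all made up), not part of the proposed docstring.

```python
import pandas as pd

# Column labels need not be strings; here they are the integers 0 and 1.
df = pd.DataFrame({0: [10, 20, 30], 1: ['a', 'b', 'c']})

# Case 1: the key is a label that refers to an existing column.
# The column is moved into the index (drop=True by default).
by_label = df.set_index(0)

# Case 2: the key is an array-like holding the index values themselves.
# No existing column is consumed.
by_array = df.set_index(pd.Index(['x', 'y', 'z']))

print(by_label)
print(by_array)
```

In the first case column `0` disappears from the columns; in the second both columns remain, because the array does not refer to any of them.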

drop : bool, default True
Delete columns to be used as the new index.
append : bool, default False
Whether to append columns to existing index.
inplace : bool, default False
Modify the DataFrame in place (do not create a new object).
verify_integrity : bool, default False
Check the new index for duplicates. Otherwise defer the check until
necessary. Setting to False will improve the performance of this
method
method.

Returns
-------
DataFrame
Changed row labels.

See Also
--------
DataFrame.reset_index : Opposite of set_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.

Examples
--------
>>> df = pd.DataFrame({'month': [1, 4, 7, 10],
... 'year': [2012, 2014, 2013, 2014],
... 'sale':[55, 40, 84, 31]})
month sale year
0 1 55 2012
1 4 40 2014
2 7 84 2013
3 10 31 2014
... 'sale': [55, 40, 84, 31]})
>>> df
month year sale
0 1 2012 55
1 4 2014 40
2 7 2013 84
3 10 2014 31

Set the index to become the 'month' column:

>>> df.set_index('month')
sale year
year sale
month
1 55 2012
4 40 2014
7 84 2013
10 31 2014
1 2012 55
4 2014 40
7 2013 84
10 2014 31

Create a multi-index using columns 'year' and 'month':

@@ -3966,10 +3982,6 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
2 2014 4 40
3 2013 7 84
4 2014 10 31

Returns
-------
dataframe : DataFrame
"""
inplace = validate_bool_kwarg(inplace, 'inplace')
if not isinstance(keys, list):
@@ -4037,6 +4049,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False,
def reset_index(self, level=None, drop=False, inplace=False, col_level=0,
col_fill=''):
"""
An existing index is modified.

Review comment (Member):
I think we can also do better here for the summary line: from "An existing index is modified." alone, a user won't get much of a clue about what the method does.

But of course, given that the method does different things, it might be difficult to find something that is still generally true but less vague than the above ...

Trying to think of better descriptions:

Remove index (level) (but that is also a bit short / cryptic)
Reset index to default index or remove index level

(will think further on it)
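
For reference, the different behaviours that make a single summary line hard to write can be sketched like this. The frame and the level names `letter` and `number` are assumptions made for the example.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1)],
                                  names=['letter', 'number'])
df = pd.DataFrame({'value': [10, 20, 30]}, index=index)

# No arguments: every index level moves into the columns and the
# index is reset to the default integer index.
flat = df.reset_index()

# level=...: only the named level is removed from the index.
partial = df.reset_index(level='number')

# drop=True: the index is discarded instead of becoming columns.
dropped = df.reset_index(drop=True)

print(flat)
print(partial)
print(dropped)
```

All three calls return a new DataFrame; which of "reset to default" or "remove a level" applies depends on the arguments.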


For DataFrame with multi-level index, return new DataFrame with
labeling information in the columns under the index names, defaulting
to 'level_0', 'level_1', etc. if any are None. For a standard index,
@@ -4047,12 +4061,12 @@ def reset_index(self, level=None, drop=False, inplace=False, col_level=0,
----------
level : int, str, tuple, or list, default None
Only remove the given levels from the index. Removes all levels by
default
drop : boolean, default False
default.
drop : bool, default False
Do not try to insert index into dataframe columns. This resets
the index to the default integer index.
inplace : boolean, default False
Modify the DataFrame in place (do not create a new object)
inplace : bool, default False
Modify the DataFrame in place (do not create a new object).
col_level : int or str, default 0
If the columns have multiple levels, determines which level the
labels are inserted into. By default it is inserted into the first
@@ -4063,7 +4077,14 @@ def reset_index(self, level=None, drop=False, inplace=False, col_level=0,

Returns
-------
resetted : DataFrame
DataFrame
Changed row labels.

See Also
--------
DataFrame.set_index : Opposite of reset_index.
DataFrame.reindex : Change to new indices or expand indices.
DataFrame.reindex_like : Change to same indices as other DataFrame.

Examples
--------
155 changes: 109 additions & 46 deletions pandas/core/generic.py
@@ -3316,29 +3316,84 @@ def select(self, crit, axis=0):

def reindex_like(self, other, method=None, copy=True, limit=None,
tolerance=None):
"""Return an object with matching indices to myself.
"""
Return an object with matching indices as other object.

Conform the object to the same index on all axes. Optional
filling logic, placing NA/NaN in locations having no value

Review comment (Member):
Just say NaN

in the previous index. A new object is produced unless the
new index is equivalent to the current one and copy=False.

Parameters
----------
other : Object
method : string or None
copy : boolean, default True
other : Object of the same data type
Its row and column indices are used to define the new indices
of this object.
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.

* None (default): don't fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap

copy : bool, default True
Return a new object, even if the passed indexes are the same.
limit : int, default None
Maximum number of consecutive labels to fill for inexact matches.
tolerance : optional
Maximum distance between labels of the other object and this
object for inexact matches. Can be list-like.
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations most
must satisfy the equation ``abs(index[indexer] - target) <= tolerance``.

Tolerance may be a scalar value, which applies the same tolerance
to all values, or list-like, which applies variable tolerance per
element. List-like includes list, tuple, array, Series, and must be
the same size as the index and its dtype must exactly match the
index's type.

.. versionadded:: 0.21.0 (list-like tolerance)

Notes
-----
Like calling s.reindex(index=other.index, columns=other.columns,
method=...)
Like calling `.reindex(index=other.index, columns=other.columns,...)`.

Returns
-------
reindexed : same as input
Same object type as input, but with changed indices on each axis.

Review comment (Member):
Can you add a line before that with Series or DataFrame. This description should be indented in the next line.
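
The Notes entry in this docstring describes `reindex_like` as being like a call to `.reindex(index=other.index, columns=other.columns, ...)`. A minimal sketch of that equivalence, using the data from the proposed example under the shorter names `df1` and `df2` (the names `via_like` and `via_reindex` are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame([[24.3, 75.7, 'high'],
                    [31.0, 87.8, 'high'],
                    [22.0, 71.6, 'medium'],
                    [35.0, 95.0, 'medium']],
                   columns=['temp_celsius', 'temp_fahrenheit', 'windspeed'],
                   index=pd.date_range(start='2014-02-12',
                                       end='2014-02-15', freq='D'))

df2 = pd.DataFrame([[28.0, 'low'],
                    [30.0, 'low'],
                    [35.1, 'medium']],
                   columns=['temp_celsius', 'windspeed'],
                   index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
                                           '2014-02-15']))

# Conform df2 to the row and column labels of df1 in two ways.
via_like = df2.reindex_like(df1)
via_reindex = df2.reindex(index=df1.index, columns=df1.columns)

# Both results are DataFrames with df1's labels on each axis.
print(type(via_like).__name__)
print(via_like.equals(via_reindex))
```

Labels that exist in `df1` but not in `df2` (the row for 2014-02-14 and the `temp_fahrenheit` column) are filled with NaN in both results.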


See Also
--------
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex : Change to new indices or expand indices.

Examples
--------
>>> df_weather_station_1 = pd.DataFrame([[24.3, 75.7, 'high'],

Review comment (Member):
I don't think the _weather_station bit adds any value; would be OK to just use df1, df2, etc...

... [31, 87.8, 'high'],
... [22, 71.6, 'medium'],
... [35, 95, 'medium']],
... columns=['temp_celsius', 'temp_fahrenheit', 'windspeed'],
... index=pd.date_range(start='2014-02-12',
... end='2014-02-15', freq='D'))

>>> df_weather_station_2 = pd.DataFrame([[28, 'low'],
... [30, 'low'],
... [35.1, 'medium']],
... columns=['temp_celsius', 'windspeed'],
... index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
... '2014-02-15']))

>>> df_weather_station_2.reindex_like(df_weather_station_1)
temp_celsius temp_fahrenheit windspeed
2014-02-12 28.0 NaN low
2014-02-13 30.0 NaN low
2014-02-14 NaN NaN NaN
2014-02-15 35.1 NaN medium
"""
d = other._construct_axes_dict(axes=self._AXIS_ORDERS, method=method,
copy=copy, limit=limit,
@@ -3705,7 +3760,7 @@ def reindex(self, *args, **kwargs):
Conform %(klass)s to new index with optional filling logic, placing
NA/NaN in locations having no value in the previous index. A new object
is produced unless the new index is equivalent to the current one and
copy=False
copy=False.

Parameters
----------
@@ -3714,27 +3769,27 @@ def reindex(self, *args, **kwargs):
New labels / index to conform to. Preferably an Index object to

Review comment (Member):
Can you move the "(should be specified using keywords)" to the description? As a normal sentence, it doesn't need to be in brackets.

avoid duplicating data
%(optional_axis)s
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
method to use for filling holes in reindexed DataFrame.
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}
Method to use for filling holes in reindexed DataFrame.
Please note: this is only applicable to DataFrames/Series with a
monotonically increasing/decreasing index.

* default: don't fill gaps
* None (default): don't fill gaps
* pad / ffill: propagate last valid observation forward to next
valid
* backfill / bfill: use next valid observation to fill gap
* nearest: use nearest valid observations to fill gap

copy : boolean, default True
Return a new object, even if the passed indexes are the same
copy : bool, default True
Return a new object, even if the passed indexes are the same.
level : int or name
Broadcast across a level, matching Index values on the
passed MultiIndex level
passed MultiIndex level.
fill_value : scalar, default np.NaN
Value to use for missing values. Defaults to NaN, but can be any
"compatible" value
"compatible" value.
limit : int, default None
Maximum number of consecutive elements to forward or backward fill
Maximum number of consecutive elements to forward or backward fill.
tolerance : optional
Maximum distance between original and new labels for inexact
matches. The values of the index at the matching locations must
@@ -3748,6 +3803,12 @@ def reindex(self, *args, **kwargs):

.. versionadded:: 0.21.0 (list-like tolerance)

See Also
--------
DataFrame.set_index : Set row labels.
DataFrame.reset_index : Remove row labels or move them to new columns.
DataFrame.reindex_like : Change to same indices as other DataFrame.

Examples
--------

@@ -3839,12 +3900,12 @@ def reindex(self, *args, **kwargs):
... index=date_index)
>>> df2
prices
2010-01-01 100
2010-01-02 101
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100
2010-01-05 89
2010-01-06 88
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0

Suppose we decide to expand the dataframe to cover a wider
date range.
@@ -3855,12 +3916,12 @@ def reindex(self, *args, **kwargs):
2009-12-29 NaN
2009-12-30 NaN
2009-12-31 NaN
2010-01-01 100
2010-01-02 101
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100
2010-01-05 89
2010-01-06 88
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN

The index entries that did not have a value in the original data frame
@@ -3873,15 +3934,15 @@ def reindex(self, *args, **kwargs):

>>> df2.reindex(date_index2, method='bfill')
prices
2009-12-29 100
2009-12-30 100
2009-12-31 100
2010-01-01 100
2010-01-02 101
2009-12-29 100.0
2009-12-30 100.0
2009-12-31 100.0
2010-01-01 100.0
2010-01-02 101.0
2010-01-03 NaN
2010-01-04 100
2010-01-05 89
2010-01-06 88
2010-01-04 100.0
2010-01-05 89.0
2010-01-06 88.0
2010-01-07 NaN

Please note that the ``NaN`` value present in the original dataframe
@@ -3967,11 +4028,10 @@ def _needs_reindex_multi(self, axes, method, level):
def _reindex_multi(self, axes, copy, fill_value):
return NotImplemented

_shared_docs[
'reindex_axis'] = ("""Conform input object to new index with optional
filling logic, placing NA/NaN in locations having no value in the
previous index. A new object is produced unless the new index is
equivalent to the current one and copy=False
_shared_docs['reindex_axis'] = ("""Conform input object to new index
with optional filling logic, placing NA/NaN in locations having
no value in the previous index. A new object is produced unless
the new index is equivalent to the current one and copy=False.

Parameters
----------
@@ -4008,17 +4068,20 @@ def _reindex_multi(self, axes, copy, fill_value):

.. versionadded:: 0.21.0 (list-like tolerance)

Examples
--------
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)

See Also
--------
reindex, reindex_like
DataFrame.set_index : set row labels
DataFrame.reset_index : remove row labels or move them to new columns
DataFrame.reindex : change to new indices or expand indices
DataFrame.reindex_like : change to same indices as other DataFrame

Review comment (Member):
same

Review comment (Member):
Can you capitalize these descriptions and end each with a period?


Returns
-------
reindexed : %(klass)s

Review comment (Member):
Suggested change:
- reindexed : %(klass)s
+ %(klass)s

And add a short description of what is being returned in the next line.


Examples
--------
>>> df.reindex_axis(['A', 'B', 'C'], axis=1)
""")

@Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)