DOC: update the DataFrame.reindex_like docstring #22775
Changes from 10 commits
bab3a2b
cd3ec62
1ee2751
2cbf5dd
92c1d2f
5bcee6d
2729193
4851b91
be1a774
7072e4a
88e6e37
38c5c94
4518791
9d58d4e
363e8c0
4d59844
6bf4977
2a838f2
7df1f79
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3909,43 +3909,59 @@ def shift(self, periods=1, freq=None, axis=0): | |
def set_index(self, keys, drop=True, append=False, inplace=False, | ||
verify_integrity=False): | ||
""" | ||
An index is created with existing columns. | ||
|
||
Set the DataFrame index (row labels) using one or more existing | ||
columns. By default yields a new object. | ||
columns. The index can replace the existing index or expand on it. | ||
|
||
Parameters | ||
---------- | ||
keys : column label or list of column labels / arrays | ||
drop : boolean, default True | ||
Delete columns to be used as the new index | ||
append : boolean, default False | ||
Whether to append columns to existing index | ||
inplace : boolean, default False | ||
Modify the DataFrame in place (do not create a new object) | ||
verify_integrity : boolean, default False | ||
keys : str or list of str or array | ||
Review comment: To be strict, the keys can be something other than a string (column names can also be integers, timestamps, and so on).
||
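The point in the comment above can be illustrated with a minimal sketch. The frame and its integer column labels are made up for illustration and are not part of the PR:

```python
import pandas as pd

# Hypothetical frame whose column labels are integers, not strings.
df = pd.DataFrame({0: [1, 4, 7], 1: [55, 40, 84]})

# An integer column label is accepted as a key just like a string label.
by_int = df.set_index(0)
```

Here the resulting index takes its name from the integer label, so a parameter type of `str` alone would be too narrow.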
Column label or list of column labels / arrays that will | ||
form the new index. | ||
Review comment: Since the 'arrays' option is really something completely different (these are the actual index values passed as an array, not a reference to one of the existing columns), I would put that in a separate sentence.
||
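A small sketch of the distinction raised in the comment above, reusing the example data from the docstring. The values in the `pd.Index` are arbitrary and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4, 7, 10],
                   'year': [2012, 2014, 2013, 2014],
                   'sale': [55, 40, 84, 31]})

# A column label: the 'month' column becomes the index and is dropped
# from the columns (drop=True is the default).
by_label = df.set_index('month')

# An array of values: used directly as the new index; no existing
# column is consumed.
by_array = df.set_index(pd.Index([10, 20, 30, 40]))
```

In the first call the key refers to existing data; in the second the key is the data, which is why a separate sentence in the parameter description seems warranted.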
drop : bool, default True | ||
Delete columns to be used as the new index. | ||
append : bool, default False | ||
Whether to append columns to existing index. | ||
inplace : bool, default False | ||
Modify the DataFrame in place (do not create a new object). | ||
verify_integrity : bool, default False | ||
Check the new index for duplicates. Otherwise defer the check until | ||
necessary. Setting to False will improve the performance of this | ||
method | ||
method. | ||
|
||
Returns | ||
------- | ||
DataFrame | ||
Changed row labels. | ||
|
||
See Also | ||
-------- | ||
DataFrame.reset_index : Opposite of set_index. | ||
DataFrame.reindex : Change to new indices or expand indices. | ||
DataFrame.reindex_like : Change to same indices as other DataFrame. | ||
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame({'month': [1, 4, 7, 10], | ||
... 'year': [2012, 2014, 2013, 2014], | ||
... 'sale':[55, 40, 84, 31]}) | ||
month sale year | ||
0 1 55 2012 | ||
1 4 40 2014 | ||
2 7 84 2013 | ||
3 10 31 2014 | ||
... 'sale': [55, 40, 84, 31]}) | ||
math-and-data marked this conversation as resolved.
|
||
>>> df | ||
month year sale | ||
0 1 2012 55 | ||
1 4 2014 40 | ||
2 7 2013 84 | ||
3 10 2014 31 | ||
|
||
Set the index to become the 'month' column: | ||
|
||
>>> df.set_index('month') | ||
sale year | ||
year sale | ||
month | ||
1 55 2012 | ||
4 40 2014 | ||
7 84 2013 | ||
10 31 2014 | ||
1 2012 55 | ||
4 2014 40 | ||
7 2013 84 | ||
10 2014 31 | ||
|
||
Create a multi-index using columns 'year' and 'month': | ||
|
||
|
@@ -3966,10 +3982,6 @@ def set_index(self, keys, drop=True, append=False, inplace=False, | |
2 2014 4 40 | ||
3 2013 7 84 | ||
4 2014 10 31 | ||
|
||
Returns | ||
------- | ||
dataframe : DataFrame | ||
""" | ||
inplace = validate_bool_kwarg(inplace, 'inplace') | ||
if not isinstance(keys, list): | ||
|
@@ -4037,6 +4049,8 @@ def set_index(self, keys, drop=True, append=False, inplace=False, | |
def reset_index(self, level=None, drop=False, inplace=False, col_level=0, | ||
col_fill=''): | ||
""" | ||
An existing index is modified. | ||
Review comment: I think we can also do better here for the summary line: from "An existing index is modified." alone, a user won't get much of a clue about what the method does. But of course, given that the method does different things, it might be difficult to find something that is still generally true but less vague than the above. Trying to think of better descriptions:
(will think further on it)
||
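To make the "different things" in the comment above concrete, here is a minimal sketch of the two main behaviours of `reset_index`. The small frame is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'sale': [55, 40]},
                  index=pd.Index([1, 4], name='month'))

# Default: the index is moved into a column and replaced by 0..n-1.
moved = df.reset_index()

# drop=True: the index is discarded instead of becoming a column.
dropped = df.reset_index(drop=True)
```

Both calls replace the index with the default integer index; they differ only in whether the old labels survive as a column, which may be a useful starting point for a less vague summary line.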
|
||
For DataFrame with multi-level index, return new DataFrame with | ||
labeling information in the columns under the index names, defaulting | ||
to 'level_0', 'level_1', etc. if any are None. For a standard index, | ||
|
@@ -4047,12 +4061,12 @@ def reset_index(self, level=None, drop=False, inplace=False, col_level=0, | |
---------- | ||
level : int, str, tuple, or list, default None | ||
Only remove the given levels from the index. Removes all levels by | ||
default | ||
drop : boolean, default False | ||
default. | ||
drop : bool, default False | ||
Do not try to insert index into dataframe columns. This resets | ||
the index to the default integer index. | ||
inplace : boolean, default False | ||
Modify the DataFrame in place (do not create a new object) | ||
inplace : bool, default False | ||
Modify the DataFrame in place (do not create a new object). | ||
col_level : int or str, default 0 | ||
If the columns have multiple levels, determines which level the | ||
labels are inserted into. By default it is inserted into the first | ||
|
@@ -4063,7 +4077,14 @@ def reset_index(self, level=None, drop=False, inplace=False, col_level=0, | |
|
||
Returns | ||
------- | ||
resetted : DataFrame | ||
DataFrame | ||
Changed row labels. | ||
|
||
See Also | ||
-------- | ||
DataFrame.set_index : Opposite of reset_index. | ||
DataFrame.reindex : Change to new indices or expand indices. | ||
DataFrame.reindex_like : Change to same indices as other DataFrame. | ||
|
||
Examples | ||
-------- | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -3316,29 +3316,84 @@ def select(self, crit, axis=0): | |||||
|
||||||
def reindex_like(self, other, method=None, copy=True, limit=None, | ||||||
tolerance=None): | ||||||
"""Return an object with matching indices to myself. | ||||||
""" | ||||||
Return an object with matching indices as other object. | ||||||
|
||||||
Conform the object to the same index on all axes. Optional | ||||||
filling logic, placing NA/NaN in locations having no value | ||||||
Review comment: Just say
||||||
in the previous index. A new object is produced unless the | ||||||
new index is equivalent to the current one and copy=False. | ||||||
|
||||||
Parameters | ||||||
---------- | ||||||
other : Object | ||||||
method : string or None | ||||||
copy : boolean, default True | ||||||
other : Object of the same data type | ||||||
Its row and column indices are used to define the new indices | ||||||
of this object. | ||||||
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'} | ||||||
math-and-data marked this conversation as resolved.
|
||||||
Method to use for filling holes in reindexed DataFrame. | ||||||
Please note: this is only applicable to DataFrames/Series with a | ||||||
monotonically increasing/decreasing index. | ||||||
|
||||||
* None (default): don't fill gaps | ||||||
* pad / ffill: propagate last valid observation forward to next | ||||||
valid | ||||||
* backfill / bfill: use next valid observation to fill gap | ||||||
* nearest: use nearest valid observations to fill gap | ||||||
|
||||||
copy : bool, default True | ||||||
Return a new object, even if the passed indexes are the same. | ||||||
limit : int, default None | ||||||
Maximum number of consecutive labels to fill for inexact matches. | ||||||
tolerance : optional | ||||||
Maximum distance between labels of the other object and this | ||||||
object for inexact matches. Can be list-like. | ||||||
Maximum distance between original and new labels for inexact | ||||||
matches. The values of the index at the matching locations must | ||||||
satisfy the equation ``abs(index[indexer] - target) <= tolerance``. | ||||||
|
||||||
Tolerance may be a scalar value, which applies the same tolerance | ||||||
to all values, or list-like, which applies variable tolerance per | ||||||
element. List-like includes list, tuple, array, Series, and must be | ||||||
the same size as the index and its dtype must exactly match the | ||||||
index's type. | ||||||
|
||||||
.. versionadded:: 0.21.0 (list-like tolerance) | ||||||
|
||||||
Notes | ||||||
----- | ||||||
Like calling s.reindex(index=other.index, columns=other.columns, | ||||||
method=...) | ||||||
Like calling `.reindex(index=other.index, columns=other.columns,...)`. | ||||||
math-and-data marked this conversation as resolved.
|
||||||
|
||||||
Returns | ||||||
------- | ||||||
reindexed : same as input | ||||||
Same object type as input, but with changed indices on each axis. | ||||||
Review comment: Can you add a line before that with
||||||
|
||||||
See Also | ||||||
-------- | ||||||
DataFrame.set_index : Set row labels. | ||||||
DataFrame.reset_index : Remove row labels or move them to new columns. | ||||||
DataFrame.reindex : Change to new indices or expand indices. | ||||||
|
||||||
Examples | ||||||
-------- | ||||||
>>> df_weather_station_1 = pd.DataFrame([[24.3, 75.7, 'high'], | ||||||
Review comment: I don't think the
||||||
... [31, 87.8, 'high'], | ||||||
... [22, 71.6, 'medium'], | ||||||
... [35, 95, 'medium']], | ||||||
... columns=['temp_celsius', 'temp_fahrenheit', 'windspeed'], | ||||||
... index=pd.date_range(start='2014-02-12', | ||||||
... end='2014-02-15', freq='D')) | ||||||
|
||||||
>>> df_weather_station_2 = pd.DataFrame([[28, 'low'], | ||||||
... [30, 'low'], | ||||||
... [35.1, 'medium']], | ||||||
... columns=['temp_celsius', 'windspeed'], | ||||||
... index=pd.DatetimeIndex(['2014-02-12', '2014-02-13', | ||||||
... '2014-02-15'])) | ||||||
datapythonista marked this conversation as resolved.
|
||||||
|
||||||
>>> df_weather_station_2.reindex_like(df_weather_station_1) | ||||||
temp_celsius temp_fahrenheit windspeed | ||||||
2014-02-12 28.0 NaN low | ||||||
2014-02-13 30.0 NaN low | ||||||
2014-02-14 NaN NaN NaN | ||||||
2014-02-15 35.1 NaN medium | ||||||
""" | ||||||
d = other._construct_axes_dict(axes=self._AXIS_ORDERS, method=method, | ||||||
copy=copy, limit=limit, | ||||||
|
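The Notes entry in the `reindex_like` docstring above says the method is like calling `.reindex(index=other.index, columns=other.columns, ...)`. A minimal sketch of that equivalence, using simplified numeric-only versions of the weather frames (the short names `df1` and `df2` are chosen here for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'temp_celsius': [24.3, 31.0, 22.0, 35.0],
                    'temp_fahrenheit': [75.7, 87.8, 71.6, 95.0]},
                   index=pd.date_range('2014-02-12', '2014-02-15', freq='D'))
df2 = pd.DataFrame({'temp_celsius': [28.0, 30.0, 35.1]},
                   index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
                                           '2014-02-15']))

# Conform df2 to df1's row and column labels in two ways.
via_like = df2.reindex_like(df1)
via_reindex = df2.reindex(index=df1.index, columns=df1.columns)

# DataFrame.equals treats NaNs in the same positions as equal.
same = via_like.equals(via_reindex)
```

Labels present in `df1` but missing from `df2` (the row for 2014-02-14 and the `temp_fahrenheit` column) come back as NaN in both results.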
@@ -3705,7 +3760,7 @@ def reindex(self, *args, **kwargs): | |||||
Conform %(klass)s to new index with optional filling logic, placing | ||||||
NA/NaN in locations having no value in the previous index. A new object | ||||||
is produced unless the new index is equivalent to the current one and | ||||||
copy=False | ||||||
copy=False. | ||||||
math-and-data marked this conversation as resolved.
|
||||||
|
||||||
Parameters | ||||||
---------- | ||||||
|
@@ -3714,27 +3769,27 @@ def reindex(self, *args, **kwargs): | |||||
New labels / index to conform to. Preferably an Index object to | ||||||
Review comment: Can you move the
||||||
avoid duplicating data | ||||||
%(optional_axis)s | ||||||
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional | ||||||
method to use for filling holes in reindexed DataFrame. | ||||||
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'} | ||||||
Method to use for filling holes in reindexed DataFrame. | ||||||
Please note: this is only applicable to DataFrames/Series with a | ||||||
monotonically increasing/decreasing index. | ||||||
|
||||||
* default: don't fill gaps | ||||||
* None (default): don't fill gaps | ||||||
* pad / ffill: propagate last valid observation forward to next | ||||||
valid | ||||||
* backfill / bfill: use next valid observation to fill gap | ||||||
* nearest: use nearest valid observations to fill gap | ||||||
|
||||||
copy : boolean, default True | ||||||
Return a new object, even if the passed indexes are the same | ||||||
copy : bool, default True | ||||||
Return a new object, even if the passed indexes are the same. | ||||||
level : int or name | ||||||
Broadcast across a level, matching Index values on the | ||||||
passed MultiIndex level | ||||||
passed MultiIndex level. | ||||||
fill_value : scalar, default np.NaN | ||||||
Value to use for missing values. Defaults to NaN, but can be any | ||||||
"compatible" value | ||||||
"compatible" value. | ||||||
limit : int, default None | ||||||
Maximum number of consecutive elements to forward or backward fill | ||||||
Maximum number of consecutive elements to forward or backward fill. | ||||||
tolerance : optional | ||||||
Maximum distance between original and new labels for inexact | ||||||
matches. The values of the index at the matching locations must | ||||||
|
@@ -3748,6 +3803,12 @@ def reindex(self, *args, **kwargs): | |||||
|
||||||
.. versionadded:: 0.21.0 (list-like tolerance) | ||||||
|
||||||
See Also | ||||||
-------- | ||||||
DataFrame.set_index : Set row labels. | ||||||
DataFrame.reset_index : Remove row labels or move them to new columns. | ||||||
DataFrame.reindex_like : Change to same indices as other DataFrame. | ||||||
|
||||||
Examples | ||||||
-------- | ||||||
|
||||||
|
@@ -3839,12 +3900,12 @@ def reindex(self, *args, **kwargs): | |||||
... index=date_index) | ||||||
>>> df2 | ||||||
prices | ||||||
2010-01-01 100 | ||||||
2010-01-02 101 | ||||||
2010-01-01 100.0 | ||||||
2010-01-02 101.0 | ||||||
2010-01-03 NaN | ||||||
2010-01-04 100 | ||||||
2010-01-05 89 | ||||||
2010-01-06 88 | ||||||
2010-01-04 100.0 | ||||||
2010-01-05 89.0 | ||||||
2010-01-06 88.0 | ||||||
|
||||||
Suppose we decide to expand the dataframe to cover a wider | ||||||
date range. | ||||||
|
@@ -3855,12 +3916,12 @@ def reindex(self, *args, **kwargs): | |||||
2009-12-29 NaN | ||||||
2009-12-30 NaN | ||||||
2009-12-31 NaN | ||||||
2010-01-01 100 | ||||||
2010-01-02 101 | ||||||
2010-01-01 100.0 | ||||||
2010-01-02 101.0 | ||||||
2010-01-03 NaN | ||||||
2010-01-04 100 | ||||||
2010-01-05 89 | ||||||
2010-01-06 88 | ||||||
2010-01-04 100.0 | ||||||
2010-01-05 89.0 | ||||||
2010-01-06 88.0 | ||||||
2010-01-07 NaN | ||||||
|
||||||
The index entries that did not have a value in the original data frame | ||||||
|
@@ -3873,15 +3934,15 @@ def reindex(self, *args, **kwargs): | |||||
|
||||||
>>> df2.reindex(date_index2, method='bfill') | ||||||
prices | ||||||
2009-12-29 100 | ||||||
2009-12-30 100 | ||||||
2009-12-31 100 | ||||||
2010-01-01 100 | ||||||
2010-01-02 101 | ||||||
2009-12-29 100.0 | ||||||
2009-12-30 100.0 | ||||||
2009-12-31 100.0 | ||||||
2010-01-01 100.0 | ||||||
2010-01-02 101.0 | ||||||
2010-01-03 NaN | ||||||
2010-01-04 100 | ||||||
2010-01-05 89 | ||||||
2010-01-06 88 | ||||||
2010-01-04 100.0 | ||||||
2010-01-05 89.0 | ||||||
2010-01-06 88.0 | ||||||
2010-01-07 NaN | ||||||
|
||||||
Please note that the ``NaN`` value present in the original dataframe | ||||||
|
@@ -3967,11 +4028,10 @@ def _needs_reindex_multi(self, axes, method, level): | |||||
def _reindex_multi(self, axes, copy, fill_value): | ||||||
return NotImplemented | ||||||
|
||||||
_shared_docs[ | ||||||
'reindex_axis'] = ("""Conform input object to new index with optional | ||||||
filling logic, placing NA/NaN in locations having no value in the | ||||||
previous index. A new object is produced unless the new index is | ||||||
equivalent to the current one and copy=False | ||||||
_shared_docs['reindex_axis'] = ("""Conform input object to new index | ||||||
math-and-data marked this conversation as resolved.
|
||||||
with optional filling logic, placing NA/NaN in locations having | ||||||
no value in the previous index. A new object is produced unless | ||||||
the new index is equivalent to the current one and copy=False. | ||||||
|
||||||
Parameters | ||||||
---------- | ||||||
|
@@ -4008,17 +4068,20 @@ def _reindex_multi(self, axes, copy, fill_value): | |||||
|
||||||
.. versionadded:: 0.21.0 (list-like tolerance) | ||||||
|
||||||
Examples | ||||||
-------- | ||||||
>>> df.reindex_axis(['A', 'B', 'C'], axis=1) | ||||||
|
||||||
See Also | ||||||
-------- | ||||||
reindex, reindex_like | ||||||
DataFrame.set_index : set row labels | ||||||
DataFrame.reset_index : remove row labels or move them to new columns | ||||||
DataFrame.reindex : change to new indices or expand indices | ||||||
DataFrame.reindex_like : change to same indices as other DataFrame | ||||||
Review comment: same
Review comment: Can you capitalize these descriptions and end each with a period?
||||||
|
||||||
Returns | ||||||
------- | ||||||
reindexed : %(klass)s | ||||||
Review comment:
Suggested change
And add a short description of what is being returned on the next line.
||||||
|
||||||
Examples | ||||||
-------- | ||||||
>>> df.reindex_axis(['A', 'B', 'C'], axis=1) | ||||||
math-and-data marked this conversation as resolved.
|
||||||
""") | ||||||
|
||||||
@Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs) | ||||||
|
Review comment:
I find this a bit confusing as the summary line (because it seems to suggest that an index is returned, since that is created).
I found the previous "Set the DataFrame index using existing columns" a bit clearer. @datapythonista thoughts?
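For reference on the concern that the summary suggests an index is returned, a minimal sketch of what `set_index` actually returns. The two-row frame is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'month': [1, 4], 'sale': [55, 40]})

# By default a new DataFrame is returned; df itself is left unchanged.
result = df.set_index('month')

# With inplace=True the frame is modified and nothing is returned.
in_place = df.set_index('month', inplace=True)
```

In neither case is an Index object returned, which supports wording the summary around the DataFrame rather than around the index being created.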