ENH: pd.Series.shift and .diff to accept a collection of numbers #44424
Comments
+1 - This is pretty easy to accomplish today, but pandas accepts lists in many places like this and it feels like a natural extension of that. A few thoughts if we do go forward with this:
Would DataFrame.shift also need to accept an iterable?
I would say that, for the sake of consistency, both Series and DataFrame should accept the iterable. This raises the question of column order in the DataFrame case.
This should just be a Series-only operation, or, if in a frame, it would have to generate a MultiIndex.
You mean, have either lags or columns on the index?
For Series, what would the columns be labeled? Is there another Series method that can take multiple inputs and act as a transformer? I am only aware of reducers (e.g. quantile).
I would label them as some function of the shift and the original name. E.g.
xref #19298
take
Thanks @jbrockmendel - did not realize this was a duplicate. A lot more discussion in here, so I think maybe leave this one open and close the other?
This would work well for string / integer labels, but could produce odd results if other Python objects are used. Perhaps that's not a case we should worry too much about though.
Fine by me
From #19298: Maybe take an optional prefix, e.g.
That sounds good to me
But need to handle the pd.DataFrame case, where you also have column names.
A few clarification questions:
@GitChein - I'm guessing you're thinking of the list of shifts to be zipped with the columns. My first intuition would be that shifting a DataFrame would instead apply each shift to every column. E.g.
produces
This is similar to quantile:
Yes, each shift should be applied to each column IMO.
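For reference, a minimal sketch of the "apply each shift to every column" behaviour using only existing pandas calls. The helper name `shift_each`, the `sep` argument, and the `{column}_{shift}` label format are assumptions for illustration, not an agreed API; the column order (grouped by original column) is likewise just one possible choice.

```python
import pandas as pd


def shift_each(df, periods, sep="_"):
    """Apply every shift in `periods` to every column of `df`.

    Hypothetical helper: the name and the label format are assumptions.
    """
    shifted = {
        f"{col}{sep}{p}": df[col].shift(p)  # assumed label: "<column>_<shift>"
        for col in df.columns
        for p in periods
    }
    # Dict keys become the column labels of the concatenated frame.
    return pd.concat(shifted, axis=1)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
result = shift_each(df, [0, 1])
print(result)
```

Note that this sketch does not handle an empty `periods` list, since `pd.concat` raises on an empty mapping; that case would need an explicit decision.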
@jona-sassenhagen @rhshadrach From my and @GitChein's understanding, we want the function to essentially apply consecutive shifts and then return a resulting DataFrame, correct? Also, if the input is an empty list, should we keep the default functionality of returning a shift by 1, or should we just return the original DataFrame/Series? I personally think returning the original frame makes more sense, since we pass an empty list rather than using the default value for the periods parameter.
Hey all, we made a potential fix and ran ci/code_checks.sh and test_shift.py for both frame and series, and they all pass. Below is what the current output looks like in our branch:
DataFrame
Series
Some Notes:
If this is suitable for the needs of the issue, then we will either create some tests or create a pull request, whichever is more appropriate. This is my and @skwirskj's first time contributing, so apologies for any misunderstanding of the process!
Is your feature request related to a problem?
No
Describe the solution you'd like
pd.DataFrame.shift / pd.Series.shift could alternatively accept a list of ints or a collection of numbers, and the result would be a DataFrame with one column per shift. This would make it very convenient to e.g. create multiple lagged versions of a column for time series problems. E.g.,
would give me a DataFrame with 7 columns, each a shifted version of time_series_df[my_time_feature].
API breaking implications
The first input to df.shift would be overloaded to also accept a list of ints.
Describe alternatives you've considered
Alternatives are very simple: you just iterate over the lags and concatenate or join.
This would just be a handy shortcut.
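As an illustration of that workaround, here is a minimal sketch that iterates over the lags and concatenates the shifted copies. The helper name `make_lags`, the sample data, and the `{name}_lag{n}` column labels are assumptions made for this example only.

```python
import pandas as pd


def make_lags(series, lags):
    """Return a DataFrame with one shifted copy of `series` per lag.

    Hypothetical helper: the name and column labels are assumptions.
    """
    return pd.concat(
        {f"{series.name}_lag{lag}": series.shift(lag) for lag in lags},
        axis=1,  # one column per lag, aligned on the original index
    )


ts = pd.Series([10, 20, 30, 40], name="y")
lagged = make_lags(ts, range(1, 4))
print(lagged)
```

The same pattern works for a single DataFrame column, e.g. by passing a selected column as the `series` argument.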
Additional context
I'd in principle be open to implementing this myself, although I've never contributed to pandas.