Sort()ing then selecting columns in a function apply()d to a grouped DataFrame #10671
Comments
Is there a reason you are not just doing
What are you expecting this to do?
Yes. In trying to make the smallest piece of code that would still produce the error, I made a script that doesn't actually do anything. What I'm actually trying to accomplish looks more like:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                              'foo', 'bar', 'foo', 'bar'],
                       'B' : [0,0,3,3,2,2,1,1]})

    def function(group):
        group.sort(columns = 'B', inplace=True)
        group['C'] = group['B'].diff()
        trimedGroup = group[['A', 'C']]
        return trimedGroup

    df2 = df.groupby(['A']).apply(function)

And I expect to get back:

    In[28]:df2
    Out[28]:
         A    C
    0  foo  NaN
    1  bar  NaN
    2  foo    1
    3  bar    1
    4  foo    1
    5  bar    1
    6  foo    1
    7  bar    1

I want to take the diff() of sort()ed groups.
I think as
You NEVER want to sort in a group; simply sort the entire frame beforehand. In fact, you always want to do as much work as possible in a vectorized fashion.
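For what it's worth, here is a rough sketch of what "sort the entire frame beforehand" could look like for the example data above. It assumes a recent pandas, where `sort_values` takes the place of the older `sort` method used in the issue's code. The names `sorted_df` and `df2` are only illustrative, and this is one possible way to get the expected output, not necessarily what the maintainers had in mind.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
    'B': [0, 0, 3, 3, 2, 2, 1, 1],
})

# Sort the whole frame once, by group key and then by value,
# instead of sorting inside a function passed to apply().
# (Assumption: sort_values is available; older pandas used DataFrame.sort.)
sorted_df = df.sort_values(['A', 'B'])

# Per-group diff computed in one vectorized call; no apply() needed.
sorted_df['C'] = sorted_df.groupby('A')['B'].diff()

# Restore the original row order and keep only the wanted columns.
df2 = sorted_df.sort_index()[['A', 'C']]
print(df2)
```

Because the sort happens on the full frame and `diff()` runs on the grouped column directly, the user function (and the sort-then-select combination that triggers the error) is avoided entirely.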
Duplicate of #19437
This code:
produces this error:
It's the combination of the sort and the column selection inside a grouped apply that causes the problem. I'm happy to give more detail on why I want to do this, but this is the simplest, most stripped-down code that produces the error.