Lots of unexpected behavior using resample after groupby #12923
Comments
Can you check out #12743, which closes a number of these issues, and confirm that it gives you the expected answers? I'm having trouble following your output because the formatting is off, but it looks correct on that branch. If anything is still wrong, let's move the discussion there.
So these all look correct to me. As @TomAugspurger says, #12743 will resolve any remaining issues here. In essence:
So here is the difference in behavior:
YOU
ME
Will this be fixed in the next build? Also, is it expected that B is no longer dropped? It seems odd that B is dropped for simple functions such as .mean() but not for resampling.
@BreitA you are probably looking to do this:
The implementation is exactly this. And yes, you are doing an operation on the entire frame, so it makes sense to keep all columns.
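A minimal sketch of the equivalence described above, assuming a recent pandas release and hypothetical data (the values and group sizes are made up, not taken from the issue). Columns A, C and D are selected before resampling so that B stays out of the result regardless of version; with that selection, the direct `groupby(...).resample(...)` form and the explicit `apply` form should produce the same values.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: rows 0-4 belong to group B=0.0, rows 5-9 to B=1.0.
idx = pd.date_range("2014-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {"A": np.arange(10.0), "B": [0.0] * 5 + [1.0] * 5, "C": 1.0, "D": 2.0},
    index=idx,
)

# Select the non-grouping columns up front so B never appears in the result.
cols = ["A", "C", "D"]

# Direct form: resample each group into month-start bins, then aggregate.
via_resample = df.groupby("B")[cols].resample("MS").mean()

# Explicit form: the same per-group operation spelled out with apply.
via_apply = df.groupby("B")[cols].apply(lambda g: g.resample("MS").mean())

print(via_resample)
```

Both results are indexed by (B, month start), with one row per group because each group's dates fall within January 2014.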
Yeah, I know the example is kind of silly (using .mean() for upsampling). The point was that the behavior differs between apply(lambda x: x.resample(...).mean()) and .resample(...).mean().
Code Sample, a copy-pastable example if possible
PANDAS 0.18 code:
PANDAS 0.17 equivalent code:
Expected Output
Pandas 0.18 code output:
       A    C    D
B
0.0  0.0  0.0  0.0
1.0  1.0  1.0  1.0

                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-02-01  0.0  0.0  0.0  0.0
    2014-03-01  0.0  0.0  0.0  0.0
    2014-04-01  0.0  0.0  0.0  0.0
    2014-05-01  0.0  0.0  0.0  0.0
shape : (10, 4)

                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-02-01  0.0  0.0  0.0  0.0
    2014-03-01  0.0  0.0  0.0  0.0
    2014-04-01  0.0  0.0  0.0  0.0
    2014-05-01  0.0  0.0  0.0  0.0
shape : (10, 4)

       A    C    D
B
0.0  0.0  0.0  0.0
1.0  1.0  1.0  1.0

                  A    B    C    D
B
0.0 2014-01-01  0.0  0.0  0.0  0.0
    2014-01-02  0.0  0.0  0.0  0.0
    2014-01-03  0.0  0.0  0.0  0.0
    2014-01-04  0.0  0.0  0.0  0.0
    2014-01-05  0.0  0.0  0.0  0.0
shape : (300, 4)

                           A    B    C    D
B
0.0 2014-01-01 00:00:00  0.0  0.0  0.0  0.0
    2014-01-01 01:00:00  NaN  NaN  NaN  NaN
    2014-01-01 02:00:00  NaN  NaN  NaN  NaN
    2014-01-01 03:00:00  NaN  NaN  NaN  NaN
    2014-01-01 04:00:00  NaN  NaN  NaN  NaN
shape : (7154, 4)
pd version 0.18.0

Pandas 0.17 equivalent code output:
   A  C  D
B
0  0  0  0
1  1  1  1

              A  C  D
B
0 2014-01-01  0  0  0
  2014-02-01  0  0  0
  2014-03-01  0  0  0
  2014-04-01  0  0  0
  2014-05-01  0  0  0
shape : (10, 3)

              A  B  C  D
B
0 2014-01-01  0  0  0  0
  2014-02-01  0  0  0  0
  2014-03-01  0  0  0  0
  2014-04-01  0  0  0  0
  2014-05-01  0  0  0  0
shape : (10, 4)

   A  C  D
B
0  0  0  0
1  1  1  1

                         A    C    D
B
0 2014-01-01 00:00:00    0    0    0
  2014-01-01 01:00:00  NaN  NaN  NaN
  2014-01-01 02:00:00  NaN  NaN  NaN
  2014-01-01 03:00:00  NaN  NaN  NaN
  2014-01-01 04:00:00  NaN  NaN  NaN
shape : (7154, 3)

                         A    B    C    D
B
0 2014-01-01 00:00:00    0    0    0    0
  2014-01-01 01:00:00  NaN  NaN  NaN  NaN
  2014-01-01 02:00:00  NaN  NaN  NaN  NaN
  2014-01-01 03:00:00  NaN  NaN  NaN  NaN
  2014-01-01 04:00:00  NaN  NaN  NaN  NaN
shape : (7154, 4)
pd version 0.17.1
ISSUES:
In pandas 0.18.0, column B is not dropped when resample is applied after groupby (it should be dropped and moved into the index, as in the simple example that calls .mean() after groupby).
In pandas 0.18.0, the behavior is correct when downsampling (the 'MS' example) but wrong when upsampling (the 'H' example): the DataFrame is not upsampled and stays at freq='D'.
A workaround is df.groupby('B').apply(lambda x: x.resample(...).mean()), but it is inelegant, to say the least, and B is still not dropped from the columns.
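The two behaviors listed above can be checked against a current pandas release with a small sketch. The data is hypothetical, and selecting the value columns before resampling is one way to keep B out of the result; it is an alternative to the apply workaround, not something proposed in this thread. The hourly alias is written as lowercase 'h' because newer pandas versions deprecate the uppercase 'H' used in the issue.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data with B as the grouping column (layout mirrors the issue).
idx = pd.date_range("2014-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {"A": np.arange(10.0), "B": [0.0] * 5 + [1.0] * 5, "C": 1.0, "D": 2.0},
    index=idx,
)

# A plain aggregation moves the grouping key B into the index.
means = df.groupby("B").mean()

# Selecting the value columns first gives resample the same "B only in the index" shape.
cols = ["A", "C", "D"]

# Upsample each group from daily to hourly. Each group spans its own first to last
# timestamp; the new hourly slots contain no data, so mean() yields NaN for them.
hourly = df.groupby("B")[cols].resample("h").mean()

group0 = hourly.loc[0.0]  # rows for B == 0.0, indexed by timestamp
print(hourly.shape)
```

With five daily rows per group, each group should expand to 97 hourly rows (4 days × 24 hours + 1), so the upsampled frame is no longer at daily frequency.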