Incorrect resampling from hourly to monthly values #2665
Comments
If you leave out the closed/label arguments, pandas will choose the right one for you:
edit: I thought about it some more and am not certain this is a bug. If you're thinking about it in terms of monthly periods, you should leave out the If you want explicit control, because this is timestamp format, the month-end datetime is midnight of the last day of the month. This is so it'll play nice with daily frequencies. It's possible to hack it so that IF the input data is intraday frequency then we adjust the bin edges, but it is often the case that you might have intraday data without an explicit frequency. So the result would be inconsistent behavior. |
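For anyone who wants to check the "leave out closed/label" suggestion on their own install, here is a minimal sketch. It is not the snippet from the comment above (that one did not survive in this thread). The variable names `monthly` and `by_calendar_month` are mine, and I pass offset objects instead of string aliases only because the aliases have changed between pandas versions. On a recent pandas I would expect the two results to agree; on 0.10.0, the version this issue was reported against, they may not.

```python
import pandas as pd

# Hourly series covering parts of March, April and May 2000
# (same date range as used elsewhere in this thread).
dates = pd.date_range("2000-03-20", "2000-05-20", freq=pd.offsets.Hour())
ts = pd.Series(range(len(dates)), index=dates)

# No closed/label arguments: pandas picks its defaults for a month-end frequency.
monthly = ts.resample(pd.offsets.MonthEnd()).sum()

# Independent reference: group every hourly value by its calendar month.
by_calendar_month = ts.groupby(ts.index.to_period("M")).sum()

print(monthly)
print(by_calendar_month)
```

If the two printed sums differ for a month, the hourly values of the last day of that month are the first place to look.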
I'm moving this to 0.10.2 in case more discussion is needed/desired |
Aren't the results in the above example wrong? The dates indicated are actually midnight at the start of the relevant date. Is not the correct range for March 2000: 2000-03-01 0:00 (left) I would have thought that, mathematically, saying that a monthly period finishes on 2000-03-31 is incorrect, since this represents the midnight between 2000-03-30 and 2000-03-31, not the midnight after 2000-03-31 (which is in fact 2000-04-01 0:00). ? |
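The point about midnight can be checked directly. A small sketch, assuming only standard pandas objects (`pd.Timestamp`, `pd.offsets.MonthEnd`, `pd.Timedelta`); the variable names are just for illustration:

```python
import pandas as pd

# Month-end label for March 2000, expressed as a timestamp.
march_label = pd.Timestamp("2000-03-01") + pd.offsets.MonthEnd(0)

# Last hourly observation that still belongs to March.
last_march_hour = pd.Timestamp("2000-03-31 23:00")

# The instant at which March actually ends.
march_true_end = march_label + pd.Timedelta(days=1)

print(march_label)                    # midnight at the START of 2000-03-31
print(last_march_hour > march_label)  # hourly data on the 31st lies after the label
print(march_true_end)                 # 2000-04-01 00:00
```

So a bin that is closed on the right at the month-end timestamp itself, with no adjustment, would leave out almost all of the hourly values on the last day of the month.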
Well, let's look at this here:
So I think the Closing this issue ... |
This issue keeps on coming back... in my eyes this is not yet ready to be closed... @changhiskhan, leaving out the @wesm, the Instead, I believe the desired behavior should be:
I did some research into the code... Actually, the behavior for source data with start time stamp labels previously had a very similar problem (see http://stackoverflow.com/questions/11018120/bug-in-resampling-with-pandas-0-8). It was handled by introducing the following code (#1471):
of which the Shouldn't the correct code be:
That indeed results in the desired behavior shown above and also solves #5440. Though that raises again issue #1726... I think #1726 should be solved separately, maybe by adjusting the range edge |
Why don't you do a PR with the new code and some tests to validate? |
It also involves changing the default behavior of Changing the default to |
I'm afraid a PR is too much for me... I'm not accustomed to the PR process yet (and I also lack the time). Moreover, a solution for #1726 (and probably other issues) still needs to be found. As a side note: the resample code looks quite complicated to me, with many hacks already. Instead of adding another hack, maybe a profound (and time-consuming) code refactoring is desirable? |
ok will reopen this |
This looks correct to me. If we have
dates = pd.date_range('3/20/2000', '5/20/2000', freq='1h')
ts = pd.Series(range(len(dates)), index=dates)
and we do
If we do
dates = pd.date_range('3/20/2000', '5/20/2000', freq='1h')
ts = pd.Series(range(len(dates)), index=dates)
bins = [
('2000-02-29', '2000-03-31'),
('2000-03-31', '2000-04-30'),
('2000-04-30', '2000-05-31'),
]
# left, right
result = []
for left, right in bins:
df = ts[(ts.index.normalize()>left)&(ts.index.normalize()<=right)].copy()
result.append(df.sum())
print(result)
print(ts.resample('1M', label='left', closed='right').sum())
then
I'm marking this as a 'closing candidate' for now, but I'll double-check later |
closing then, but please do let me know if I've misunderstood |
EDIT: Modern reproducible example:
Expected behavior: #2665 (comment)
There seems to be something wrong when resampling hourly values to monthly values. The 'closed=' argument does not do what it should. I am using pandas 0.10.0.
When resampling these hourly to daily values, there was no problem.
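The hourly-to-daily case is easier to reason about, because the daily bin edges fall on midnight. A hedged sketch with made-up data (a constant series of ones, not the data from the original report), using the default arguments of `resample("D")`:

```python
import pandas as pd

# Four full days of hourly values across a month boundary; every value is 1,
# so each daily sum is simply the number of hours that landed in that bin.
idx = pd.date_range("2000-03-30", "2000-04-02 23:00", freq=pd.offsets.Hour())
hourly = pd.Series(1, index=idx)

# Default arguments for a daily frequency.
daily = hourly.resample("D").sum()

print(daily)
```

Each day should pick up exactly its own 24 hourly values, including the last day of March, which is the behavior the monthly case is being compared against.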