ENH: Effecient resample/reindex by index #7267

shura-v · 2014-05-28T18:24:28Z

Here is the idea. I have predefined index and I'm grouping by timestamps in it, aggregating preceding values of the interval:

index = DatetimeIndex([datetime(2014, 1, 11, 11, 30), datetime(2014, 1, 12, 15), datetime(2014, 1, 15, 22)])

s = Series({
    datetime(2014, 1, 10, 0): 1,
    datetime(2014, 1, 11, 10): 2,
    datetime(2014, 1, 11, 11): 3,
    datetime(2014, 1, 12, 12): 3,
    datetime(2014, 1, 15, 16): 4,
    datetime(2014, 1, 15, 22): 5
})

print s.groupby(lambda x: index[(index >= x).argmax()]).sum()

2014-01-11 11:30:00    6
2014-01-12 15:00:00    3
2014-01-15 22:00:00    9

I don't know how effecient this code, but I think it might be a very useful feature as long as "resample" accepts only offsets or deltas. And "reindex" method doesn't have 'how' argument.

The text was updated successfully, but these errors were encountered:

jreback · 2014-05-28T18:53:57Z

so this is defacto doing this:

Essentially you are grouping by a Categorical give then time

In [75]: def ct(x):
   ....:     return x.hour*3600+x.minute*60+x.second
   ....: 

In [77]: pd.cut(map(ct,s.index.time),bins=[0] + map(ct,index.time))
Out[77]: 
            NaN
     (0, 41400]
     (0, 41400]
 (41400, 54000]
 (54000, 79200]
 (54000, 79200]
Levels (3): Index(['(0, 41400]', '(41400, 54000]', '(54000, 79200]'], dtype=object)

In [78]: s.groupby(pd.cut(map(ct,s.index.time),bins=[0] + map(ct,index.time))).sum()
Out[78]: 
(0, 41400]        5
(41400, 54000]    3
(54000, 79200]    9
dtype: int64

And if we had a properly constructed TimeIndex this would work nicely (ignore edge effects).
related: #4023

shura-v · 2014-05-30T01:19:26Z

Series meant to be something like this:

s = Series({
    datetime(2014, 1, 10, 0): 1,
    datetime(2014, 1, 11, 10): 2,
    datetime(2014, 1, 11, 11): 3,
    datetime(2014, 1, 12, 12): 3,
    datetime(2014, 1, 13, 1): 100,
    datetime(2014, 1, 15, 16): 4,
    datetime(2014, 1, 15, 22): 5
})

But I got the idea, thanks. Anyway, I think it would extremely useful feature that would resample by index (with presence of 'closed', 'label' and 'how' arguments) using some high-level method.

shura-v · 2014-05-30T18:28:52Z

Here is the solution I came up with, based on your suggestion.
Think of index_to as some random (predefined) DatetimeIndex:

index_from = pd.date_range('2009-01-01', '2016-01-01', freq='T')
s = pd.Series(np.arange(len(index_from)), index_from)

index_to = pd.date_range('2010-01-01', '2015-05-30', freq='H')
bins = index_to.insert(0, datetime.fromtimestamp(0)).astype(np.long)
labels = pd.cut(s.index.astype(np.long), bins=bins).labels

agg = s.groupby(labels).last().ix[0:]
pd.Series(agg.values, index_to)

2010-01-01 00:00:00    525600
2010-01-01 01:00:00    525660
2010-01-01 02:00:00    525720
2010-01-01 03:00:00    525780
2010-01-01 04:00:00    525840
2010-01-01 05:00:00    525900
2010-01-01 06:00:00    525960
2010-01-01 07:00:00    526020
2010-01-01 08:00:00    526080
2010-01-01 09:00:00    526140
2010-01-01 10:00:00    526200
2010-01-01 11:00:00    526260
2010-01-01 12:00:00    526320
2010-01-01 13:00:00    526380
2010-01-01 14:00:00    526440
...
2015-05-29 10:00:00    3368760
2015-05-29 11:00:00    3368820
2015-05-29 12:00:00    3368880
2015-05-29 13:00:00    3368940
2015-05-29 14:00:00    3369000
2015-05-29 15:00:00    3369060
2015-05-29 16:00:00    3369120
2015-05-29 17:00:00    3369180
2015-05-29 18:00:00    3369240
2015-05-29 19:00:00    3369300
2015-05-29 20:00:00    3369360
2015-05-29 21:00:00    3369420
2015-05-29 22:00:00    3369480
2015-05-29 23:00:00    3369540
2015-05-30 00:00:00    3369600
Freq: H, Length: 47401

I think it's the same as s.reindex(index_to, method='pad'), but with ability to use any aggregate function instead of just picking last value.

jreback added API Design labels May 28, 2014

jreback added this to the 0.15.0 milestone May 28, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018

mroeschke added Enhancement Groupby and removed API Design Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves labels Apr 11, 2021

mroeschke removed this from the Someday milestone Oct 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Effecient resample/reindex by index #7267

ENH: Effecient resample/reindex by index #7267

shura-v commented May 28, 2014

jreback commented May 28, 2014

shura-v commented May 30, 2014

shura-v commented May 30, 2014

ENH: Effecient resample/reindex by index #7267

ENH: Effecient resample/reindex by index #7267

Comments

shura-v commented May 28, 2014

jreback commented May 28, 2014

shura-v commented May 30, 2014

shura-v commented May 30, 2014