ENH ohlc resample for DataFrame #4740
@@ -430,6 +430,13 @@ def ohlc(self):

        For multiple groupings, the result index will be a MultiIndex
        """
        if isinstance(self.obj, com.ABCDataFrame):
I would make this a separate method, so that if we define more aggregators like this in the future it can easily be reused.
Here's another one: df.groupby('A').describe()
(not defined, but pretty easy to do!)
groupby is a crazy place (not sure where this should go), but I see your point, it ought to be refactored out of there... Are you suggesting just a method like this:
def _apply_to_column_groupbys(self, func):
    from pandas.tools.merge import concat
    return concat((func(col_groupby)
                   for _, col_groupby in self._iterate_column_groupbys()),
                  keys=self.obj.columns,
                  axis=1)
df.groupby('A').describe()
works (?) but puts the descriptions in the index rather than in the columns:
In [29]: df.groupby('PRICE').describe()  # expected .unstack(1)
Out[29]:
             PRICE        VOLUME
PRICE
24990 count      1  1.000000e+00
      mean   24990  1.500000e+09
      std      NaN           NaN
      min    24990  1.500000e+09
      25%    24990  1.500000e+09
      50%    24990  1.500000e+09
      75%    24990  1.500000e+09
      max    24990  1.500000e+09
25499 count      2  2.000000e+00
      mean   25499  2.550000e+09
      std        0  3.464823e+09
      min    25499  1.000000e+08
      25%    25499  1.325000e+09
      50%    25499  2.550000e+09
      75%    25499  3.775000e+09
      max    25499  5.000000e+09
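The helper proposed above can be sketched against the public pandas API. This is only an illustration, not the code merged in this PR: `apply_to_column_groupbys` is a hypothetical free function, it assumes a single string grouping key, and it uses `pd.concat` from a recent pandas release rather than `pandas.tools.merge`.

```python
import pandas as pd


def apply_to_column_groupbys(df, by, func):
    """Apply func to each value column's SeriesGroupBy and join the results.

    Hypothetical stand-in for the private helper discussed above;
    `by` is assumed to be the name of a single grouping column.
    """
    grouped = df.groupby(by)
    value_cols = [c for c in df.columns if c != by]
    # One result frame per column, e.g. the open/high/low/close frame from ohlc().
    pieces = [func(grouped[col]) for col in value_cols]
    # keys= puts the original column name on the outer level of a MultiIndex.
    return pd.concat(pieces, keys=value_cols, axis=1)


# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "PRICE": [1.0, 3.0, 2.0, 5.0],
        "VOLUME": [10, 20, 30, 40],
    }
)
result = apply_to_column_groupbys(df, "A", lambda g: g.ohlc())
```

Here `result` has one row per group and two-level columns such as `("PRICE", "open")` and `("VOLUME", "close")`.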
Could also create a new ohlc method in DataFrameGroupBy (I wasn't sure which was preferred).
hmmm... maybe I'll step through this at some point... it is a bit confusing... maybe something is off with ohlc... I thought describe would not work at all... it might just need a parameter... because the behaviour IS to create a MultiIndex (e.g. it shouldn't need your patch)
@jreback I don't think my patch touches it. I refactored the PR; it's a little nicer now...
I think the ohlc behaviour is correct; I'm confused about describe (the above behaviour is in 0.12 too).
perhaps override describe (like I have ohlc) to do:
g._apply_to_column_groupbys(lambda x: x.describe().unstack(-1))
seems hacky, it is... must be nicer way
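For comparison, a minimal sketch of the layout being asked for, with the describe statistics as the inner level of MultiIndex columns. It assumes a recent pandas release, where `groupby(...).describe()` returns that layout directly (this was not the case in 0.12, as the output above shows), and it uses made-up data.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "A": ["x", "x", "y", "y"],
        "PRICE": [1.0, 3.0, 2.0, 5.0],
        "VOLUME": [10, 20, 30, 40],
    }
)

# Assumes a recent pandas: statistics land in the columns, not the index.
desc = df.groupby("A").describe()

# Select one statistic for one original column via the two-level key.
price_mean = desc[("PRICE", "mean")]
```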
no what puzzles me is why ohlc fails and describe almost works
@jreback What did you think about this one? Not sure what we were looking into re describe (is that a separate issue*?) * describe should have MultiIndex column, rather than index.
can you put a test in for doing the same with describe and see what happens?
When I did this last time and also in master:
so, it appends it to index, rather than as a MultiIndex column,...
hmm...must be because the
so merge? opened issue re describe
ok
closes #2320
Can use ohlc from DataFrame.
cc @jreback
Example:
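A minimal sketch of the kind of usage this enables, not the example originally attached to this PR. It assumes a recent pandas release (where `resample` returns a resampler object with an `ohlc()` method) and made-up PRICE/VOLUME data.

```python
import pandas as pd

# Made-up tick data at 20-minute intervals, for illustration only.
idx = pd.date_range("2013-09-01 09:00", periods=6, freq="20min")
df = pd.DataFrame(
    {
        "PRICE": [10, 12, 11, 13, 9, 14],
        "VOLUME": [100, 200, 150, 300, 250, 50],
    },
    index=idx,
)

# Resample every column into hourly open/high/low/close bars.
bars = df.resample("60min").ohlc()

# The columns form a MultiIndex: (original column, open/high/low/close).
price_high = bars[("PRICE", "high")]
```

With this data there are two hourly bins (09:00 and 10:00), and each original column contributes four columns to `bars`.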