ENH: Add Series.histogram() #3945

Closed
danielballan opened this issue Jun 18, 2013 · 13 comments · Fixed by #4502
Comments

@danielballan
Contributor

Should I write a Series.histogram that calls numpy.histogram and returns a new Series of counts indexed by bins?

In [4]: s = Series(randn(100))

In [5]: counts, bins = np.histogram(s)

In [6]: Series(counts, index=bins[:-1])
Out[6]: 
-2.968575     1
-2.355032     4
-1.741488     5
-1.127944    26
-0.514401    23
 0.099143    23
 0.712686    12
 1.326230     5
 1.939773     0
 2.553317     1
dtype: int32

...would be accomplished by...

Series.histogram()

which would accept all the arguments that np.histogram accepts (e.g., choosing bins manually, specifying number or range of bins, etc.).

from http://stackoverflow.com/a/17150734/1221924

The proposed method doesn't save all that much typing, but it would help users discover this convenient way of storing a histogram. Proceed?
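For illustration, the proposal could be sketched as a standalone function. This is a minimal sketch, not an existing pandas API: the name `histogram` and its pass-through keyword arguments are hypothetical, and the random data only stands in for the `randn(100)` example above.

```python
import numpy as np
import pandas as pd


def histogram(s, **kwargs):
    """Hypothetical helper: bin a Series with np.histogram and return the counts.

    Keyword arguments are passed straight through to np.histogram
    (for example bins=20 or range=(-3, 3)).
    """
    counts, edges = np.histogram(s.to_numpy(), **kwargs)
    # np.histogram returns len(counts) + 1 edges; label each bin by its left edge.
    return pd.Series(counts, index=edges[:-1], name=s.name)


rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))
hist = histogram(s, bins=10)
print(hist)
```

Labelling bins by their left edge discards the right edge of the last bin, so callers who need it would have to keep `edges` around separately.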

@jreback
Contributor

jreback commented Jun 18, 2013

What you are proposing is basically this
(I am not sure what is a better result in any event)

In [9]: s.groupby(pd.cut(s,10)).count()
Out[9]: 
(-2.955, -2.419]     1
(-2.419, -1.889]     2
(-1.889, -1.359]     2
(-1.359, -0.829]     8
(-0.829, -0.299]    24
(-0.299, 0.231]     21
(0.231, 0.761]      15
(0.761, 1.291]      14
(1.291, 1.821]      10
(1.821, 2.351]       3
dtype: int64
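A self-contained version of the `groupby`/`cut` approach might look like the following. It is a sketch that assumes a reasonably recent pandas, where `groupby` accepts the `observed` keyword; the seed and sample data are arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# pd.cut assigns each value to one of 10 equal-width intervals.
binned = pd.cut(s, 10)

# observed=False keeps bins that happen to be empty in the result.
counts = s.groupby(binned, observed=False).count()
print(counts)
```

Here the index holds interval labels rather than numeric edges, which is the main difference from the `np.histogram` result.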

@danielballan
Contributor Author

That's new to me. Not sure which output is more valuable for whatever users would do next.

@jreback
Contributor

jreback commented Jun 18, 2013

yours more friendly for plotting
cut is NaN friendly (not sure if np.histogram is)

@jtratner
Contributor

@jreback cut returns string row labels, right?

@jreback
Contributor

jreback commented Jun 20, 2013

it returns a Categorical which has both labels and levels

@cpcloud
Member

cpcloud commented Jul 20, 2013

would this be more useful as a top level function instead of a method?

np.histogram is not nan friendly

@jreback
Contributor

jreback commented Jul 20, 2013

I think an instance method on Series is about right (but maybe using groupby and cut as above)

@danielballan
Contributor Author

I don't have strong feelings about this. It's certainly easy to work around, but I really hate working with Categoricals.

@jreback
Contributor

jreback commented Jul 22, 2013

problem is np.histogram doesn't like nan
maybe just drop em anyhow
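The "drop them first" idea can be sketched as below, alongside the `cut` route for comparison. The sample values are made up, and the snippet assumes a recent pandas and NumPy.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.5, 1.5, np.nan, 2.5, np.nan, 3.5])

# Drop missing values so np.histogram only ever sees finite data.
clean = s.dropna()
counts, edges = np.histogram(clean.to_numpy(), bins=3)
hist = pd.Series(counts, index=edges[:-1])

# pd.cut leaves NaN inputs unbinned, and count() skips those rows.
cut_counts = s.groupby(pd.cut(s, 3), observed=False).count()

print(hist)
print(cut_counts)
```

Both results count only the four non-missing observations.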

@cpcloud
Member

cpcloud commented Jul 22, 2013

or you can just get the bins from cut

cat, bins = cut(s, 10, retbins=True)
s = Series(cat.values, index=bins[:-1], name=cat.name)

not sure if name should be there...
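As written, the snippet above reads as a sketch: `cat.values` would hold one label per observation rather than one count per bin. A runnable variant, assuming a recent pandas in which `pd.cut` on a Series returns a categorical Series, might be:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# retbins=True also returns the array of 11 bin edges.
cat, bins = pd.cut(s, 10, retbins=True)

# Count observations per interval, then order the result by bin.
counts = cat.value_counts().sort_index()

# Relabel by left edge so the index is numeric rather than categorical.
hist = pd.Series(counts.to_numpy(), index=bins[:-1], name=s.name)
print(hist)
```

Note that `pd.cut` widens the lowest edge slightly so the minimum value falls inside the first bin, so `bins[0]` sits a little below `s.min()`.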

@hayd
Contributor

hayd commented Aug 6, 2013

confusing having different hist and histogram methods... perhaps this could be an argument (bins) to value_counts ?

@danielballan
Contributor Author

Good idea.

@hayd
Contributor

hayd commented Aug 6, 2013

This is pretty easy, pr on the way, will also add more functionality to Series.value_counts which was never updated from pd.value_counts.
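Current pandas releases expose a `bins` argument on `Series.value_counts`, which covers the use case discussed in this thread. A short sketch of that usage, with arbitrary sample data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

# bins=10 groups values into 10 equal-width intervals before counting;
# sort=False keeps the bins in order instead of sorting by count.
counts = s.value_counts(bins=10, sort=False)
print(counts)
```

The result is indexed by intervals, so a numeric left-edge index like the one in the original proposal would still need a relabelling step.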
