
BUG: Kurtosis bug? #30993


Closed
mindryu opened this issue Jan 14, 2020 · 8 comments
Labels
Needs Info Clarification about behavior needed to assess issue

Comments

@mindryu

mindryu commented Jan 14, 2020

Why are the pandas and scipy kurtosis results different?
During testing, I also noticed that the pandas kurtosis values keep changing depending on the starting point of the window.

[Screenshot: 1578962134989]

@mroeschke
Member

  1. pandas statistical methods are unbiased, whereas scipy.stats.kurtosis has bias=True by default.
  2. It would be easier to describe the problem/difference by comparing the numerical values instead of visually with these graphs.
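
A minimal sketch of the first point, assuming a synthetic normal sample (the seed and sample size are illustrative, not taken from the issue). It compares `Series.kurt()` with `scipy.stats.kurtosis` under both `bias` settings on a whole series rather than on rolling windows:

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

# Synthetic data for illustration only (assumption: not the reporter's data).
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(size=50))

pandas_kurt = values.kurt()                                # bias-corrected excess kurtosis
scipy_biased = kurtosis(values.to_numpy())                 # bias=True is the scipy default
scipy_unbiased = kurtosis(values.to_numpy(), bias=False)   # bias-corrected, comparable to pandas

print(f"pandas kurt():         {pandas_kurt:.6f}")
print(f"scipy (bias=True):     {scipy_biased:.6f}")
print(f"scipy (bias=False):    {scipy_unbiased:.6f}")
```

On well-conditioned data the pandas value should agree with the `bias=False` scipy value and differ from the default one.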

@mroeschke mroeschke added the Needs Info Clarification about behavior needed to assess issue label Jan 14, 2020
@mindryu
Author

mindryu commented Jan 14, 2020

With bias=False the results are also different.
Apart from that, isn't it a bug that the values keep changing depending on the starting point? Just looking at the screenshot, you can see one line sticking out on its own, even though I am only sliding the window. If this were normal, the lines should all have the same shape regardless of position.

@mindryu
Author

mindryu commented Jan 14, 2020

It's weird. I tried to create an example by generating a random time series, but I can't reproduce it. Several other stock charts work fine, too.
Does this happen only in some cases, or did I do something wrong?

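import matplotlib.pyplot as plt
import pandas as pd
from scipy.stats import kurtosis

df = pd.read_csv('test.txt')
test_length = 200

# pandas rolling kurtosis over ten 200-row slices, each shifted back by one row
for d in range(10):
    end = len(df) - d  # explicit end index: df[-test_length-d:-d] is empty when d == 0
    test = df.iloc[end - test_length:end].reset_index(drop=True)
    plt.plot(test.close.rolling(4).kurt())
plt.show()

# the same slices, using scipy's bias-corrected kurtosis on each rolling window
for d in range(10):
    end = len(df) - d
    test = df.iloc[end - test_length:end].reset_index(drop=True)
    plt.plot(test.close.rolling(4).apply(lambda x: kurtosis(x, bias=False), raw=True))
plt.show()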

[Image: kurt]
[Attachment: test.txt]

@bashtage
Contributor

Is the variance of your data very close to zero? If so, you could be seeing differences in rounding/numerical precision.

@bashtage
Contributor

Excess kurtosis should always be > -3. Since you are seeing large values, you are experiencing numerical errors and the limits of numerical precision. I suspect many of your blocks have zero or near-zero variance.
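
One way to check this suspicion is to flag rolling windows whose values barely vary. The sketch below is only a diagnostic under stated assumptions: the helper name `flag_low_variance_windows`, the relative tolerance, and the sample prices are all hypothetical, and the max-minus-min spread is used as a simple stand-in for variance because it is exactly zero for a constant window.

```python
import pandas as pd


def flag_low_variance_windows(series, window, rel_tol=1e-9):
    """Return a boolean Series that is True where a rolling window is (nearly) constant.

    Hypothetical helper: a window is flagged when its max-min spread is at most
    rel_tol times the absolute window mean. Incomplete windows are never flagged.
    """
    rolling = series.rolling(window)
    spread = rolling.max() - rolling.min()
    level = rolling.mean().abs()
    return spread <= rel_tol * level


# Illustrative prices (assumption): a flat stretch followed by small moves.
prices = pd.Series([100.0] * 6 + [100.5, 101.0, 100.2, 100.9])
flags = flag_low_variance_windows(prices, window=4)
print(pd.DataFrame({"price": prices, "near_constant_window": flags}))
```

Windows flagged this way are the ones where a kurtosis estimate divides by a variance close to zero, so their values are the first candidates to distrust.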

@bashtage
Contributor

To repro, just share observations 20 - 30 of test.csv.

@mindryu
Author

mindryu commented Jan 14, 2020

@bashtage

I think it's just as you said.
In fact, the 1-minute chart usually changes very little, and then it changes rapidly in a moment.
The values weren't stable enough to trust until the rolling length reached 40. Until then, they kept changing little by little.

This problem made the network I trained for three days obsolete...
What do you mean by sharing observations? Should I upload the CSV again with the kurtosis values included?
Thank you.

[Screenshot: 1579011566007, rolling 30]

@mroeschke
Member

We're happy to reopen this issue once we can validate it with a reproducible example. Small raw data samples should be sufficient; plots are not needed.
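
A reproducible example of the kind requested could look roughly like the sketch below. The inline values are hypothetical placeholders, not rows from test.txt; in practice they would be replaced by a short slice of the real `close` column. It puts the pandas and scipy rolling kurtosis side by side as numbers instead of plots:

```python
import pandas as pd
from scipy.stats import kurtosis

# Hypothetical sample values (assumption); substitute a short slice of the real data.
close = pd.Series([1.2, 1.5, 0.9, 1.8, 2.4, 1.1, 0.7, 1.9, 2.2, 1.4, 0.8])

window = 4
comparison = pd.DataFrame({
    "close": close,
    # pandas built-in rolling kurtosis
    "pandas_kurt": close.rolling(window).kurt(),
    # scipy's bias-corrected kurtosis applied to each window
    "scipy_kurt": close.rolling(window).apply(
        lambda x: kurtosis(x, bias=False), raw=True
    ),
})
comparison["abs_diff"] = (comparison["pandas_kurt"] - comparison["scipy_kurt"]).abs()

print(comparison.to_string())
```

Pasting the printed table together with the input values gives maintainers exact numbers to compare, and any row with a large `abs_diff` pinpoints the window worth investigating.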
