Implement Series.plot.kde #767

HyukjinKwon · 2019-09-10T03:43:33Z

This PR takes over #710

KDE, unlike other plots such as line or area, needs to calculate its values via Spark so that it can be computed in a distributed manner.

This PR uses MLlib's KernelDensity API to calculate the KDE. Since Spark only supports a scalar bandwidth, unlike SciPy, which pandas uses, Koalas currently supports only a fixed scalar bandwidth.
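To make "fixed scalar bandwidth" concrete, here is a minimal pure-Python sketch of a Gaussian KDE in which the bandwidth is used directly as the kernel's standard deviation, which is how MLlib's KernelDensity documents its bandwidth. This is not the code from this PR or MLlib's implementation; the function name `gaussian_kde_fixed` and the evaluation grid are assumptions made for illustration. SciPy, by contrast, treats a scalar `bw_method` as a factor that scales the sample's standard deviation, which is one plausible reason the two plots differ slightly.

```python
import math


def gaussian_kde_fixed(sample, points, bandwidth):
    """Evaluate a Gaussian KDE with one fixed scalar bandwidth.

    Hypothetical helper for illustration only: `bandwidth` is used
    directly as the standard deviation of every Gaussian kernel.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(sample)
    # Normalization so that the estimated density integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)
        for x in points
    ]


# Same data as the examples in this PR description.
sample = [1, 2, 2.5, 3, 3.5, 4, 5]
# Assumed evaluation grid from -5 to 11 in steps of 0.05.
grid = [i * 0.05 - 5.0 for i in range(321)]
density = gaussian_kde_fixed(sample, grid, bandwidth=0.3)
```

Each evaluation point needs only a sum over the sample, which is the kind of computation that can be distributed across partitions.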

The implementation is different, so the values are slightly different, but the results seem good enough:

import pandas as pd
pd.Series([1, 2, 2.5, 3, 3.5, 4, 5]).plot.kde(bw_method=0.3).figure.savefig("image.png")

import databricks.koalas as ks
ks.Series([1, 2, 2.5, 3, 3.5, 4, 5]).plot.kde(bw_method=0.3).figure.savefig("image.png")

import pandas as pd
pd.Series([1, 2, 2.5, 3, 3.5, 4, 5]).plot.kde(bw_method=3.0).figure.savefig("image.png")

import databricks.koalas as ks
ks.Series([1, 2, 2.5, 3, 3.5, 4, 5]).plot.kde(bw_method=3.0).figure.savefig("image.png")

codecov-io · 2019-09-10T04:09:29Z

Codecov Report

Merging #767 into master will decrease coverage by 0.06%.
The diff coverage is 87.27%.

@@            Coverage Diff             @@
##           master     #767      +/-   ##
==========================================
- Coverage    93.9%   93.83%   -0.07%     
==========================================
  Files          32       32              
  Lines        5691     5744      +53     
==========================================
+ Hits         5344     5390      +46     
- Misses        347      354       +7

Impacted Files	Coverage Δ
databricks/koalas/plot.py	`93.97% <87.27%> (-1.06%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29ab31d...12bf276. Read the comment docs.

softagram-bot · 2019-09-10T10:00:11Z

Softagram Impact Report for pull/767 (head commit: `12bf276`)

HyukjinKwon · 2019-09-12T06:55:03Z

Let me merge this to move forward.

ueshin · 2019-09-25T21:33:37Z

databricks/koalas/tests/test_series_plot.py

+
+        pax = pdf['a'].plot('kde', bw_method=0.3)
+        kax = kdf['a'].plot('kde', bw_method=0.3)
+        self.compare_plots(pax, kax)


@HyukjinKwon Why does this still work even though the values are slightly different?

This PR implements kde in DataFrame. It reuses the Series implementation from #767. Like the Series kde plot, the DataFrame one also uses MLlib's KernelDensity API to calculate the KDE, so the result is slightly different from pandas but also seems good enough:

<img width="884" alt="Screenshot 2019-09-17 2.24.34 PM" src="https://user-images.githubusercontent.com/44108233/65013645-e3933100-d956-11e9-9166-be6d534046bd.png"> <img width="882" alt="Screenshot 2019-09-17 2.25.18 PM" src="https://user-images.githubusercontent.com/44108233/65013684-ff96d280-d956-11e9-8714-6c860c4e2c13.png">

Also, kde is an alias of 'density', so you get exactly the same result when you use 'density' rather than 'kde', as shown below:

<img width="886" alt="Screenshot 2019-09-17 2.27.30 PM" src="https://user-images.githubusercontent.com/44108233/65013769-66b48700-d957-11e9-84f0-5b5989ed2d49.png"> <img width="882" alt="Screenshot 2019-09-17 2.27.09 PM" src="https://user-images.githubusercontent.com/44108233/65013770-67e5b400-d957-11e9-875c-0fd73b32fa5f.png">

**Multiple columns examples:**

<img width="819" alt="Screenshot 2019-09-17 2.55.35 PM" src="https://user-images.githubusercontent.com/44108233/65015007-45ee3080-d95b-11e9-9c31-a4b85631e404.png">

Plotting each of them individually with Series.plot.kde looks the same, as shown below:

<img width="739" alt="Screenshot 2019-09-17 2.57.57 PM" src="https://user-images.githubusercontent.com/44108233/65015172-b9903d80-d95b-11e9-8245-8b47190c38b6.png">

HyukjinKwon mentioned this pull request Sep 10, 2019

Implement kde for Series #710

Closed

HyukjinKwon changed the title ~~Implement kde for Series~~ [WIP] Implement kde for Series Sep 10, 2019

itholic added 3 commits September 10, 2019 13:32

Implement kde for Series

c492ade

Reset unrelated changes

1ae8192

Reset unrelated changes exactly

b70e59e

HyukjinKwon force-pushed the impl_series_kde branch from 6c34c7a to b70e59e Compare September 10, 2019 04:32

HyukjinKwon changed the title ~~[WIP] Implement kde for Series~~ Implement Series.plot.kde Sep 10, 2019

Use Spark's Kernel Density Estimation

12bf276

HyukjinKwon force-pushed the impl_series_kde branch from ab5e7cf to 12bf276 Compare September 10, 2019 09:59

HyukjinKwon requested a review from dvgodoy September 10, 2019 10:11

HyukjinKwon merged commit a1125f9 into databricks:master Sep 12, 2019

itholic mentioned this pull request Sep 17, 2019

Add DataFrame.plot.kde() (an alias of 'density') #784

Merged

ueshin reviewed Sep 25, 2019

HyukjinKwon deleted the impl_series_kde branch November 6, 2019 02:23
