Contrib guide update #246

rxin · 2019-05-06T22:35:25Z

No description provided.

rxin · 2019-05-06T22:35:50Z

This is mostly a cleanup and doesn't change the gist of it, since it was already pretty well written. @thunterdb can you take a look? Thanks.

codecov-io · 2019-05-06T22:44:25Z

Codecov Report

Merging #246 into master will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #246      +/-   ##
==========================================
+ Coverage   91.59%   91.62%   +0.03%     
==========================================
  Files          35       35              
  Lines        3022     3022              
==========================================
+ Hits         2768     2769       +1     
+ Misses        254      253       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| databricks/koalas/__init__.py | `92.59% <0%> (+3.7%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 19a1536...29de231.

thunterdb

@rxin thanks, just a few comments. Feel free to correct the typos and merge. I think that this should be published into the guide, but that can be done at a later step.

thunterdb · 2019-05-07T17:28:28Z

CONTRIBUTING.md

@@ -1,112 +1,81 @@
-# Contributing to Koalas - design and principles


Should this go into the docs and get published? It contains a lot of useful info for users too.

yea let's work on that next. maybe we can even put everything in readme.

thunterdb · 2019-05-07T17:28:56Z

CONTRIBUTING.md

 In particular, it answers the questions:
- - what is in the scope of the Koalas project? What should go into PySpark or Pandas instead?
- - What is expected for code contributions
+ - What is in the scope of the Koalas project? What should go into PySpark or Pandas instead?


Pandas -> pandas :)

thunterdb · 2019-05-07T17:57:30Z

CONTRIBUTING.md

-Functions that present algorithms specific to distributed datasets
-(approx quantiles for example)
+    The workaround is to force the materialization of the pandas DataFrame, either by calling:
+      - `.to_pandas()` (koalas only)


: returns a pandas DataFrame, koalas only

thunterdb · 2019-05-07T17:58:08Z

CONTRIBUTING.md

-(approx quantiles for example)
+    The workaround is to force the materialization of the pandas DataFrame, either by calling:
+      - `.to_pandas()` (koalas only)
+      - `.to_numpy()` (works with both pandas and koalas)


returns a numpy array, works with both pandas and Koalas. + put a link to the docs of each method.
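
For readers of this thread, the workaround in the hunk above can be sketched as follows. This is only an illustration: `materialize` and its `as_array` parameter are hypothetical names, not part of Koalas or pandas. It relies solely on the two methods named in the diff, `.to_pandas()` (Koalas only, returns a pandas DataFrame) and `.to_numpy()` (pandas and Koalas, returns a NumPy array).

```python
# Hypothetical helper for illustration; not part of Koalas or pandas.
def materialize(frame, as_array=False):
    """Force a DataFrame-like object into local memory.

    Assumes `frame` is either a pandas or a Koalas DataFrame.
    """
    if as_array:
        # Works with both pandas and Koalas: returns a NumPy array.
        return frame.to_numpy()
    if hasattr(frame, "to_pandas"):
        # Koalas only: returns a pandas DataFrame.
        return frame.to_pandas()
    # No to_pandas method: treat the object as a pandas DataFrame,
    # which is already local.
    return frame
```

Both calls pull the whole dataset onto the driver, so a helper like this only makes sense when the data is known to fit in local memory.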

thunterdb · 2019-05-07T17:58:51Z

CONTRIBUTING.md

+    - DataFrame.values
+    - `DataFrame.__iter__` and the array protocol `__array__`

+3. *Low-level multidimensional arrays*: Other frameworks like Dask or Modin have a low-level block representation of a multidimensional array that Spark lacks. Until such a representation is available, these functions should not be considered.


Low-level functions for multidimensional arrays:

This includes for example all the array representations in pandas.array

thunterdb · 2019-05-07T18:04:47Z

CONTRIBUTING.md

-TODO: Koalas methods for reading and writing should work for both local and distributed files.
+### Spark functions that should be included in Koalas

+- pyspark.range


Since they block access to data fields, I am thinking that all the functions specific to Spark (physical layout, caching, etc.) should be put under a `spark` accessor: `k_df.spark.cache()` would return a Koalas DataFrame that has been cached. What do you think?

we probably need to try a few and see ...
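
The accessor idea discussed above can be sketched in plain Python. Everything here is a stand-in: `Frame` and `SparkAccessor` are hypothetical classes, not the Koalas implementation, and `cache()` only sets a flag instead of calling Spark. The sketch shows why a namespace helps: with attribute-style column access, a method defined directly on the frame would hide a column of the same name.

```python
class SparkAccessor:
    """Hypothetical namespace for Spark-specific operations."""

    def __init__(self, frame):
        self._frame = frame

    def cache(self):
        # Stand-in behaviour: return a new frame flagged as cached.
        # A real implementation would delegate to the Spark DataFrame.
        return Frame(self._frame.columns, cached=True)


class Frame:
    """Hypothetical stand-in for a Koalas DataFrame."""

    def __init__(self, columns, cached=False):
        self.columns = list(columns)
        self.cached = cached

    @property
    def spark(self):
        # Single reserved name that groups all Spark-specific functions.
        return SparkAccessor(self)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so any method defined
        # on the class would shadow a column with the same name.
        if name in self.__dict__.get("columns", ()):
            return f"column:{name}"
        raise AttributeError(name)


k_df = Frame(["cache", "value"])
column = k_df.cache            # the data field named "cache" stays reachable
cached = k_df.spark.cache()    # the Spark-specific call lives under .spark
```

The trade-off in this sketch is that the name `spark` itself becomes reserved, so a column called `spark` would no longer be reachable by attribute access.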

Contrib guide update

b084a5e

rxin requested a review from thunterdb May 6, 2019 22:35

add signaling work

29de231

thunterdb approved these changes May 7, 2019

cr

cae2dfb

rxin merged commit a3e5160 into databricks:master May 7, 2019
