Skip to content

Conversation

@ueshin
Copy link
Collaborator

@ueshin ueshin commented Dec 19, 2020

Refines DataFrame/Series._reduce_for_stat_function to avoid special handling based on a specific function.

Also:

  • Consolidates the implementations of count and support numeric_only parameter.
  • Adds argument type annotations.


return self._reduce_for_stat_function(var, name="var", axis=axis, numeric_only=numeric_only)

def median(
Copy link
Collaborator Author

@ueshin ueshin Dec 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved here for ease of type annotation; bool is overwritten later..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is bool overwritten?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def bool(self) -> bool:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it!

@codecov-io
Copy link

codecov-io commented Dec 19, 2020

Codecov Report

Merging #1975 (5b1205b) into master (b81afcc) will decrease coverage by 0.04%.
The diff coverage is 90.47%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1975      +/-   ##
==========================================
- Coverage   94.60%   94.56%   -0.05%     
==========================================
  Files          50       50              
  Lines       10905    10935      +30     
==========================================
+ Hits        10317    10341      +24     
- Misses        588      594       +6     
Impacted Files Coverage Δ
databricks/koalas/series.py 96.90% <ø> (-0.02%) ⬇️
databricks/koalas/generic.py 92.13% <90.32%> (-0.79%) ⬇️
databricks/koalas/frame.py 96.78% <100.00%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b81afcc...5b1205b. Read the comment docs.


def max(self, axis=None, numeric_only=None) -> Union[Scalar, "Series"]:
def max(
self, axis: Union[int, str] = None, numeric_only: bool = None
Copy link
Contributor

@xinrong-meng xinrong-meng Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use numeric_only: bool = None rather than numeric_only: bool = False for pandas compatibility?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. cc @itholic who changed it to None.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, It's for pandas compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

"""

def count(spark_column, spark_type):
# Special handle floating point types because Spark's count treats nan as a valid value,
Copy link
Contributor

@xinrong-meng xinrong-meng Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring is much cleaner.

@xinrong-meng
Copy link
Contributor

Looks great to me! Left some questions.

@ueshin
Copy link
Collaborator Author

ueshin commented Dec 21, 2020

Thanks! I'd merge this now.

@ueshin ueshin merged commit c973195 into databricks:master Dec 21, 2020
@ueshin ueshin deleted the refine_reduce_for_stat_function branch December 21, 2020 19:40
@itholic
Copy link
Contributor

itholic commented Dec 23, 2020

Great. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants