Improve performance of median function #13550

Closed
@alamb

Is your feature request related to a problem or challenge?

The median function is used in the H2O benchmark and is quite slow.

https://github.com/apache/datafusion/blob/main/datafusion/functions-aggregate/src/median.rs

It would be great to make it faster so that our performance in the H2O benchmark improves.

Describe the solution you'd like

See details on #13548

  1. Add a benchmark for median function
  2. Improve performance of median function (likely by implementing GroupsAccumulator)
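
To illustrate why a grouped accumulator could help, here is a minimal sketch of the idea in plain Rust. It is not DataFusion's `GroupsAccumulator` trait or the code in `median.rs`: the `GroupedMedian` type and its `update_batch` / `evaluate` methods are hypothetical names, and values are assumed to be `f64` with no nulls. The point is that one object holds the state for every group and receives whole batches at once, and that the median itself can be found with a selection algorithm rather than a full sort.

```rust
/// Hypothetical grouped-median state: one value buffer per group index.
/// This only sketches the idea; it is not DataFusion's actual API.
struct GroupedMedian {
    values: Vec<Vec<f64>>,
}

impl GroupedMedian {
    fn new() -> Self {
        Self { values: Vec::new() }
    }

    /// Append a batch of values; `group_indices[i]` is the group of `batch[i]`.
    fn update_batch(&mut self, batch: &[f64], group_indices: &[usize], total_groups: usize) {
        assert_eq!(batch.len(), group_indices.len());
        // Grow the per-group storage if new groups have appeared.
        if self.values.len() < total_groups {
            self.values.resize_with(total_groups, Vec::new);
        }
        for (&v, &g) in batch.iter().zip(group_indices) {
            self.values[g].push(v);
        }
    }

    /// Compute one median per group; empty groups yield `None`.
    fn evaluate(&mut self) -> Vec<Option<f64>> {
        self.values.iter_mut().map(|v| median(v)).collect()
    }
}

/// Median via selection (average O(n)) instead of a full sort.
fn median(values: &mut [f64]) -> Option<f64> {
    let n = values.len();
    if n == 0 {
        return None;
    }
    let mid = n / 2;
    // After this call, everything in `lower` is <= the element at `mid`.
    let (lower, m, _) = values.select_nth_unstable_by(mid, |a, b| a.total_cmp(b));
    let m = *m;
    if n % 2 == 1 {
        Some(m)
    } else {
        // Even length: average the two middle values. The other middle
        // value is the largest element of the lower partition.
        let lo = lower.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        Some((lo + m) / 2.0)
    }
}
```

With this layout, feeding values `[1.0, 5.0, 3.0, 2.0]` with group indices `[0, 1, 0, 0]` puts three values in group 0 and one in group 1, and a single `evaluate` call produces all medians in one pass over the groups.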

Describe alternatives you've considered

No response

Additional context

Used here in @MrPowers' benchmark:

https://github.com/MrPowers/mrpowers-benchmarks/blob/0b586a0657d7f6cfd55d89508e15b95e79bd4010/benchmarks/datafusion_h2o_groupby_queries.py#L25
