Commit 345117b
Support vectorized append and compare for multi group by (apache#12996)
* simple support vectorized append.
* fix tests.
* some logs.
* add `append_n` in `MaybeNullBufferBuilder`.
* impl basic append_batch
* fix equal to.
* define `GroupIndexContext`.
* define the structs useful in vectorizing.
* re-define some structs for vectorized operations.
* impl some vectorized logics.
* impl checking hashmap stage.
* fix compile.
* tmp
* define and impl `vectorized_compare`.
* fix compile.
* impl `vectorized_equal_to`.
* impl `vectorized_append`.
* finish the basic vectorized ops logic.
* impl `take_n`.
* fix `renaming clear` and `groups fill`.
* fix infinite loop due to rehashing.
* fix vectorized append.
* add counter.
* use extend rather than resize.
* remove dbg!.
* remove reserve.
* refactor the codes to make simpler and more performant.
* clear `scalarized_indices` in `intern` to avoid some corner case.
* fix `scalarized_equal_to`.
* fallback to total scalarized `GroupValuesColumn` in streaming aggregation.
* add unit test for `VectorizedGroupValuesColumn`.
* add unit test for emitting first n in `VectorizedGroupValuesColumn`.
* sort out test code for group columns and add vectorized tests for primitives.
* add vectorized test for byte builder.
* add vectorized test for byte view builder.
* add test for the all nulls or not nulls branches in vectorized.
* fix clippy.
* fix fmt.
* fix compile in rust 1.79.
* improve comments.
* fix doc.
* add more comments to explain the really complex vectorized intern process.
* add comments to explain why we still need origin `GroupValuesColumn`.
* remove some stale comments.
* fix clippy.
* add comments for `vectorized_equal_to` and `vectorized_append`.
* fix clippy.
* use zip to simplify codes.
* use izip to simplify codes.
* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs
Co-authored-by: Jay Zhan <[email protected]>
* first_n attempt
Signed-off-by: jayzhan211 <[email protected]>
* add test
Signed-off-by: jayzhan211 <[email protected]>
* improve hashtable modifying in emit first n test.
* add `emit_group_index_list_buffer` to avoid allocating a new `Vec` to store the remaining group indices.
* make comments in VectorizedGroupValuesColumn::intern simpler and clearer.
* define `VectorizedOperationBuffers` to hold buffers used in vectorized operations to make code clearer.
* unify `VectorizedGroupValuesColumn` and `GroupValuesColumn`.
* fix fmt.
* fix comments.
* fix clippy.
---------
Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: Jay Zhan <[email protected]>
1 parent c3a9847 commit 345117b
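To make the `vectorized_equal_to` and `vectorized_append` items in the message above more concrete, here is a minimal sketch of the idea on a single primitive column. It is not DataFusion's code: `PrimitiveGroupColumn`, its `group_values` field, and the use of `Option<i64>` slices in place of Arrow arrays are assumptions made for illustration, and the method signatures are only loosely modeled on the names mentioned in the commit.

```rust
/// Hypothetical, simplified stand-in for one group-by column.
/// Stored group values use `Option<i64>` so that nulls are explicit.
struct PrimitiveGroupColumn {
    group_values: Vec<Option<i64>>,
}

impl PrimitiveGroupColumn {
    /// Compare many (existing group, input row) pairs in one pass.
    ///
    /// `lhs_rows[i]` indexes a stored group value, `rhs_rows[i]` indexes a
    /// row of `input`. `equal_to_results[i]` is only ever narrowed: once it
    /// is false it stays false, so the same buffer can be passed through
    /// every column of a multi-column group key.
    fn vectorized_equal_to(
        &self,
        lhs_rows: &[usize],
        input: &[Option<i64>],
        rhs_rows: &[usize],
        equal_to_results: &mut [bool],
    ) {
        let pairs = lhs_rows.iter().zip(rhs_rows);
        for ((&lhs, &rhs), result) in pairs.zip(equal_to_results.iter_mut()) {
            if !*result {
                // An earlier column already ruled this pair out.
                continue;
            }
            // In group-by semantics two nulls belong to the same group,
            // which matches `Option` equality (`None == None`).
            *result = self.group_values[lhs] == input[rhs];
        }
    }

    /// Append the selected input rows as new group values in one call,
    /// instead of pushing them one row at a time.
    fn vectorized_append(&mut self, input: &[Option<i64>], rows: &[usize]) {
        self.group_values.extend(rows.iter().map(|&row| input[row]));
    }
}
```

Batching the comparisons and appends this way lets each column run a tight loop over index lists rather than being called once per input row, which is the motivation the commit title points to.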
File tree
9 files changed: +2296, -227 lines
- datafusion
  - common/src/utils
  - core/tests/user_defined
  - physical-plan/src/aggregates
    - group_values
  - proto/tests/cases
  - substrait/tests/cases