Optimize coalesce kernel for StringViewArray (5-10%) #7620

Closed
wants to merge 2 commits into from

alamb commented Jun 6, 2025

Which issue does this PR close?

Rationale for this change

@Dandandan and @zhuqi-lucas pointed out some places we could improve gc_string_view on #7597, so let's do that

What changes are included in this PR?

Avoid extra allocations and avoid re-validating the RecordBatch schema
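
To make the "avoid allocations" part concrete, here is a minimal std-only sketch of the general idea, not the code in this PR. It models a string view as an (offset, len) pair into a shared buffer and compacts the referenced bytes into a new buffer that is sized once up front. `View` and `compact` are hypothetical names chosen for illustration; they are not arrow-rs APIs, and real views have a different layout.

```rust
/// Hypothetical, simplified stand-in for a string view:
/// a slice of one shared data buffer.
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Copy only the bytes that `views` still reference into one compact buffer
/// and return views rewritten to point into that new buffer.
fn compact(views: &[View], data: &[u8]) -> (Vec<View>, Vec<u8>) {
    // Size both outputs before copying so the loop below never reallocates.
    let total: usize = views.iter().map(|v| v.len).sum();
    let mut new_data = Vec::with_capacity(total);
    let mut new_views = Vec::with_capacity(views.len());

    for v in views {
        // The new offset is wherever the next byte will land.
        let offset = new_data.len();
        new_data.extend_from_slice(&data[v.offset..v.offset + v.len]);
        new_views.push(View { offset, len: v.len });
    }
    (new_views, new_data)
}
```

With a 35-byte buffer and two surviving views of 5 and 6 bytes, `compact` returns an 11-byte buffer, which is the kind of saving that matters after a highly selective filter.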

Are there any user-facing changes?

Faster performance (these are the larger string benchmarks added in #7619):

filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001
                        time:   [34.474 ms 34.571 ms 34.678 ms]
                        change: [−8.7777% −8.1016% −7.4194%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01
                        time:   [4.5034 ms 4.5153 ms 4.5282 ms]
                        change: [−7.4270% −6.8866% −6.3152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  6 (6.00%) low severe
  2 (2.00%) low mild
  4 (4.00%) high mild
  11 (11.00%) high severe

filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.1
                        time:   [2.1658 ms 2.1953 ms 2.2236 ms]
                        change: [−5.6248% −4.0141% −2.3053%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.6s, enable flat sampling, or reduce sample count to 50.
filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.8
                        time:   [1.6407 ms 1.6450 ms 1.6491 ms]
                        change: [−12.806% −11.526% −10.507%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 30 outliers among 100 measurements (30.00%)
  1 (1.00%) low severe
  21 (21.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jun 6, 2025
alamb commented Jun 6, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/optimize_gc_string_view (a9ac1bc) to 026356b diff
BENCH_NAME=coalesce_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_optimize_gc_string_view
Results will be posted here when complete

alamb commented Jun 6, 2025

🤖: Benchmark completed

Details

group                                                            alamb_optimize_gc_string_view          main
-----                                                            -----------------------------          ----
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001           1.15    319.8±3.67ms        ? ?/sec    1.00    278.1±2.07ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01            1.00      8.7±0.07ms        ? ?/sec    1.03      9.0±0.09ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1             1.01      4.4±0.08ms        ? ?/sec    1.00      4.4±0.13ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8             1.00      3.5±0.02ms        ? ?/sec    1.01      3.5±0.02ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001         1.00    272.0±2.07ms        ? ?/sec    1.01    273.8±2.91ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01          1.00      9.9±0.10ms        ? ?/sec    1.07     10.6±0.06ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1           1.00      4.7±0.12ms        ? ?/sec    1.06      4.9±0.09ms        ? ?/sec
filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8           1.00      4.5±0.02ms        ? ?/sec    1.02      4.6±0.02ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001           1.00     64.5±0.52ms        ? ?/sec    1.06     68.4±0.54ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01            1.00     12.4±0.14ms        ? ?/sec    1.02     12.7±0.17ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1             1.00     10.1±0.35ms        ? ?/sec    1.02     10.3±0.38ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8             1.00      8.3±0.17ms        ? ?/sec    1.00      8.2±0.21ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001         1.00     81.6±0.53ms        ? ?/sec    1.08     88.3±0.40ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01          1.00     14.1±0.11ms        ? ?/sec    1.04     14.7±0.14ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1           1.00     10.0±0.36ms        ? ?/sec    1.06     10.6±0.30ms        ? ?/sec
filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8           1.00      9.8±0.17ms        ? ?/sec    1.02      9.9±0.19ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0, selectivity: 0.001       1.00     75.6±2.01ms        ? ?/sec    1.03     77.7±0.47ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0, selectivity: 0.01        1.00      8.8±0.06ms        ? ?/sec    1.04      9.2±0.02ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0, selectivity: 0.1         1.02      4.4±0.18ms        ? ?/sec    1.00      4.3±0.12ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0, selectivity: 0.8         1.00      4.7±0.02ms        ? ?/sec    1.01      4.8±0.02ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0.1, selectivity: 0.001     1.00     92.0±0.39ms        ? ?/sec    1.02     94.2±0.39ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0.1, selectivity: 0.01      1.00     12.8±0.05ms        ? ?/sec    1.08     13.8±0.05ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0.1, selectivity: 0.1       1.00      5.7±0.16ms        ? ?/sec    1.09      6.2±0.17ms        ? ?/sec
filter: mixed_utf8view, 8192, nulls: 0.1, selectivity: 0.8       1.01      7.8±0.02ms        ? ?/sec    1.00      7.7±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.001      1.00    120.4±0.79ms        ? ?/sec    1.07    129.1±0.48ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.01       1.00     15.0±0.08ms        ? ?/sec    1.03     15.4±0.05ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.1        1.01      7.9±0.09ms        ? ?/sec    1.00      7.8±0.13ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0, selectivity: 0.8        1.01      9.4±0.03ms        ? ?/sec    1.00      9.4±0.02ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.001    1.06    153.3±0.66ms        ? ?/sec    1.00    145.2±0.41ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.01     1.00     21.4±0.07ms        ? ?/sec    1.01     21.6±0.07ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.1      1.06     11.2±0.14ms        ? ?/sec    1.00     10.6±0.07ms        ? ?/sec
filter: single_utf8view, 8192, nulls: 0.1, selectivity: 0.8      1.03     13.7±0.03ms        ? ?/sec    1.00     13.3±0.04ms        ? ?/sec
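
For reading the table above: the ratio columns look like each side's mean time divided by the faster of the two means, so the faster side shows 1.00. The sketch below reproduces that arithmetic under this assumption; `relative` and `round2` are hypothetical helper names, and because the printed means are themselves rounded, recomputed ratios may differ from the table in the last digit.

```rust
/// Ratio of each mean to the faster of the two, assuming that is how the
/// ratio columns in the table are derived. The faster side yields 1.0.
fn relative(branch_ms: f64, main_ms: f64) -> (f64, f64) {
    let best = branch_ms.min(main_ms);
    (branch_ms / best, main_ms / best)
}

/// Round to two decimal places, matching the precision shown in the table.
fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For the first row (319.8 ms on the branch, 278.1 ms on main) this gives 1.15 for the branch and 1.00 for main, matching the table; for `mixed_utf8` at selectivity 0.001 (64.5 ms vs 68.4 ms) it gives 1.00 and 1.06.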

alamb commented Jun 6, 2025

I didn't realize @Dandandan had done basically the same thing in #7614, so let's go with that one
