Speed up arithmetic kernels, reduce unsafe usage #7493

Merged 6 commits into apache:main on May 13, 2025

Conversation

Dandandan commented May 13, 2025

Which issue does this PR close?

Closes #7494

add(0)                  time:   [6.4821 µs 6.4910 µs 6.5021 µs]
                        change: [-19.934% -18.915% -17.753%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

add_checked(0)          time:   [6.4683 µs 6.4811 µs 6.4968 µs]
                        change: [-20.065% -19.129% -18.000%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

add_scalar(0)           time:   [3.7622 µs 3.8155 µs 3.8785 µs]
                        change: [-27.847% -25.770% -23.413%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

subtract(0)             time:   [6.4670 µs 6.4979 µs 6.5382 µs]
                        change: [-21.056% -20.048% -18.927%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

subtract_checked(0)     time:   [6.4643 µs 6.4894 µs 6.5301 µs]
                        change: [-20.731% -19.624% -18.022%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe

subtract_scalar(0)      time:   [3.7351 µs 3.7820 µs 3.8389 µs]
                        change: [-24.080% -21.711% -19.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

multiply(0)             time:   [6.6112 µs 6.7054 µs 6.8188 µs]
                        change: [-15.626% -14.183% -12.472%] (p = 0.00 < 0.05)
                        Performance has improved.

multiply_checked(0)     time:   [6.4615 µs 6.4715 µs 6.4835 µs]
                        change: [-20.333% -19.516% -18.569%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe

multiply_scalar(0)      time:   [3.8698 µs 3.9090 µs 3.9474 µs]
                        change: [-18.145% -15.365% -12.281%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

divide(0)               time:   [6.4552 µs 6.4607 µs 6.4674 µs]
                        change: [-21.795% -21.073% -20.470%] (p = 0.00 < 0.05)
                        Performance has improved.

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added arrow Changes to the arrow crate arrow-flight Changes to the arrow-flight crate labels May 13, 2025
@github-actions github-actions bot removed the arrow-flight Changes to the arrow-flight crate label May 13, 2025

@alamb alamb left a comment

Looks like a nice improvement to me. I started some other benchmark runs as well to gather some more data

Thank you @Dandandan

@alamb alamb changed the title Speed up arithmetic kernels Speed up arithmetic kernels, reduce unsafe usage May 13, 2025

@Dandandan

> 🤖: Benchmark completed
>
> Details

I don't think those kernels use this code path 🤔

alamb commented May 13, 2025

Sorry @Dandandan -- I ran the wrong script -- fixing

alamb commented May 13, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing arithmetic_speed (df4c555) to 6a3ecef diff
BENCH_NAME=arithmetic_kernels
BENCH_COMMAND=cargo bench --all-features --bench arithmetic_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=arithmetic_speed
Results will be posted here when complete

alamb commented May 13, 2025

🤖: Benchmark completed

Details

group                    arithmetic_speed                       main
-----                    ----------------                       ----
add(0)                   1.00     14.3±0.08µs        ? ?/sec    1.05     15.0±0.11µs        ? ?/sec
add(0.1)                 1.00     12.3±0.21µs        ? ?/sec    1.03     12.7±0.17µs        ? ?/sec
add(0.5)                 1.00     12.4±0.12µs        ? ?/sec    1.03     12.7±0.14µs        ? ?/sec
add(0.9)                 1.00     12.5±0.16µs        ? ?/sec    1.17     14.6±0.13µs        ? ?/sec
add(1)                   1.00     12.5±0.07µs        ? ?/sec    1.02     12.7±0.13µs        ? ?/sec
add_checked(0)           1.00     11.1±0.07µs        ? ?/sec    1.03     11.3±0.18µs        ? ?/sec
add_checked(0.1)         1.00     12.4±0.08µs        ? ?/sec    1.02     12.6±0.07µs        ? ?/sec
add_checked(0.5)         1.00     12.5±0.17µs        ? ?/sec    1.01     12.6±0.16µs        ? ?/sec
add_checked(0.9)         1.00     12.4±0.16µs        ? ?/sec    1.17     14.5±0.28µs        ? ?/sec
add_checked(1)           1.00     12.3±0.13µs        ? ?/sec    1.03     12.7±0.10µs        ? ?/sec
add_scalar(0)            1.00      7.1±0.04µs        ? ?/sec    1.00      7.1±0.02µs        ? ?/sec
add_scalar(0.1)          1.00      7.0±0.02µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
add_scalar(0.5)          1.00      7.1±0.09µs        ? ?/sec    1.00      7.1±0.02µs        ? ?/sec
add_scalar(0.9)          1.00      7.0±0.02µs        ? ?/sec    1.00      7.1±0.02µs        ? ?/sec
add_scalar(1)            1.00      7.0±0.01µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
divide(0)                1.00     13.4±0.03µs        ? ?/sec    1.01     13.6±0.12µs        ? ?/sec
divide(0.1)              1.00     14.7±0.06µs        ? ?/sec    1.01     14.9±0.08µs        ? ?/sec
divide(0.5)              1.00     14.8±0.05µs        ? ?/sec    1.00     14.8±0.06µs        ? ?/sec
divide(0.9)              1.00     14.8±0.06µs        ? ?/sec    1.00     14.7±0.10µs        ? ?/sec
divide(1)                1.00     14.8±0.07µs        ? ?/sec    1.00     14.8±0.07µs        ? ?/sec
divide_scalar(0)         1.00     13.3±0.03µs        ? ?/sec    1.00     13.4±0.02µs        ? ?/sec
divide_scalar(0.1)       1.00     13.3±0.03µs        ? ?/sec    1.00     13.4±0.02µs        ? ?/sec
divide_scalar(0.5)       1.00     13.3±0.03µs        ? ?/sec    1.00     13.4±0.03µs        ? ?/sec
divide_scalar(0.9)       1.00     13.3±0.03µs        ? ?/sec    1.00     13.4±0.05µs        ? ?/sec
divide_scalar(1)         1.00     13.3±0.02µs        ? ?/sec    1.00     13.4±0.03µs        ? ?/sec
modulo(0)                1.05    349.5±1.11µs        ? ?/sec    1.00    333.4±0.81µs        ? ?/sec
modulo(0.1)              1.01    388.2±0.97µs        ? ?/sec    1.00    384.4±0.89µs        ? ?/sec
modulo(0.5)              1.00    532.8±1.55µs        ? ?/sec    1.06    563.7±0.99µs        ? ?/sec
modulo(0.9)              1.00    293.5±0.47µs        ? ?/sec    1.08    317.3±2.08µs        ? ?/sec
modulo(1)                1.03    244.6±1.42µs        ? ?/sec    1.00    238.1±0.70µs        ? ?/sec
modulo_scalar(0)         1.00    503.1±1.38µs        ? ?/sec    1.02    511.1±1.58µs        ? ?/sec
modulo_scalar(0.1)       1.00    480.1±1.20µs        ? ?/sec    1.01    483.9±2.18µs        ? ?/sec
modulo_scalar(0.5)       1.00    324.4±0.67µs        ? ?/sec    1.03    333.2±0.55µs        ? ?/sec
modulo_scalar(0.9)       1.00    162.2±0.46µs        ? ?/sec    1.13    183.5±0.25µs        ? ?/sec
modulo_scalar(1)         1.00    122.4±0.20µs        ? ?/sec    1.14    140.0±0.20µs        ? ?/sec
multiply(0)              1.00     11.1±0.08µs        ? ?/sec    1.03     11.4±0.09µs        ? ?/sec
multiply(0.1)            1.00     12.3±0.14µs        ? ?/sec    1.04     12.7±0.16µs        ? ?/sec
multiply(0.5)            1.00     12.4±0.11µs        ? ?/sec    1.03     12.8±0.13µs        ? ?/sec
multiply(0.9)            1.00     12.5±0.10µs        ? ?/sec    1.16     14.6±0.15µs        ? ?/sec
multiply(1)              1.00     12.5±0.10µs        ? ?/sec    1.01     12.6±0.10µs        ? ?/sec
multiply_checked(0)      1.00     11.1±0.05µs        ? ?/sec    1.03     11.5±0.11µs        ? ?/sec
multiply_checked(0.1)    1.00     12.4±0.11µs        ? ?/sec    1.02     12.7±0.10µs        ? ?/sec
multiply_checked(0.5)    1.00     12.6±0.07µs        ? ?/sec    1.01     12.7±0.15µs        ? ?/sec
multiply_checked(0.9)    1.00     12.5±0.14µs        ? ?/sec    1.17     14.5±0.15µs        ? ?/sec
multiply_checked(1)      1.00     12.4±0.12µs        ? ?/sec    1.02     12.6±0.05µs        ? ?/sec
multiply_scalar(0)       1.00      7.0±0.02µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
multiply_scalar(0.1)     1.00      7.0±0.07µs        ? ?/sec    1.00      7.1±0.03µs        ? ?/sec
multiply_scalar(0.5)     1.00      7.0±0.02µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
multiply_scalar(0.9)     1.00      7.0±0.03µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
multiply_scalar(1)       1.00      7.0±0.06µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
subtract(0)              1.00     11.0±0.06µs        ? ?/sec    1.01     11.1±0.32µs        ? ?/sec
subtract(0.1)            1.00     12.4±0.15µs        ? ?/sec    1.03     12.8±0.15µs        ? ?/sec
subtract(0.5)            1.00     12.5±0.09µs        ? ?/sec    1.02     12.7±0.14µs        ? ?/sec
subtract(0.9)            1.00     12.6±0.08µs        ? ?/sec    1.13     14.2±0.68µs        ? ?/sec
subtract(1)              1.00     12.4±0.14µs        ? ?/sec    1.01     12.5±0.28µs        ? ?/sec
subtract_checked(0)      1.00     11.0±0.06µs        ? ?/sec    1.04     11.4±0.10µs        ? ?/sec
subtract_checked(0.1)    1.00     12.3±0.14µs        ? ?/sec    1.04     12.8±0.13µs        ? ?/sec
subtract_checked(0.5)    1.00     12.5±0.10µs        ? ?/sec    1.02     12.7±0.27µs        ? ?/sec
subtract_checked(0.9)    1.00     12.6±0.15µs        ? ?/sec    1.10     13.8±0.82µs        ? ?/sec
subtract_checked(1)      1.00     12.4±0.15µs        ? ?/sec    1.01     12.5±0.22µs        ? ?/sec
subtract_scalar(0)       1.00      7.0±0.02µs        ? ?/sec    1.01      7.1±0.02µs        ? ?/sec
subtract_scalar(0.1)     1.00      7.1±0.09µs        ? ?/sec    1.00      7.1±0.05µs        ? ?/sec
subtract_scalar(0.5)     1.00      7.0±0.05µs        ? ?/sec    1.01      7.1±0.03µs        ? ?/sec
subtract_scalar(0.9)     1.00      7.0±0.01µs        ? ?/sec    1.01      7.1±0.03µs        ? ?/sec
subtract_scalar(1)       1.00      7.0±0.14µs        ? ?/sec    1.02      7.1±0.03µs        ? ?/sec
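
The ratio columns in the table above appear to follow the usual benchmark-comparison convention: the fastest time in each row is shown as 1.00 and the other branch as a multiple of it. A minimal Rust sketch of that arithmetic, assuming this reading of the table is correct; `relative_to_fastest` is a hypothetical helper for illustration and is not part of arrow-rs or the benchmark script:

```rust
/// Express each time as a multiple of the fastest time in the row.
/// (Hypothetical helper; assumes the table's ratio column is time / fastest.)
fn relative_to_fastest(times_us: &[f64]) -> Vec<f64> {
    // Smallest time in the row; stays at infinity only for an empty slice,
    // in which case the map below yields an empty vector.
    let fastest = times_us.iter().copied().fold(f64::INFINITY, f64::min);
    times_us.iter().map(|t| t / fastest).collect()
}

fn main() {
    // add(0.9) row: 12.5 µs on arithmetic_speed, 14.6 µs on main.
    let ratios = relative_to_fastest(&[12.5, 14.6]);
    println!("arithmetic_speed: {:.2}, main: {:.2}", ratios[0], ratios[1]);
}
```

The table prints rounded times, so ratios recomputed from it can differ in the last digit from the printed ones (the `*_scalar` rows are an example).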

alamb commented May 13, 2025

🤖: Benchmark completed

Looks like an across the board win to me ❤️

@Dandandan

> 🤖: Benchmark completed
>
> Looks like an across the board win to me ❤️

Yeah, somehow this is also a larger win on my machine, but every win is a win!

@Dandandan Dandandan merged commit 8dbca1e into apache:main May 13, 2025
27 checks passed
Linked issue: Arithmetic kernels can be safer and faster