@heiher
Contributor

@heiher heiher commented Aug 25, 2025

LoongArch is a RISC instruction set architecture and is currently a Tier-2 (with host-tools) target [1] in upstream Rust.

This patch introduces FP16 conversion functions based on the LoongArch SIMD extension to improve performance.

Benchmarks:

HalfFloatSliceExt::convert_from_f32_slice/constants
                        time:   [10.816 ns 10.823 ns 10.831 ns]
                        change: [-63.769% -63.728% -63.693%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_from_f32_slice/large
                        time:   [137.68 ns 137.77 ns 137.88 ns]
                        change: [-94.847% -94.841% -94.834%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_from_f64_slice/constants
                        time:   [12.656 ns 12.669 ns 12.684 ns]
                        change: [-78.455% -78.418% -78.367%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_from_f64_slice/large
                        time:   [544.15 ns 544.49 ns 544.91 ns]
                        change: [-89.799% -89.791% -89.781%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_to_f32_slice/constants
                        time:   [6.0412 ns 6.0442 ns 6.0482 ns]
                        change: [-74.100% -74.068% -74.042%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_to_f32_slice/large
                        time:   [512.78 ns 513.08 ns 513.45 ns]
                        change: [-77.628% -77.526% -77.422%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_to_f64_slice/constants
                        time:   [10.779 ns 10.784 ns 10.792 ns]
                        change: [-49.028% -48.922% -48.813%] (p = 0.00 < 0.05)
                        Performance has improved.

HalfFloatSliceExt::convert_to_f64_slice/large
                        time:   [923.19 ns 923.77 ns 924.50 ns]
                        change: [-80.876% -80.862% -80.849%] (p = 0.00 < 0.05)
                        Performance has improved.
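
To read the `change` figures as speedup factors, the arithmetic can be sketched as below. This is only an illustration and not part of the patch: it assumes `change` is the relative difference in time against the previous run, and the helper names `speedup_from_change` and `baseline_time_ns` are made up for this example.

```rust
/// Convert a relative "change" percentage (negative means faster) into a
/// speedup factor: old_time / new_time = 1 / (1 + change / 100).
fn speedup_from_change(change_percent: f64) -> f64 {
    1.0 / (1.0 + change_percent / 100.0)
}

/// Estimate the previous (baseline) time from the new time and the
/// reported change percentage. This is an estimate, not a measurement.
fn baseline_time_ns(new_time_ns: f64, change_percent: f64) -> f64 {
    new_time_ns * speedup_from_change(change_percent)
}
```

For example, `speedup_from_change(-94.841)` applied to the `convert_from_f32_slice/large` result corresponds to roughly a 19x speedup, while the `-48.922%` change for `convert_to_f64_slice/constants` is a bit under 2x.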

Footnotes

  1. https://doc.rust-lang.org/stable/rustc/platform-support/loongarch-linux.html

@VoidStarKat VoidStarKat merged commit 12a3aef into VoidStarKat:main Sep 11, 2025
25 of 26 checks passed
Alexhuszagh referenced this pull request in Alexhuszagh/float16 Sep 21, 2025
LoongArch64 FP16 hardware support
@Amanieu

Amanieu commented Oct 11, 2025

@heiher This causes the crate to fail to build on stable. The usage of nightly features should be gated behind a Cargo feature.

@heiher
Contributor Author

heiher commented Oct 13, 2025

> @heiher This causes the crate to fail to build on stable. The usage of nightly features should be gated behind a Cargo feature.

Thanks! A new `nightly` Cargo feature has been added to gate the usage of nightly-only features: #136
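
For readers unfamiliar with this kind of gating, here is a minimal sketch of the general pattern, not the actual change in #136. It assumes a Cargo feature named `nightly` declared in `Cargo.toml` (for example `nightly = []` under `[features]`). The function names are hypothetical, and the gated body is a placeholder where the LoongArch SIMD path would go. With the feature off, only the portable scalar conversion is compiled, so the code builds on stable.

```rust
/// Portable scalar fallback: widen IEEE 754 binary16 bits to f32.
/// (Hypothetical helper written for this sketch.)
fn f16_bits_to_f32_fallback(bits: u16) -> f32 {
    let sign = ((bits as u32) & 0x8000) << 16;
    let exp = ((bits >> 10) & 0x1F) as u32;
    let frac = (bits & 0x03FF) as u32;

    let out = match exp {
        // Zero and subnormals: value is frac * 2^-24, exact in f32.
        0 => sign | ((frac as f32) * (1.0_f32 / 16_777_216.0)).to_bits(),
        // Infinity (frac == 0) and NaN (frac != 0).
        0x1F => sign | 0x7F80_0000 | (frac << 13),
        // Normal numbers: rebias the exponent from 15 to 127.
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}

/// Compiled only when the assumed `nightly` feature is enabled on LoongArch64.
#[cfg(all(feature = "nightly", target_arch = "loongarch64"))]
fn f16_bits_to_f32(bits: u16) -> f32 {
    // Placeholder: a real implementation would use nightly-only
    // LoongArch SIMD intrinsics here. This sketch reuses the fallback.
    f16_bits_to_f32_fallback(bits)
}

/// Compiled everywhere else, including all stable builds.
#[cfg(not(all(feature = "nightly", target_arch = "loongarch64")))]
fn f16_bits_to_f32(bits: u16) -> f32 {
    f16_bits_to_f32_fallback(bits)
}
```

Callers use `f16_bits_to_f32` either way. Which body is compiled depends only on the feature flag and the target architecture.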
