Optimise .fill() throughput #43

Bluefinger · 2022-12-23T10:42:13Z

Looking at the fill method, I noticed there was some room for optimisation, including removing an unnecessary unwrap() from the hot loop. Also, the remainder portion was generating a new entropy block for every remaining u8, even though any remaining slice is always shorter than 8 bytes, i.e. less than one u64 block (which is what WyRand generates per new state). Therefore, a maximum of seven u8 calls can be reduced to just one u64 block.

Afterwards, I simplified the copying of entropy to make use of copy_from_slice, since we know that either the slice lengths will match exactly (so no try_into conversions are needed), or that the remaining slice will always be shorter than the 8-byte block it is copied from.
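To illustrate the two changes described above, here is a minimal sketch of the approach, not the crate's actual code. `SketchRng`, its `blocks` counter, and the mixing constants are hypothetical stand-ins for a WyRand-style generator that yields one u64 per state step; only `chunks_exact_mut`, `into_remainder`, and `copy_from_slice` are real standard-library calls.

```rust
/// Hypothetical stand-in for a WyRand-style generator (one u64 per state step).
pub struct SketchRng {
    state: u64,
    /// Counts generated u64 blocks; present only to make the saving visible.
    pub blocks: usize,
}

impl SketchRng {
    pub fn new(seed: u64) -> Self {
        Self { state: seed, blocks: 0 }
    }

    /// Advance the state and return one 64-bit block.
    /// The constants are illustrative, not taken from the PR.
    pub fn next_u64(&mut self) -> u64 {
        self.blocks += 1;
        self.state = self.state.wrapping_add(0xA076_1D64_78BD_642F);
        let t = u128::from(self.state) * u128::from(self.state ^ 0xE703_7ED1_A0B4_28DB);
        (t as u64) ^ ((t >> 64) as u64)
    }

    /// Fill `dest` in 8-byte chunks, then cover the tail with a single block.
    #[inline]
    pub fn fill(&mut self, dest: &mut [u8]) {
        let mut chunks = dest.chunks_exact_mut(8);
        for chunk in chunks.by_ref() {
            // Each chunk is exactly 8 bytes, so the lengths match and
            // copy_from_slice cannot panic; no unwrap() or try_into needed.
            chunk.copy_from_slice(&self.next_u64().to_le_bytes());
        }

        // The remainder is always shorter than 8 bytes, so one more block
        // is enough instead of one generator call per remaining byte.
        let rem = chunks.into_remainder();
        if !rem.is_empty() {
            let block = self.next_u64().to_le_bytes();
            let len = rem.len();
            rem.copy_from_slice(&block[..len]);
        }
    }
}
```

With this shape, filling a 19-byte buffer costs three generator steps (two full chunks plus one block for the 3-byte tail), where a per-byte remainder loop would have needed five.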

I then benchmarked the resulting code on my AMD Ryzen 5 Pro 2500U (4C/8T, 2.0 GHz base, 3.6 GHz max) with 16GB RAM.

Before:

running 2 tests
test fill             ... bench:          68 ns/iter (+/- 1)
test fill_naive       ... bench:         471 ns/iter (+/- 9)

After:

running 2 tests
test fill             ... bench:          64 ns/iter (+/- 1)
test fill_naive       ... bench:         485 ns/iter (+/- 15)

I was getting 68-70ns before, and 64-65ns afterwards. Out of curiosity, I then added the #[inline] annotation and tested it, and got even better performance as a result:

After, with #[inline]:

running 2 tests
test fill             ... bench:          56 ns/iter (+/- 1)
test fill_naive       ... bench:         472 ns/iter (+/- 14)

So it might be advantageous to include it. The end result with all changes here is that fill has gone from 68-70ns to 64-65ns, and then to 56ns with #[inline].

taiki-e

Thanks!

Bluefinger added 2 commits December 23, 2022 10:25

Optimize .fill() method throughput

e67b0e2

Add inline annotation back

5a89b18

taiki-e approved these changes Feb 1, 2023


taiki-e merged commit 08f7d3c into smol-rs:master Feb 1, 2023

notgull mentioned this pull request Feb 12, 2023

v1.9.0 #48

Merged
