nano-optimization for memchr::repeat_byte #50398
Conversation
r? @KodrAus (rust_highfive has picked a reviewer for you, use r? to override)
cc @Manishearth who wrote the original code
cc @BurntSushi, who actually wrote this code (I just copied it), and @jimblandy, whose C code this was originally adapted from
cc @bluss, who is the one who actually wrote the original fast memchr fallback implementation. :-) This does look good to me though! Cute trick.
tldr nobody wrote it and it just appeared |
The improvement here (if any) depends heavily on the relative instruction latency between bitwise ops and multiply. Most modern architectures have a fast multiply (throughput of 1+ instructions per cycle), but its latency is greater (3+ cycles). By contrast, bitwise instructions usually have a latency of 1 cycle (and a throughput of 2 to 4 instructions per cycle), so as long as there are no more than 3 bitwise instructions in the critical path, the bitwise code will be almost universally faster (there seem to be 4 for 32-bit targets). The multiply might be considerably worse on 32-bit architectures which do not have a multiply instruction at all, but I verified that we (or LLVM) do not support any 32+-bit targets which lack a native multiply instruction. I observed some backends, such as MIPS and Lanai, translate the multiply back into the original sequence of bitwise ops, presumably because it is more efficient to do so there. The x86 backend does the same in some cases as well, but, notably, ARM does not. With this in mind, it seems pretty safe to @bors r+
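For reference, here is a rough sketch of the two `repeat_byte` variants being compared. This is illustrative only: the function names are mine, the 64-bit shift chain is shown, and the exact code in `core::slice::memchr` may differ in detail.

```rust
// Old fallback: build the repeated-byte word with a chain of shifts and ORs.
// Each step depends on the previous one, so the ops cannot overlap in the pipeline.
#[cfg(target_pointer_width = "64")]
fn repeat_byte_shifts(b: u8) -> usize {
    let mut rep = b as usize;
    rep |= rep << 8;  // bytes 0..=1 now hold b
    rep |= rep << 16; // bytes 0..=3
    rep |= rep << 32; // bytes 0..=7
    rep
}

// New version: a single multiplication by 0x0101...01 broadcasts the byte
// into every byte of the word.
fn repeat_byte_mul(b: u8) -> usize {
    (b as usize) * (usize::MAX / 255)
}

fn main() {
    // Sanity check: both produce the same broadcast word on 64-bit targets.
    #[cfg(target_pointer_width = "64")]
    assert_eq!(repeat_byte_shifts(0x2a), repeat_byte_mul(0x2a));
}
```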
📌 Commit 1cefb5c has been approved by |
Ah, the pain of spotting a typo, but not wanting to confuse bors by editing the comment.
☀️ Test successful - status-appveyor, status-travis
This replaces the multiple shifts & bitwise ORs with a single multiplication.
In my benchmarks this performs equally well or better, especially on 64-bit systems (it shaves a stable nanosecond on my Skylake). This may go against conventional wisdom, but the shifts and bitwise ORs cannot be pipelined because of hard data dependencies.
While it may or may not be worthwhile from an optimization standpoint, it also reduces code size, so there's basically no downside.
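To illustrate why the single multiplication broadcasts the byte (a small standalone example, not part of the PR's diff): `usize::MAX / 255` is the word with `0x01` in every byte, so multiplying it by `b` yields `b` repeated in every byte.

```rust
fn main() {
    let b: u8 = 0x61; // 'a'
    let broadcast = (b as usize) * (usize::MAX / 255);
    // On a 64-bit target, usize::MAX / 255 == 0x0101_0101_0101_0101,
    // so the product is 0x6161_6161_6161_6161: the byte repeated eight times.
    #[cfg(target_pointer_width = "64")]
    assert_eq!(broadcast, 0x6161_6161_6161_6161);
    println!("{:#x}", broadcast);
}
```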