Optimize initialization of arrays using repeat expressions #43488

Merged: 5 commits into rust-lang:master on Aug 6, 2017

Conversation

Florob
Contributor

@Florob Florob commented Jul 26, 2017

This PR was inspired by this thread on Reddit: https://www.reddit.com/r/rust/comments/6o8ok9/understanding_rust_performances_a_newbie_question/
It tries to bring array initialization in the same ballpark as Vec::from_elem() for unoptimized builds.
For optimized builds this should relieve LLVM of having to figure out the construct we generate is in fact a memset().

To that end this emits llvm.memset() when:

  • the array is of integer type and all elements are zero (Vec::from_elem() also explicitly optimizes for this case)
  • the array elements are byte sized

If the array is zero-sized, initialization is omitted entirely.
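
As a rough illustration of the cases listed above, the functions below show repeat expressions that would fall into each category. This is only a sketch: the function and type names are made up for this example, and the IR that is actually emitted depends on the compiler version and build settings.

```rust
/// A one-byte enum; `#[repr(u8)]` gives every value a single-byte representation.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
pub enum Flag {
    Off = 0,
    On = 1,
}

/// Byte-sized elements: a candidate for `llvm.memset()` with fill value 7.
pub fn byte_array() -> [u8; 4] {
    [7u8; 4]
}

/// Byte-sized enum elements: also a candidate, with a fill value known only at run time.
pub fn byte_enum_array(flag: Flag) -> [Flag; 4] {
    [flag; 4]
}

/// Integer elements wider than a byte, all zero: a candidate for a zeroing memset.
pub fn zeroed_integer_array() -> [u32; 4] {
    [0u32; 4]
}

/// Non-zero multi-byte integers: neither condition applies, so under this PR
/// the array would still be filled element by element.
pub fn nonzero_integer_array() -> [u32; 4] {
    [7u32; 4]
}

/// Zero-sized elements: there is nothing to initialize.
pub fn zero_sized_elem() -> [(); 4] {
    [(); 4]
}

/// Zero-length array: likewise nothing to initialize.
pub fn zero_length() -> [u32; 0] {
    [0u32; 0]
}
```

All six functions produce the same values regardless of how the compiler chooses to initialize the array; only the generated code differs.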

Florob added 2 commits July 26, 2017 16:23
This is mainly for readability of the generated LLVM IR and subsequently
assembly. There is a slight positive performance impact, likely due to
I-cache effects.
This elides initialization for zero-sized arrays:
* for zero-sized elements we previously emitted an empty loop
* for arrays with a length of zero we previously emitted a loop with zero
  iterations

This emits llvm.memset() instead of a loop over each element when:
* all elements are zero integers
* elements are byte sized

@rust-highfive
Contributor

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton alexcrichton added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 27, 2017
@carols10cents
Member

Friendly ping @arielb1! I think you were on vacation, but I think you're back now? Checking on IRC too.

@arielb1
Contributor

arielb1 commented Aug 1, 2017

I'm back. Wait, I thought this was WIP.

@Florob
Contributor Author

Florob commented Aug 1, 2017

@arielb1 I'm not sure how to take that comment. What made you think this was WIP?
It bootstraps, passes the test suite, and adds a new passing codegen test.
It might still be completely bogus, because I've never really worked on rustc, but I guess that is for you to determine.

@arielb1
Contributor

arielb1 commented Aug 1, 2017

I just had it mixed with another PR. Am reviewing your PR now.

@arielb1
Contributor

arielb1 commented Aug 1, 2017

It would be nice if we had MIRI to deal with more complicated cases like None::<SomethingBig>, but I see no problem with this PR.

@bors r+

@bors
Collaborator

bors commented Aug 1, 2017

📌 Commit ac43d58 has been approved by arielb1

@arielb1
Contributor

arielb1 commented Aug 1, 2017

Actually, I think the second case could be implemented in a nicer way

@bors r-

// Use llvm.memset.p0i8.* to initialize byte arrays
let elem_layout = bcx.ccx.layout_of(tr_elem.ty).layout;
match *elem_layout {
    Layout::Scalar { value: Primitive::Int(layout::I8), .. } |
Contributor

This "knows" that all scalars are immediates of LLVM type i8. I'm not sure that is always true, and it might break in the future. Can you move this to the previous if with a check that val_ty(v) == Type::i8(ccx)?

Contributor Author

Do you have similar concerns for the CEnum case?
Also I'd appreciate thoughts on how to avoid duplicating the call_memset() setup code in the process.

Contributor

@arielb1 arielb1 Aug 1, 2017

It's better to merge them and just check the LLVM type. Also use from_immediate to catch booleans.

@arielb1
Contributor

arielb1 commented Aug 1, 2017

r+ with the if-cases merged. Nice optimization - we could make it more general with MIRI, but that's somewhat in the future.

@Florob
Contributor Author

Florob commented Aug 1, 2017

Updated to check the LLVM type. I somehow hadn't realized both cases would be i8, though it is obvious in retrospect.
r? @arielb1
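
For readers unfamiliar with the code, the merged check can be pictured with a toy model like the one below. Everything in it (`IrTy`, `Init`, `choose_init`, and this simplified `from_immediate`) is a hypothetical stand-in written for this illustration, not rustc's actual types or functions. It only assumes that boolean immediates (`i1`) are widened to `i8` before being stored, which is what makes them byte-sized.

```rust
/// Hypothetical stand-in for the LLVM type of the repeated element value.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum IrTy {
    /// An integer type of the given bit width, e.g. `Int(1)` for `i1`.
    Int(u32),
    /// Anything else (aggregates, floats, pointers, ...).
    Other,
}

/// How the array gets initialized in this model.
#[derive(Debug, PartialEq)]
pub enum Init {
    /// Zero-sized array: emit nothing.
    Skip,
    /// Emit a single memset call.
    Memset,
    /// Fall back to a loop over each element.
    Loop,
}

/// Models widening a boolean immediate (`i1`) to its in-memory form (`i8`).
pub fn from_immediate(ty: IrTy) -> IrTy {
    if ty == IrTy::Int(1) {
        IrTy::Int(8)
    } else {
        ty
    }
}

/// Picks an initialization strategy from the array size, the element's
/// type, and whether the element is a compile-time constant zero.
pub fn choose_init(array_size_bytes: u64, elem_ty: IrTy, is_const_zero: bool) -> Init {
    if array_size_bytes == 0 {
        return Init::Skip;
    }
    let stored_ty = from_immediate(elem_ty);
    let zero_integer = is_const_zero && matches!(stored_ty, IrTy::Int(_));
    if zero_integer || stored_ty == IrTy::Int(8) {
        Init::Memset
    } else {
        Init::Loop
    }
}
```

In this model a single type comparison covers `u8`, byte-sized enums, and booleans alike, which is the point of checking the LLVM type rather than the layout.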

if common::val_ty(v) == Type::i8(bcx.ccx) {
    let align = align.unwrap_or_else(|| bcx.ccx.align_of(tr_elem.ty));
    let align = C_i32(bcx.ccx, align as i32);
    let fill = tr_elem.immediate();
Contributor

use v here rather than calling immediate again.

@arielb1
Contributor

arielb1 commented Aug 3, 2017

Sorry for being a lazy reviewer. r=me with that nit resolved.

@Florob
Contributor Author

Florob commented Aug 4, 2017

Nit fixed. r?

@arielb1
Contributor

arielb1 commented Aug 4, 2017

@bors r+

@bors
Collaborator

bors commented Aug 4, 2017

📌 Commit 6704450 has been approved by arielb1

@bors
Collaborator

bors commented Aug 4, 2017

⌛ Testing commit 6704450 with merge 4b300779da21658990584e0d64514a23f0e71d3a...

@bors
Collaborator

bors commented Aug 4, 2017

💔 Test failed - status-travis

@kennytm
Member

kennytm commented Aug 4, 2017

The new test case failed on dist-i586-gnu-i686-musl.

[00:55:24] failures:
[00:55:24] 
[00:55:24] ---- [codegen] codegen/slice-init.rs stdout ----
[00:55:24] 	
[00:55:24] error: verification with 'FileCheck' failed
[00:55:24] status: exit code: 1
[00:55:24] command: /checkout/obj/build/x86_64-unknown-linux-gnu/llvm/build/bin/FileCheck -input-file=/checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll /checkout/src/test/codegen/slice-init.rs
[00:55:24] stdout:
[00:55:24] ------------------------------------------
[00:55:24] 
[00:55:24] ------------------------------------------
[00:55:24] stderr:
[00:55:24] ------------------------------------------
[00:55:24] /checkout/src/test/codegen/slice-init.rs:36:12: error: expected string not found in input
[00:55:24]  // CHECK: call void @llvm.memset.p0i8.i{{[0-9]+}}(i8* {{.*}}, i8 7, i64 4
[00:55:24]            ^
[00:55:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll:81:24: note: scanning from here
[00:55:24] define void @byte_array() unnamed_addr #1 {
[00:55:24]                        ^
[00:55:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll:87:2: note: possible intended match here
[00:55:24]  call void @llvm.memset.p0i8.i32(i8* %1, i8 7, i32 4, i32 1, i1 false)
[00:55:24]  ^
[00:55:24] /checkout/src/test/codegen/slice-init.rs:52:12: error: expected string not found in input
[00:55:24]  // CHECK: call void @llvm.memset.p0i8.i{{[0-9]+}}(i8* {{.*}}, i8 {{.*}}, i64 4
[00:55:24]            ^
[00:55:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll:99:29: note: scanning from here
[00:55:24] define void @byte_enum_array() unnamed_addr #1 {
[00:55:24]                             ^
[00:55:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll:109:2: note: possible intended match here
[00:55:24]  call void @llvm.memset.p0i8.i32(i8* %2, i8 %1, i32 4, i32 1, i1 false)
[00:55:24]  ^
[00:55:24] /checkout/src/test/codegen/slice-init.rs:61:12: error: expected string not found in input
[00:55:24]  // CHECK: call void @llvm.memset.p0i8.i{{[0-9]+}}(i8* {{.*}}, i8 0, i64 16
[00:55:24]            ^
[00:55:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll:122:34: note: scanning from here
[00:55:24] define void @zeroed_integer_array() unnamed_addr #1 {
[00:55:24]                                  ^
[00:55:24] /checkout/obj/build/x86_64-unknown-linux-gnu/test/codegen/slice-init.ll:129:2: note: possible intended match here
[00:55:24]  call void @llvm.memset.p0i8.i32(i8* %2, i8 0, i32 16, i32 4, i1 false)
[00:55:24]  ^
[00:55:24] 
[00:55:24] ------------------------------------------
[00:55:24] 
[00:55:24] thread '[codegen] codegen/slice-init.rs' panicked at 'explicit panic', /checkout/src/tools/compiletest/src/runtest.rs:2499:8
[00:55:24] note: Run with `RUST_BACKTRACE=1` for a backtrace.
[00:55:24] 
[00:55:24] 
[00:55:24] failures:
[00:55:24]     [codegen] codegen/slice-init.rs
[00:55:24] 
[00:55:24] test result: FAILED. 43 passed; 1 failed; 3 ignored; 0 measured; 0 filtered out

@Florob
Contributor Author

Florob commented Aug 4, 2017

The type of the len argument to llvm.memset.*() varies between architectures. The test case is now generic over this. Sorry for not paying close enough attention to this.
r? @arielb1
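
One way to make such a check independent of the target's pointer width is to capture the integer width from the intrinsic's name and reuse it for the length operand. The snippet below is a sketch of that idea using FileCheck's `[[NAME:regex]]` variable syntax. It is not necessarily the exact pattern used in the final test, and the function is simplified for illustration.

```rust
// Sketch of a codegen-test style function. In a real codegen test the
// `CHECK` comment is a FileCheck directive matched against the emitted
// LLVM IR; compiled on its own, it is just a comment.
pub fn byte_array() -> [u8; 4] {
    // `[[WIDTH:[0-9]+]]` captures the width from the intrinsic's name
    // (32 or 64 in the log above), and `i[[WIDTH]]` requires the length
    // operand to use that same width, so `i32 4` and `i64 4` both match.
    // CHECK: call void @llvm.memset.p0i8.i[[WIDTH:[0-9]+]](i8* {{.*}}, i8 7, i[[WIDTH]] 4
    [7u8; 4]
}
```

Compared with hard-coding `i64 4`, this keeps the check strict about the fill value and length while letting the integer type follow the target.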

@arielb1
Contributor

arielb1 commented Aug 4, 2017

@bors r+

@bors
Collaborator

bors commented Aug 4, 2017

📌 Commit 3aa3a5c has been approved by arielb1

@bors
Collaborator

bors commented Aug 5, 2017

⌛ Testing commit 3aa3a5cf983cf5dfcff23b1485279afd977c8562 with merge fd92186a766ca69fa102f921b1715b8e60b25ef7...

@bors
Collaborator

bors commented Aug 5, 2017

💔 Test failed - status-travis

@Florob
Contributor Author

Florob commented Aug 5, 2017

So it turns out I clearly can't be trusted if I haven't slept properly. Apparently while double checking if I got all instances of llvm.memset(), I still missed one occurrence and completely messed up the commit message m(. Should be okay now. Definitely on x86_64, can't easily test for x86_32.
r? @arielb1 (and maybe pretend I'm an idiot while you're reviewing)

@arielb1
Contributor

arielb1 commented Aug 6, 2017

@bors r+

@bors
Collaborator

bors commented Aug 6, 2017

📌 Commit 11d6312 has been approved by arielb1

@arielb1
Contributor

arielb1 commented Aug 6, 2017

That happens to everyone. That's why we have bors.

@bors
Collaborator

bors commented Aug 6, 2017

⌛ Testing commit 11d6312 with merge a9c24fd...

bors added a commit that referenced this pull request Aug 6, 2017
Optimize initialization of arrays using repeat expressions

This PR was inspired by [this thread](https://www.reddit.com/r/rust/comments/6o8ok9/understanding_rust_performances_a_newbie_question/) on Reddit.
It tries to bring array initialization in the same ballpark as `Vec::from_elem()` for unoptimized builds.
For optimized builds this should relieve LLVM of having to figure out the construct we generate is in fact a `memset()`.

To that end this emits `llvm.memset()` when:
* the array is of integer type and all elements are zero (`Vec::from_elem()` also explicitly optimizes for this case)
* the array elements are byte sized

If the array is zero-sized, initialization is omitted entirely.

@bors
Collaborator

bors commented Aug 6, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: arielb1
Pushing a9c24fd to master...

@bors bors merged commit 11d6312 into rust-lang:master Aug 6, 2017
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 8, 2025
Use llvm.memset.p0i8.* to initialize all same-bytes arrays

This doesn't affect tests; LLVM seems smart enough for it, but then I wonder why we have the zero case at all (it was introduced in rust-lang#43488; maybe LLVM wasn't smart enough then). So let's run perf to see if there's any build-time effect, and if not, I'll remove the zero special case and run perf again.
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 11, 2025
Use llvm.memset.p0i8.* to initialize all same-bytes arrays

Similar to rust-lang#43488

Debug builds can now handle `0x0101_u16` and other multi-byte scalars whose bytes are all the same (instead of special-casing just `0`).
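
To make "all the same bytes" concrete, here is a small sketch of such a check on a value's in-memory bytes. The helper name `repeated_byte` is hypothetical, and this is not the compiler's implementation.

```rust
/// Returns the repeated byte if every byte in `bytes` is identical,
/// or `None` if the bytes differ (or the slice is empty).
pub fn repeated_byte(bytes: &[u8]) -> Option<u8> {
    let (&first, rest) = bytes.split_first()?;
    if rest.iter().all(|&b| b == first) {
        Some(first)
    } else {
        None
    }
}

/// Convenience wrapper for `u16` values such as `0x0101`.
pub fn repeated_byte_u16(value: u16) -> Option<u8> {
    repeated_byte(&value.to_ne_bytes())
}
```

A value that passes this check can be written with a byte-wise fill, since repeating that one byte reproduces the whole scalar; `0` is simply the special case where the byte is zero.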