-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Optimize initialization of arrays using repeat expressions #43488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is mainly for readability of the generated LLVM IR and subsequently assembly. There is a slight positive performance impact, likely due to I-cache effects.
This elides initialization for zero-sized arrays: * for zero-sized elements we previously emitted an empty loop * for arrays with a length of zero we previously emitted a loop with zero iterations This emits llvm.memset() instead of a loop over each element when: * all elements are zero integers * elements are byte sized
r? @arielb1 (rust_highfive has picked a reviewer for you, use r? to override) |
friendly ping @arielb1! i think you were on vacation but i think you're back now? checking on IRC too |
I'm back. Wait I thought this was WIP |
@arielb1 I'm not sure how to take that comment. What made you think this was WIP? |
I just had it mixed with another PR. Am reviewing your PR now. |
It would be nice if we had MIRI to deal with more complicated cases like @bors r+ |
📌 Commit ac43d58 has been approved by |
Actually, I think the second case could be implemented in a nicer way @bors r- |
src/librustc_trans/mir/rvalue.rs
Outdated
// Use llvm.memset.p0i8.* to initialize byte arrays | ||
let elem_layout = bcx.ccx.layout_of(tr_elem.ty).layout; | ||
match *elem_layout { | ||
Layout::Scalar { value: Primitive::Int(layout::I8), .. } | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "knows" that all scalars are immediates of LLVM type i8. I'm not sure that is always true, and it might break in the future. Can you move this to the previous if with a check that val_ty(v) == Type::i8(ccx)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have similar concerns for the CEnum
case?
Also I'd appreciate thoughts on how to avoid duplicating the call_memset()
setup code in the process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to merge them and just check the LLVM type. Also use from_immediate
to catch booleans too.
r+ with the if-cases merged. Nice optimization - we could make it more general with MIRI, but that's somewhat in the future. |
Updated to check the LLVM type. I somehow hadn't realized both cases would be |
src/librustc_trans/mir/rvalue.rs
Outdated
if common::val_ty(v) == Type::i8(bcx.ccx) { | ||
let align = align.unwrap_or_else(|| bcx.ccx.align_of(tr_elem.ty)); | ||
let align = C_i32(bcx.ccx, align as i32); | ||
let fill = tr_elem.immediate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use v
here rather than calling immediate
again.
Sorry for being a lazy reviewer. r=me with that nit resolved. |
Nit fixed. r? |
@bors r+ |
📌 Commit 6704450 has been approved by |
⌛ Testing commit 6704450 with merge 4b300779da21658990584e0d64514a23f0e71d3a... |
💔 Test failed - status-travis |
The new test case failed on
|
The type of the |
@bors r+ |
📌 Commit 3aa3a5c has been approved by |
⌛ Testing commit 3aa3a5cf983cf5dfcff23b1485279afd977c8562 with merge fd92186a766ca69fa102f921b1715b8e60b25ef7... |
💔 Test failed - status-travis |
…on the exact intrinsic used
So it turns out I clearly can't be trusted if I haven't slept properly. Apparently while double checking if I got all instances of |
@bors r+ |
📌 Commit 11d6312 has been approved by |
That happens to everyone. That's why we have bors. |
Optimize initialization of arrays using repeat expressions This PR was inspired by [this thread](https://www.reddit.com/r/rust/comments/6o8ok9/understanding_rust_performances_a_newbie_question/) on Reddit. It tries to bring array initialization in the same ballpark as `Vec::from_elem()` for unoptimized builds. For optimized builds this should relieve LLVM of having to figure out the construct we generate is in fact a `memset()`. To that end this emits `llvm.memset()` when: * the array is of integer type and all elements are zero (`Vec::from_elem()` also explicitly optimizes for this case) * the array elements are byte sized If the array is zero-sized initialization is omitted entirely.
☀️ Test successful - status-appveyor, status-travis |
Use llvm.memset.p0i8.* to initialize all same-bytes arrays It doesn't affect tests, LLVM seems smart enough for it, but then I wonder why we have the zero case at all (it was introduced in rust-lang#43488, maybe LLVM wasn't smart enough then). So let's run perf to see if there's any build time effect, and if no, I'll remove the zero special case and also run perf.
Use llvm.memset.p0i8.* to initialize all same-bytes arrays Similar to rust-lang#43488 debug builds can now handle `0x0101_u16` and other multi-byte scalars that have all the same bytes (instead of special casing just `0`)
This PR was inspired by this thread on Reddit.
It tries to bring array initialization in the same ballpark as
Vec::from_elem()
for unoptimized builds.For optimized builds this should relieve LLVM of having to figure out the construct we generate is in fact a
memset()
.To that end this emits
llvm.memset()
when:Vec::from_elem()
also explicitly optimizes for this case)If the array is zero-sized initialization is omitted entirely.