Commit 70c95b2

Auto merge of rust-lang#130679 - saethlin:inline-usually, r=<try>
Add inline(usually)

I'm looking into what kind of things could recover the perf improvement detected in rust-lang#121417 (comment). I think it's worth spending quite a bit of effort to figure out how to capture a 45% incr-patched improvement.

As far as I can tell, the root cause of the problem is that we have taken very deliberate steps in the compiler to ensure that `#[inline(always)]` causes inlining where possible, even when all optimizations are disabled. Some of the reasons that was done are now outdated or were misguided. I think the only remaining use case is where the inlined body, even without optimizations, is cheaper to codegen or call; for example, SIMD intrinsics may require a lot of code to put their arguments on the stack, which is slow to compile and run.

I'm quite sure that the majority of users applied this attribute believing it does not cause inlining in unoptimized builds, or didn't appreciate the build time regressions that implies and would prefer it didn't if they knew. (If that's you, put a heart on this or say something elsewhere; don't reply on this PR.)

I am going to _try_ to use the existing benchmark suite to evaluate a number of different approaches and take notes here, and hopefully I can collect enough data to shape any conversation about what we can do to help users.

The core of this PR is `InlineAttr::Usually` (name doesn't matter), which ensures that the function is inlined when optimizations are enabled (usual exceptions like recursion apply). I think most users believe this is what `#[inline(always)]` does.

* rust-lang#130685 (comment) replaced `#[inline(always)]` with `#[inline(usually)]` in the standard library, and did not recover the same 45% incr-patched improvement in regex. It's a tidy net positive though, and I suspect that perf improvement would normally be big enough to motivate merging a change. I think that means the standard library's use of `#[inline(always)]` is imposing marginal compile time overhead on the ecosystem, but the bigger opportunities are probably in third-party crates.
* rust-lang#130679 (comment) treats `#[inline(always)]` as `#[inline(usually)]` literally everywhere; this gets the desired incr-patched improvement but suffers quite a few check and doc regressions. I think that means that `alwaysinline` is more powerful than `function-inline-cost=0` in LLVM.
* rust-lang#130679 (comment) treats `#[inline(always)]` as `#[inline(usually)]` when `-Copt-level=0`, which looks basically the same as rust-lang#121417 (comment) (omit `alwaysinline` when doing `-Copt-level=0` codegen).
* rust-lang#130679 (comment) replaces `alwaysinline` with a very negative inline cost, and it still has check and doc regressions. More investigation is required on what the different inlining decision is.
* rust-lang#130679 (comment) is a likely explanation of this, with some interesting implications: adding `inline(always)` to a function that was going to be inlined anyway can change optimizations (usually it seems to improve things?).
* rust-lang#130679 (comment) makes `#[inline(usually)]` also defy instantiation mode selection and always be LocalCopy the way `#[inline(always)]` does, but still has regressions in stm32f4. I think that proves that `alwaysinline` can actually improve debug build times.
* rust-lang#130679 (comment) infers `alwaysinline` for extremely trivial functions, but still has regressions for stm32f4. But of course it does: I left `inline(always)` treated as `inline(usually)`, which slows down the compiler 🤦 That perf run was inconclusive.
* rust-lang#130679 (comment) doesn't have any stm32f4 regressions 🥳 I think this means that there is some threshold where `alwaysinline` produces faster debug builds.

So still two questions:

1. Why does `alwaysinline` sometimes make debug builds faster?
2. Is there any obvious threshold at which adding `alwaysinline` causes more work for debug builds?
2 parents 2da3cb9 + 8ca3275 commit 70c95b2
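
For readers less familiar with the attributes the commit message contrasts, here is a minimal sketch of the three inline levels that exist in Rust today. The function names (`low_byte`, `checksum`, `report`) are hypothetical and only illustrate where each attribute might be applied. `#[inline(usually)]` exists only on this experimental branch, so it appears in a comment rather than in code.

```rust
/// A trivial accessor: the kind of function where forcing inlining even in
/// unoptimized builds is plausibly cheaper than emitting a call.
#[inline(always)]
pub fn low_byte(x: u32) -> u8 {
    (x & 0xff) as u8
}

/// A plain hint: the compiler may inline this, including across crates,
/// but is free to decline.
///
/// On the experimental branch in this commit, a function like this could be
/// marked `#[inline(usually)]` to request inlining only when optimizations
/// are enabled.
#[inline]
pub fn checksum(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Cold reporting path: ask the compiler not to inline it.
#[inline(never)]
pub fn report(sum: u32) -> String {
    format!("checksum = {sum:#010x}")
}
```

The attributes do not change what these functions compute, only how the compiler treats their call sites.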

7 files changed, +52 -5 lines changed

compiler/rustc_attr/src/builtin.rs

+1
@@ -46,6 +46,7 @@ pub enum InlineAttr {
     Hint,
     Always,
     Never,
+    Usually,
 }

 #[derive(Clone, Encodable, Decodable, Debug, PartialEq, Eq, HashStable_Generic)]

compiler/rustc_codegen_gcc/src/attributes.rs

+1
@@ -30,6 +30,7 @@ fn inline_attr<'gcc, 'tcx>(
                 None
             }
         }
+        InlineAttr::Usually => Some(FnAttribute::Inline),
         InlineAttr::None => None,
     }
 }

compiler/rustc_codegen_llvm/src/attributes.rs

+45-3
@@ -31,14 +31,24 @@ pub(crate) fn apply_to_callsite(callsite: &Value, idx: AttributePlace, attrs: &[

 /// Get LLVM attribute for the provided inline heuristic.
 #[inline]
-fn inline_attr<'ll>(cx: &CodegenCx<'ll, '_>, inline: InlineAttr) -> Option<&'ll Attribute> {
+fn inline_attr<'ll>(
+    cx: &CodegenCx<'ll, '_>,
+    inline: InlineAttr,
+    should_always_inline: bool,
+) -> Option<&'ll Attribute> {
     if !cx.tcx.sess.opts.unstable_opts.inline_llvm {
         // disable LLVM inlining
         return Some(AttributeKind::NoInline.create_attr(cx.llcx));
     }
     match inline {
         InlineAttr::Hint => Some(AttributeKind::InlineHint.create_attr(cx.llcx)),
-        InlineAttr::Always => Some(AttributeKind::AlwaysInline.create_attr(cx.llcx)),
+        InlineAttr::Always => {
+            if should_always_inline {
+                Some(AttributeKind::AlwaysInline.create_attr(cx.llcx))
+            } else {
+                Some(llvm::CreateAttrStringValue(cx.llcx, "function-inline-cost", "0"))
+            }
+        }
         InlineAttr::Never => {
             if cx.sess().target.arch != "amdgpu" {
                 Some(AttributeKind::NoInline.create_attr(cx.llcx))
@@ -47,6 +57,9 @@ fn inline_attr<'ll>(cx: &CodegenCx<'ll, '_>, inline: InlineAttr) -> Option<&'ll
             }
         }
         InlineAttr::None => None,
+        InlineAttr::Usually => {
+            Some(llvm::CreateAttrStringValue(cx.llcx, "function-inline-cost", "0"))
+        }
     }
 }

@@ -324,6 +337,27 @@ fn create_alloc_family_attr(llcx: &llvm::Context) -> &llvm::Attribute {
     llvm::CreateAttrStringValue(llcx, "alloc-family", "__rust_alloc")
 }

+fn very_small_body(body: &rustc_middle::mir::Body<'_>) -> bool {
+    use rustc_middle::mir::*;
+    match body.basic_blocks.len() {
+        0 => return true,
+        1 => {}
+        2.. => return false,
+    }
+    let block = &body.basic_blocks[START_BLOCK];
+    match block.statements.len() {
+        0 => {
+            matches!(block.terminator().kind, TerminatorKind::Return)
+        }
+        1 => {
+            let statement = &block.statements[0];
+            matches!(statement.kind, StatementKind::Assign(_))
+                && matches!(block.terminator().kind, TerminatorKind::Return)
+        }
+        2.. => return false,
+    }
+}
+
 /// Helper for `FnAbi::apply_attrs_llfn`:
 /// Composite function which sets LLVM attributes for function depending on its AST (`#[attribute]`)
 /// attributes.
@@ -353,7 +387,15 @@ pub(crate) fn llfn_attrs_from_instance<'ll, 'tcx>(
     } else {
         codegen_fn_attrs.inline
     };
-    to_add.extend(inline_attr(cx, inline));
+
+    let very_small_body = if cx.tcx.is_mir_available(instance.def_id()) {
+        let body = cx.tcx.instance_mir(instance.def);
+        very_small_body(body)
+    } else {
+        false
+    };
+    let should_always_inline = very_small_body || cx.tcx.sess.opts.optimize != OptLevel::No;
+    to_add.extend(inline_attr(cx, inline, should_always_inline));

     // The `uwtable` attribute according to LLVM is:
     //

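The LLVM-side change above reduces to one decision: `#[inline(always)]` keeps `alwaysinline` only when optimizations are on or the body is trivially small, and otherwise falls back to the same zero inline cost that `#[inline(usually)]` always gets. The following is a minimal, self-contained sketch of that decision, not the compiler's code. `LlvmInline`, `Block`, `Statement`, `Terminator`, and `compute_should_always_inline` are hypothetical stand-ins for the rustc and LLVM types. The `inline_llvm` unstable-option check and the `amdgpu` special case are left out.

```rust
/// Simplified stand-in for rustc's `InlineAttr`, including the new variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InlineAttr { None, Hint, Always, Never, Usually }

/// Hypothetical stand-in for the LLVM attribute attached to the function.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LlvmInline {
    InlineHint,
    AlwaysInline,
    NoInline,
    /// Models the string attribute "function-inline-cost"="0".
    ZeroInlineCost,
}

/// Toy MIR: just enough structure to express the size heuristic.
pub enum Statement { Assign, Other }
pub enum Terminator { Return, Other }
pub struct Block { pub statements: Vec<Statement>, pub terminator: Terminator }

/// Mirrors the heuristic in the diff: no blocks at all, or a single block
/// that returns after at most one assignment.
pub fn very_small_body(blocks: &[Block]) -> bool {
    match blocks {
        [] => true,
        [block] => {
            let returns = matches!(block.terminator, Terminator::Return);
            match block.statements.as_slice() {
                [] => returns,
                [only] => matches!(only, Statement::Assign) && returns,
                _ => false,
            }
        }
        _ => false,
    }
}

/// `alwaysinline` is kept for tiny bodies or whenever optimizations are on.
pub fn compute_should_always_inline(very_small: bool, optimizing: bool) -> bool {
    very_small || optimizing
}

/// Maps the source-level attribute to the modeled LLVM attribute.
pub fn inline_attr(inline: InlineAttr, should_always_inline: bool) -> Option<LlvmInline> {
    match inline {
        InlineAttr::Hint => Some(LlvmInline::InlineHint),
        InlineAttr::Always if should_always_inline => Some(LlvmInline::AlwaysInline),
        // `Always` without the flag degrades to what `Usually` always gets.
        InlineAttr::Always | InlineAttr::Usually => Some(LlvmInline::ZeroInlineCost),
        InlineAttr::Never => Some(LlvmInline::NoInline),
        InlineAttr::None => None,
    }
}
```

In this model, a one-assignment getter marked `#[inline(always)]` still gets `AlwaysInline` in a debug build, while a larger function with the same attribute gets only `ZeroInlineCost`.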
compiler/rustc_codegen_ssa/src/codegen_attrs.rs

+2
@@ -528,6 +528,8 @@ fn codegen_fn_attrs(tcx: TyCtxt<'_>, did: LocalDefId) -> CodegenFnAttrs {
                     InlineAttr::Always
                 } else if list_contains_name(items, sym::never) {
                     InlineAttr::Never
+                } else if list_contains_name(items, sym::usually) {
+                    InlineAttr::Usually
                 } else {
                     struct_span_code_err!(tcx.dcx(), items[0].span(), E0535, "invalid argument")
                         .with_help("valid inline arguments are `always` and `never`")

compiler/rustc_mir_transform/src/cross_crate_inline.rs

+1-1
@@ -46,7 +46,7 @@ fn cross_crate_inlinable(tcx: TyCtxt<'_>, def_id: LocalDefId) -> bool {
     // #[inline(never)] to force code generation.
     match codegen_fn_attrs.inline {
         InlineAttr::Never => return false,
-        InlineAttr::Hint | InlineAttr::Always => return true,
+        InlineAttr::Hint | InlineAttr::Always | InlineAttr::Usually => return true,
         _ => {}
     }

compiler/rustc_mir_transform/src/inline.rs

+1-1
@@ -106,7 +106,7 @@ fn inline<'tcx>(tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) -> bool {
         changed: false,
         caller_is_inline_forwarder: matches!(
             codegen_fn_attrs.inline,
-            InlineAttr::Hint | InlineAttr::Always
+            InlineAttr::Hint | InlineAttr::Always | InlineAttr::Usually
         ) && body_is_forwarder(body),
     };
     let blocks = START_BLOCK..body.basic_blocks.next_index();

compiler/rustc_span/src/symbol.rs

+1
@@ -2102,6 +2102,7 @@ symbols! {
         usize_legacy_fn_max_value,
         usize_legacy_fn_min_value,
         usize_legacy_mod,
+        usually,
         va_arg,
         va_copy,
         va_end,
