Skip to content

Don't alloca just to look at a discriminant #138391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 14, 2025

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Mar 12, 2025

Today we're making LLVM do a bunch of extra work when you match on trivial stuff like Option<bool> or ControlFlow<u8>.

This PR changes that so that simple types like Option<u32> or Result<(), Box<Error>> can stay as OperandValue::ScalarPair and we can still read the discriminant from them, rather than needing to write them into memory to have a PlaceValue just to get the discriminant out.

Fixes #137503

Today we're making LLVM do a bunch of extra work for every enum you match on, even trivial stuff like `Option<bool>`.  Let's not.
@rustbot
Copy link
Collaborator

rustbot commented Mar 12, 2025

r? @lcnr

rustbot has assigned @lcnr.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 12, 2025
@rustbot
Copy link
Collaborator

rustbot commented Mar 12, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Mar 12, 2025
Comment on lines -33 to +34
// CHECK-LABEL: @four_valued
// CHECK-LABEL: i16 @four_valued(i16{{.+}}%x)
// CHECK-NEXT: {{^.*:$}}
// CHECK-NEXT: ret i16 %0
// CHECK-NEXT: ret i16 %x
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

annot: these used to have the parameter named %0 because it was written into an %x alloca, but now that that's no longer needed, we give the %x name to the parameter directly.

Comment on lines +452 to +461
let tag_op = match self.val {
OperandValue::ZeroSized => bug!(),
OperandValue::Immediate(_) | OperandValue::Pair(_, _) => {
self.extract_field(fx, bx, tag_field)
}
OperandValue::Ref(place) => {
let tag = place.with_type(self.layout).project_field(bx, tag_field);
bx.load_operand(tag)
}
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codegen_get_discr method is moved nearly unchanged from PlaceRef, just now we have the extra case to extract_field if it's immediate(s). If it's a Ref, though, we do the same project_field as before.

@scottmcm
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 12, 2025
Don't `alloca` just to look at a discriminant

Today we're making LLVM do a bunch of extra work when you match on trivial stuff like `Option<bool>` or `ControlFlow<u8>`.

This PR changes that so that simple types like `Option<u32>` or `Result<(), Box<Error>>` can stay as `OperandValue::ScalarPair` and we can still read the discriminant from them, rather than needing to write them into memory to have a `PlaceValue` just to get the discriminant out.

Fixes rust-lang#137503
@bors
Copy link
Collaborator

bors commented Mar 12, 2025

⌛ Trying commit 143f393 with merge 3cda1f4...

@lcnr
Copy link
Contributor

lcnr commented Mar 12, 2025

r? codegen

@rustbot rustbot assigned workingjubilee and unassigned lcnr Mar 12, 2025
@bors
Copy link
Collaborator

bors commented Mar 12, 2025

☀️ Try build successful - checks-actions
Build commit: 3cda1f4 (3cda1f4b257a0fdcc0b8279efce522cebd5263c5)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3cda1f4): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.8% [-0.8%, -0.8%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.8% [-0.8%, -0.8%] 1

Max RSS (memory usage)

Results (primary -2.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.2% [-2.2%, -2.2%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -2.2% [-2.2%, -2.2%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary -0.0%, secondary -0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 1
Improvements ✅
(primary)
-0.0% [-0.1%, -0.0%] 27
Improvements ✅
(secondary)
-0.0% [-0.1%, -0.0%] 10
All ❌✅ (primary) -0.0% [-0.1%, -0.0%] 27

Bootstrap: 779.599s -> 779.273s (-0.04%)
Artifact size: 365.29 MiB -> 365.24 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2025
Comment on lines -328 to -334
sym::discriminant_value => {
if ret_ty.is_integral() {
args[0].deref(bx.cx()).codegen_get_discr(bx, ret_ty)
} else {
span_bug!(span, "Invalid discriminant type for `{:?}`", arg_tys[0])
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annot: this is dead code, since LowerIntrinsics turns all the calls to this into MIR.

(And by deleting it there was only the one caller of codegen_get_discr to update.)

@scottmcm
Copy link
Member Author

TBH I'd hoped for some icount improvements from doing less work, but oh well. I'll take no regressions and a debug-mode code size improvement, given how small of a code change it took.

@rustbot ready

return bx.cx().const_poison(cast_to);
}
let (tag_scalar, tag_encoding, tag_field) = match self.layout.variants {
Variants::Empty => unreachable!("we already handled uninhabited types"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can we just move the poison return here, instead of doing the if? Or is there a case where is_uninhabited is true, but .variants is not Empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering that myself, TBH, but wasn't sure so left it as it was.

Exploring a bit more now (and thinking of the whole "is (i32, !) a ZST?" issue), it looks like yes it's possible: https://rust.godbolt.org/z/sjq1hEs8r

pub enum Foo<T> {
    Hmm(i32, T)
}
pub type Bar = Foo<std::convert::Infallible>;

That Bar has

           uninhabited: true,
           variants: Single {
               index: 0,
           },

so it's uninhabited but still single-variant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// A variant is absent if it's uninhabited and only has ZST fields.
// Present uninhabited variants only require space for their fields,
// but *not* an encoding of the discriminant (e.g., a tag value).
// See issue #49298 for more details on the need to leave space
// for non-ZST uninhabited data (mostly partial initialization).

Comment on lines 438 to 443
if let Some(discr) = self.layout.ty.discriminant_for_variant(bx.tcx(), index) {
discr.val
} else {
assert_eq!(index, FIRST_VARIANT);
0
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: why is this behavior not embedded in discriminant_for_variant itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On an ADT it doesn't have the option return: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html#method.discriminant_for_variant

On Ty, though, it returns None for things that aren't enums nor Coroutines. From a quick scan there appear to be a bunch of places unwraping or ?ing it, presumably as a "look, this is a type where I don't actually want to look at the discriminant in the first place". (But mem::discriminant work on arbitrary stuff, even i32, so stable can get to checking a discriminant, especially in debug where the MIR is polymorphic.)

So it's not obvious to me that changing the Ty version to default to just giving a zero instead of a None would be a good thing.

(I changed it from what existed before on Place because it used to be using the index.as_u32() rather than zero, but I can't see any way that it would ever be possible to have a type with a non-zero VariantIdx reach this arm, so wanted to upgrade that to an ICE if somehow it ever did since that's probably a mistake -- after all, returning the VariantIdx as a discriminant is the wrong thing to do for enums too.)

@WaffleLapkin
Copy link
Member

r? WaffleLapkin
@bors r+

@bors
Copy link
Collaborator

bors commented Mar 13, 2025

📌 Commit 2b15dd1 has been approved by WaffleLapkin

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 13, 2025
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Mar 13, 2025
@bors
Copy link
Collaborator

bors commented Mar 14, 2025

⌛ Testing commit 2b15dd1 with merge addae07...

@bors
Copy link
Collaborator

bors commented Mar 14, 2025

☀️ Test successful - checks-actions
Approved by: WaffleLapkin
Pushing addae07 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 14, 2025
@bors bors merged commit addae07 into rust-lang:master Mar 14, 2025
7 checks passed
@rustbot rustbot added this to the 1.87.0 milestone Mar 14, 2025
Copy link

Post-merge analysis result

Test differences

No test diffs found

@scottmcm scottmcm deleted the SSA-discriminants branch March 14, 2025 05:32
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (addae07): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.7% [-0.7%, -0.7%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.7% [-0.7%, -0.7%] 1

Max RSS (memory usage)

Results (primary -2.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.6% [-2.6%, -2.6%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -2.6% [-2.6%, -2.6%] 1

Cycles

Results (secondary 4.6%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
4.6% [3.2%, 5.9%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Binary size

Results (primary -0.0%, secondary -0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 1
Improvements ✅
(primary)
-0.0% [-0.1%, -0.0%] 27
Improvements ✅
(secondary)
-0.0% [-0.1%, -0.0%] 11
All ❌✅ (primary) -0.0% [-0.1%, -0.0%] 27

Bootstrap: 774.314s -> 774.074s (-0.03%)
Artifact size: 365.05 MiB -> 364.97 MiB (-0.02%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stop forcing an alloc just because something looked at the discriminant
8 participants