-
-
Notifications
You must be signed in to change notification settings - Fork 14.9k
Large amount of generated code for match statements with large arrays #110870
Copy link
Copy link
Open
Labels
A-MIRArea: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.htmlArea: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.htmlA-arrayArea: `[T; N]`Area: `[T; N]`A-patternsRelating to patterns and pattern matchingRelating to patterns and pattern matchingC-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchCategory: An issue highlighting optimization opportunities or PRs implementing suchI-heavyIssue: Problems and improvements with respect to binary size of generated code.Issue: Problems and improvements with respect to binary size of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.
Metadata
Metadata
Assignees
Labels
A-MIRArea: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.htmlArea: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.htmlA-arrayArea: `[T; N]`Area: `[T; N]`A-patternsRelating to patterns and pattern matchingRelating to patterns and pattern matchingC-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchCategory: An issue highlighting optimization opportunities or PRs implementing suchI-heavyIssue: Problems and improvements with respect to binary size of generated code.Issue: Problems and improvements with respect to binary size of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.
Type
Fields
Give feedbackNo fields configured for issues without a type.
Consider the following code, which matches on a 64-element array:
When built with
rustc --crate-type=lib t.rs --emit=llvm-irthis emits 64 basic blocks, each doing a comparison. At-O, the optimizer is able to reduce it to 64 single-byte loads, which are OR'd together. Finally, at-C opt-level=3, the optimizer is able to put the whole thing back together into a singleicmp ne <64 x i8>.The large number of basic blocks are visible in the MIR, which leads me to believe that all the optimizations here are being accomplished by LLVM.
Generating this large number of basic blocks is inefficient, and puts a lot of unnecessary pressure on LLVM to optimize things. It would be much better if either an integer comparison, or even a call to
memcmp, was emitted, as it could simple be lowered as LLVM wished. This is how normal, non-match, comparisons with arrays of primitives are handled (see https://github.com/rust-lang/rust/blob/master/library/core/src/array/equality.rs#L146-L152 and https://github.com/rust-lang/rust/blob/master/compiler/rustc_codegen_llvm/src/intrinsic.rs#L298-L336).This whole thing is extracted and minimized from https://github.com/alex/rust-asn1/blob/main/src/object_identifier.rs#L5-L24, which is used in many match statements in https://github.com/pyca/cryptography/tree/main/src/rust