Simplify implementation of Rust intrinsics by using type parameters in the cache #142259

Open
wants to merge 1 commit into
base: master
from

Conversation

sayantn
Contributor

@sayantn sayantn commented Jun 9, 2025

The current implementation of intrinsics has a lot of duplication to handle the different overloads of overloaded LLVM intrinsics. This PR uses the **base name and the type parameters** in the cache instead of the full, overloaded name. The benefit is that `call_intrinsic` doesn't need to provide the full name, only the type parameters (which are usually more readily available). This uses `LLVMIntrinsicCopyOverloadedName2` to get the overloaded name from the base name and the type parameters, and only uses it to declare the function.

(originally was part of #140763, split off later)
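
For readers unfamiliar with overloaded names, here is a minimal sketch of the idea, not the code in this PR. The `Ty` enum, `suffix`, and `overloaded_name` are hypothetical stand-ins: the real implementation works with `&'ll Type` values and asks LLVM for the name through `LLVMIntrinsicCopyOverloadedName2` instead of building it by hand.

```rust
/// Toy stand-in for an LLVM type (assumption; rustc passes `&'ll Type`).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Ty {
    Int(u32),
    Float,
    Double,
    Vector(u32, Box<Ty>),
}

/// Hypothetical suffix rule imitating names such as `i32` or `v4f32`.
fn suffix(ty: &Ty) -> String {
    match ty {
        Ty::Int(bits) => format!("i{bits}"),
        Ty::Float => "f32".to_string(),
        Ty::Double => "f64".to_string(),
        Ty::Vector(len, elem) => format!("v{len}{}", suffix(elem)),
    }
}

/// Builds the full overloaded name from the base name and type parameters,
/// so callers only have to supply the two pieces they already have.
fn overloaded_name(base: &str, type_params: &[Ty]) -> String {
    let mut name = base.to_string();
    for ty in type_params {
        name.push('.');
        name.push_str(&suffix(ty));
    }
    name
}
```

With this shape, a caller passes `"llvm.ctpop"` plus the integer type and never has to spell out the full name itself.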

@rustbot label A-codegen A-LLVM
r? codegen

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 9, 2025
@rustbot
Collaborator

rustbot commented Jun 9, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

@rustbot rustbot added the A-codegen Area: Code generation label Jun 9, 2025
@workingjubilee
Member

This seems like it might be perf-sensitive.

@bors2 try @rust-timer queue

@rust-bors

rust-bors bot commented Jun 9, 2025

⌛ Trying commit ea453f7 with merge 57fad72

To cancel the try build, run the command @bors2 try cancel.

rust-bors bot added a commit that referenced this pull request Jun 9, 2025
Simplify implementation of Rust intrinsics by using type parameters in the cache

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 9, 2025
@rust-bors

rust-bors bot commented Jun 10, 2025

☀️ Try build successful (CI)
Build commit: 57fad72 (57fad72009e5865a872ba3d86eccf0f3a1917f99)

@rust-timer
Collaborator

Finished benchmarking commit (57fad72): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 1.7%, secondary 2.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.7% | [1.7%, 1.7%] | 1 |
| Regressions ❌ (secondary) | 2.9% | [2.3%, 3.5%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.7% | [1.7%, 1.7%] | 1 |

Cycles

Results (secondary -2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -2.3% | [-2.3%, -2.3%] | 1 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 754.509s -> 753.616s (-0.12%)
Artifact size: 372.30 MiB -> 372.29 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 10, 2025
@sayantn
Contributor Author

sayantn commented Jun 10, 2025

@workingjubilee seems like there is no perf impact! (I am surprised actually)

@workingjubilee
Member

neat, I will try to take a closer look later but this is a very good cleanup so I expect to be approving it later today.

@@ -861,372 +869,156 @@ impl<'ll> CodegenCx<'ll, '_> {
} else {
self.type_variadic_func(&[], ret)
};
let f = self.declare_cfn(name, llvm::UnnamedAddr::No, fn_ty);
self.intrinsics.borrow_mut().insert(name, (fn_ty, f));

Contributor

So here we are still creating the intrinsic function type based on the argument and return value types -- however, the signature is already uniquely determined by the base_name and type_params.

I think it would be a lot better to use LLVMGetIntrinsicDeclaration accepting the ID and type parameters. This means that we no longer need args and ret -- and crucially, this means we don't need to maintain the list of intrinsics inside declare_intrinsic anymore!
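
To illustrate the point that the signature follows from the base name and type parameters alone, here is a rough sketch. It is not LLVM's API: `Ty`, `FnTy`, and the three-entry `signature` table are hypothetical stand-ins for what LLVM derives internally from the intrinsic ID.

```rust
/// Toy stand-in for an LLVM type (assumption for illustration only).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Ty {
    Int(u32),
    Float,
    Struct(Vec<Ty>),
}

/// Toy function type: argument types plus a return type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct FnTy {
    args: Vec<Ty>,
    ret: Ty,
}

/// Hypothetical signature table. Given the base name and the type
/// parameters, the full signature is determined; the caller passes
/// neither `args` nor `ret`.
fn signature(base: &str, type_params: &[Ty]) -> Option<FnTy> {
    // Every intrinsic in this toy table is overloaded on exactly one type.
    let t = type_params.first()?.clone();
    match base {
        // T ctpop(T)
        "llvm.ctpop" => Some(FnTy { args: vec![t.clone()], ret: t }),
        // T fma(T, T, T)
        "llvm.fma" => Some(FnTy { args: vec![t.clone(); 3], ret: t }),
        // {T, i1} sadd.with.overflow(T, T)
        "llvm.sadd.with.overflow" => Some(FnTy {
            args: vec![t.clone(), t.clone()],
            ret: Ty::Struct(vec![t, Ty::Int(1)]),
        }),
        _ => None,
    }
}
```

The table above only exists to make the sketch self-contained; the suggestion in this thread is precisely that rustc would not need to maintain such a list, because LLVM already has this information.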

Contributor Author

I do know about Intrinsic::getDeclaration. The reason I didn't use this is performance. The problem with the C API function LLVMGetIntrinsicDeclaration is that it computes the FunctionType (because Intrinsic::getDeclaration does), and then throws it away (It calls getCallee on the FunctionCallee object returned by Intrinsic::getDeclaration). So in total, I have to compute the FunctionType twice, which is pretty expensive. So I just duplicated the implementation of getDeclaration, but with a known FunctionType.

The reason I didn't get rid of the list of intrinsics is also performance - Intrinsic::getType calls are expensive, and calling that every time we are comparing 2 integers seems like a major perf issue.

Member

I believe that nikic's proposed change can be implemented on top of this simplified file. I think we should merge this because it's already better, but I would indeed like to see an even simpler version and we can run perf on it to see if it affects anything.

Contributor Author

@sayantn sayantn Jun 11, 2025

So are you suggesting to completely remove the cache? Or should I keep a cache like FxHashMap<(String, SmallVec<[&'ll Type; 2]>), (&'ll Type, &'ll Value)> that will dynamically be filled whenever an intrinsic is called?
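
A dynamically filled cache of the kind described here could look roughly like the sketch below. It uses `std::collections::HashMap` and `Vec` in place of `FxHashMap` and `SmallVec` so that it compiles without external crates, and `Ty`, `Declared`, and `LazyIntrinsics` are hypothetical names, not types from rustc.

```rust
use std::collections::HashMap;

/// Toy stand-in for an LLVM type (assumption; the real key holds `&'ll Type`).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Ty {
    Int(u32),
    Float,
}

/// Stand-in for the cached `(&'ll Type, &'ll Value)` pair.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Declared {
    fn_ty: String,
    value: usize,
}

/// Cache keyed by (base name, type parameters), filled on first use.
#[derive(Default)]
struct LazyIntrinsics {
    cache: HashMap<(String, Vec<Ty>), Declared>,
    /// Counts how often the slow declaration path actually ran.
    declared: usize,
}

impl LazyIntrinsics {
    fn get(&mut self, base: &str, type_params: &[Ty]) -> Declared {
        let key = (base.to_string(), type_params.to_vec());
        if let Some(hit) = self.cache.get(&key) {
            return hit.clone();
        }
        // Slow path: in the real compiler this is where the function type
        // would be computed and the function declared in the module.
        let entry = Declared {
            fn_ty: format!("{base}{type_params:?}"),
            value: self.declared,
        };
        self.declared += 1;
        self.cache.insert(key, entry.clone());
        entry
    }
}
```

Whatever the slow path costs, it runs once per distinct (base name, type parameters) key; repeated calls with the same key only pay for a hash lookup.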

Member

@workingjubilee workingjubilee Jun 11, 2025

Another PR with this as a base, then we can have a nice chat about where to go.

Contributor

> I do know about Intrinsic::getDeclaration. The reason I didn't use this is performance. The problem with the C API function LLVMGetIntrinsicDeclaration is that it computes the FunctionType (because Intrinsic::getDeclaration does), and then throws it away (It calls getCallee on the FunctionCallee object returned by Intrinsic::getDeclaration). So in total, I have to compute the FunctionType twice, which is pretty expensive. So I just duplicated the implementation of getDeclaration, but with a known FunctionType.

Intrinsic::getDeclaration() does not discard the FunctionType. The FunctionType is part of the Function. You can fetch it using get_type_of_global().

> The reason I didn't get rid of the list of intrinsics is also performance - Intrinsic::getType calls are expensive, and calling that every time we are comparing 2 integers seems like a major perf issue.

I don't really follow how having a list of intrinsics makes things faster. Doesn't going through the list and matching all the names just add cost? Besides, you are caching the result anyway, so whatever you do, it will only happen once per intrinsic + type?

Contributor Author

@nikic sorry I misinterpreted your comment as a suggestion to completely remove the cache. Yes, that makes sense. I am holding it off for this PR, let's just have another PR for it as @workingjubilee suggested

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 11, 2025
@sayantn sayantn force-pushed the simplify-intrinsics branch from ea453f7 to d56fcd9 Compare June 11, 2025 19:03
@sayantn sayantn requested a review from workingjubilee June 11, 2025 19:06
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 11, 2025
@workingjubilee
Member

@bors r+

@bors
Collaborator

bors commented Jun 13, 2025

📌 Commit d56fcd9 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 13, 2025
@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Jun 13, 2025
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

6 participants