-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Simplify implementation of Rust intrinsics by using type parameters in the cache #142259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Some changes occurred in compiler/rustc_codegen_ssa Some changes occurred in compiler/rustc_codegen_gcc |
This seems like it might be perf-sensitive. @bors2 try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Simplify implementation of Rust intrinsics by using type parameters in the cache The current implementation of intrinsics have a lot of duplication to handle different overloads of overloaded LLVM intrinsic. This PR uses the **base name and the type parameters** in the cache instead of the full, overloaded name. This has the benefit that `call_intrinsic` doesn't need to provide the full name, rather the type parameters (which is most of the time more available). This uses `LLVMIntrinsicCopyOverloadedName2` to get the overloaded name from the base name and the type parameters, and only uses it to declare the function. (originally was part of #140763, split off later) `@rustbot` label A-codegen A-LLVM r? codegen
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (57fad72): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)Results (primary 1.7%, secondary 2.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -2.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 754.509s -> 753.616s (-0.12%) |
@workingjubilee seems like there is no perf impact! (I am surprised actually) |
neat, I will try to take a closer look later but this is a very good cleanup so I expect to be approving it later today. |
@@ -861,372 +869,156 @@ impl<'ll> CodegenCx<'ll, '_> { | |||
} else { | |||
self.type_variadic_func(&[], ret) | |||
}; | |||
let f = self.declare_cfn(name, llvm::UnnamedAddr::No, fn_ty); | |||
self.intrinsics.borrow_mut().insert(name, (fn_ty, f)); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So here we are still creating the intrinsic function type based on the argument and return value types -- however, the signature is already uniquely determined by the base_name and type_params.
I think it would be a lot better to use LLVMGetIntrinsicDeclaration
accepting the ID and type parameters. This means that we no longer needs args and ret -- and crucially, this means we don't need to maintain the list of intrinsics inside declare_intrinsic anymore!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do know about Intrinsic::getDeclaration
. The reason I didn't use this is performance. The problem with the C API function LLVMGetIntrinsicDeclaration
is that it computes the FunctionType
(because Intrinsic::getDeclaration
does), and then throws it away (It calls getCallee
on the FunctionCallee
object returned by Intrinsic::getDeclaration
). So in total, I have to compute the FunctionType
twice, which is pretty expensive. So I just duplicated the implementation of getDeclaration
, but with a known FunctionType
.
The reason I didn't get rid of the list of intrinsics is also performance - Intrinsic::getType
calls are expensive, and calling that every time we are comparing 2 integers seems like a major perf issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that nikic's proposed change can be implemented on top of this simplified file. I think we should merge this because it's already better, but I would indeed like to see an even simpler version and we can run perf on it to see if it affects anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So are you suggesting to completely remove the cache? Or should I keep a cache like FxHashMap<(String, SmallVec<[&'ll Type; 2]>), (&'ll Type, &'ll Value)>
that will dynamically be filled whenever an intrinsic is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another PR with this as a base, then we can have a nice chat about where to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do know about
Intrinsic::getDeclaration
. The reason I didn't use this is performance. The problem with the C API functionLLVMGetIntrinsicDeclaration
is that it computes theFunctionType
(becauseIntrinsic::getDeclaration
does), and then throws it away (It callsgetCallee
on theFunctionCallee
object returned byIntrinsic::getDeclaration
). So in total, I have to compute theFunctionType
twice, which is pretty expensive. So I just duplicated the implementation ofgetDeclaration
, but with a knownFunctionType
.
Intrinsic::getDeclaration() does not discard the FunctionType. The FunctionType is part of the Function. You can fetch it using get_type_of_global().
The reason I didn't get rid of the list of intrinsics is also performance -
Intrinsic::getType
calls are expensive, and calling that every time we are comparing 2 integers seems like a major perf issue.
I don't really follow how having a list of intrinsics makes things faster. Doesn't going through the list and matching all the names just add cost? Besides, you are caching the result anyway, so whatever you do, it will only happen once per intrinsic + type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic sorry I misinterpreted your comment as a suggestion to completely remove the cache. Yes, that makes sense. I am holding it off for this PR, let's just have another PR for it as @workingjubilee suggested
ea453f7
to
d56fcd9
Compare
@bors r+ |
The current implementation of intrinsics have a lot of duplication to handle different overloads of overloaded LLVM intrinsic. This PR uses the base name and the type parameters in the cache instead of the full, overloaded name. This has the benefit that
call_intrinsic
doesn't need to provide the full name, rather the type parameters (which is most of the time more available). This usesLLVMIntrinsicCopyOverloadedName2
to get the overloaded name from the base name and the type parameters, and only uses it to declare the function.(originally was part of #140763, split off later)
@rustbot label A-codegen A-LLVM
r? codegen