-
Notifications
You must be signed in to change notification settings - Fork 13.5k
llc lowers intrinsics incorrectly with the presense of regparm attribute #4369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
running command: |
llc is doing correct thing here. The intrinsic does not have any regparm attributes. |
So is it possible to leverage the benefits of fastcall/regparm for these intrinsic? OS Kernel specializes the memcpy call for performance reasons. |
The problem here is llvm-gcc is not respecting either the -ffreestanding or -fno-builtin flags by generating an llvm intrinsic for memcpy. The definition in the C file (with the regparm flag) no longer matches the intrinsic and should not be lowered to an intrinsic in the IR. |
Neither does gcc. In fact the gcc docs say "GCC requires the PS: This has come up before, so probably you can find previous |
gcc is generating calls with the regparm=3 calling convention to the programmer supplied memcpy. llvm is generating calls with the normal calling convention to the programmer supplied memcpy (which is regparm=3). I will grant that this is a 1 line change in the source program in this case to deal with this difference, but it doesn't seem like a satisfactory solution. |
As noted in the gcc docs, gcc requires the system to supply memcpy. That said, what would it take to change this? The problem is |
Well, gcc 4.3 generates "rep; movsb" assembly for struct assignments instead of calls to memcpy (as least on my machine). This bug only shows up when you are compiling performance critical code. In fact, it makes llvm-gcc fail to miscompile Linux kernel. |
llvm-gcc will also emit this kind of code (rep; movsb) if it By the way, presumably the linux kernel comment is because |
linux kernel includes its only implementation of memcpy/memset in its tree, which are compiled with -mregparm=3 |
The simple correction to linux is to tag the function with an attribute, or, and this is what I did in my tree, make memcpy a (regparm=0) wrapper for kmemcpy and have explicit users use kmemcpy. This takes 4 lines of code. |
The compiler was invoked with -ffreestanding -mregparm=3. The memcpy in this PR is, by that definition, the standard memcpy.
That does not appear to be the case (using -mtune=i686 makes it less inclined to just use rep movs, btw): Try compiling the C file in (duplicate) bug 18415, as follows: Using GCC 4.8.2 on Fedora 20, it definitely seems to invoke 'memcpy' with the correct ABI, as specified on the command line. |
*** Bug llvm/llvm-bugzilla-archive#18415 has been marked as a duplicate of this bug. *** |
Rough sketch of how you might solve this... This is likely a decent starting point to factor and use for memcpy, memmove, etc. However, there may be better/cleaner ways of doing this. Also, while it correctly lowers using the ABI, it doesn't realize this makes the function tail callable. That might require brute force of looking at the register usage and synthesizing at tail call when suitable here in the DAG. Or it might require some cleanup when doing isel on the dag. :: shudder :: |
Hm, Perhaps I misunderstand but I think that patch misses the point. However, it does enlighten me somewhat, so thanks. The point is not that we have an existing memcpy() function in this module, declared with some bizarreness like attribute((regparm(3)). The point is that we have invoked the compiler with -mregparm=3 so everything is using that ABI. I don't think this patch will address that, will it? I'm a little confused because -mregparm=3 doesn't seem to match any of the predefined calling conventions enumerated by CallingConv::ID. How are we even supposed to ask for what we want? On the clang side, we have X86_32ABIInfo::shouldUseInReg() which will calculate, for each argument, whether it's supposed to be in a register or not. On the LLVM side we don't have that flexibility. Or at least not in the same way. One simple(ish) solution might be to define CallingConv::ID::RegParm1, CallingConv::ID::RegParm2, CallingConv::ID::RegParm3 etc., and let each target implement them as appropriate. Is there a better way? |
Heh, I was actually discussing this other possibility just now.
The current way this is modeled is relatively simple: the Clang ABI information attaches 'inreg' onto the relevant parameters of the function declaration. This, combined with the baseline calling convention, provides the information needed to lower the call correctly. The problem is that we currently rely exclusively on function declarations to encode this type of information. If we don't have the declaration there is no mechanism to communicate this from the frontend (which has the complete ABI information) to the backend which is doing the lowering. All of the options I see for this are pretty bad.
3b) Same as 3, but make certain declarations optional when we can emit straight line code (memcpy can always be emitted as a tiny rep loop).
Sadly, the biggest disadvantage to #3 are huge swath of math functions that we would have to do this for, not memcpy and friends, so #3b seems to not help the complexity much. #1 and #2 work, but seem likely to be brittle and embed very nitty-gritty ABI details even farther into the code generator. |
bump. We still see this issue in arch/x86/boot (memcpy) . |
How about callback from the code generator to the front end, giving the API of an intrinsic function that it's about to emit a call to, and allowing the front end to tweak it as appropriate (by adding 'inreg', etc.). |
This is also going to come up - very soon - in the context of the MCU psABI as well, since that ABI basically mandates -mregparm=3. |
Fixed in r298179. |
mentioned in issue llvm/llvm-bugzilla-archive#4068 |
Extended Description
llc does not handle this case correctly:
when llvm.memcpy.i32 lowers to call to memcpy, it does not follow the customized memcpy calling convention.
for the first llvm.memcpy.i32, the generated assemblies are:
for the second call of memcpy, the codes are:
The text was updated successfully, but these errors were encountered: