-
Notifications
You must be signed in to change notification settings - Fork 13.6k
ARM: integrated assembler should replace "ldr rX, =imm" with "mov rX, #imm" if imm can be encoded as immediate #26096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Right now, the implementation is to create a temporary symbol, assign a .long with that expression and create a load to that label: ldr r4, .Ltmp0 Changing this to mov r4, #0 will need more than a pseudo. We'll need to:
I can see the benefit in terms of code size, general complexity of the resulting code (less/reduced constant pools) and speed (constant folding in the assembler). But I'm not convinced this is a necessary move. Why not just write: mov r4, #0 in the first place? What's the use case that makes this transformation necessary? "ldr r4, =0" doesn't seem to be one. Assuming there is one, and we decide to go for it, what's the preferred format? Always transforming a constant expression will be easier to document and keep closer to the least surprise principle, but no implementation of this will keep the principle as it is now, that's why I'm reluctant to just do it without a stronger example. |
Some additional points:
ldr r4, =.local .extern ext0
ldr r4, =10000
.thumb So the implementation here is simpler as it just needs to evaluateAsAsbsolute and be a constant MCExpr and fit into the move Immediate. However, GAS seems to be too conservative on the immediate, since the ranges are 255 for Thumb1, 65535 or ModImm for ARM and Thumb2). With additional complication from the ARMARM: "When both 32-bit encodings are available for an instruction, encoding T2 or A1 {ModImm} is preferred to encoding T3 or A2 {65k}". Questions that are left:
Given the extent of the support needed (a few constant values), the upstream complexity of keeping it sane, and the breakage of the principle of least surprise, I'm strongly biased to close this bug as WON'T FIX. |
I think it's fine to be more permissive than gas. Replace with an immediate load when possible, else don't. If you WontFix bugs like this, people will hack around clang's lack of support with stuff like https://mta.openssl.org/pipermail/openssl-commits/2015-April/000914.html This is an easy and obvious improvement. I wouldn't WontFix this. |
From a user point of view, "do what I want" is always preferred, and that's the reason GAS is so permissive and inconsistent. Linux kernel developers have done that to GCC for more than a decade, and at hindsight, neither kernel nor GCC developers like where things are now. There is no excuse to use a load to a constant when a mov would be identical and more semantically and syntactically correct. It's like asking to change the C standard just because some compiler in the past happened to allow you to write non-standard C code. Another example is MSVC, which was more than horrible in its implementation of C++, and in recent years have been moving towards full ISO C++ standard, investing millions of dollars to make it work as it should.
I can't control the decision that people take in their own compilers. We don't purposely support features that are not in the standard, not we purposely omit features that are in the ARM ARM. So, unless the #ifdef APPLE sequences (and I'm assuming this is Clang) can't be executed in GAS, there is no point in having the ifdef at all. Just implement in the standard way, and everything works fine. GAS is not magical. If you write an instruction that is not valid (because the immediate is too large, for example), the ARM core will NOT accept it, so GAS has to change that to a sequence of instructions, which will probably be similar to the #ifdef APPLE sequence anyway.
That's what some kernel developers used to say to GCC developers in the past, and got them where they are now. |
I'm not sure I agree with this. Surely having some syntax for "get me a constant, irrelevant of the range of that constant" is very useful - if the constant is hidden behind a macro or a user parameter for example (and movw/movt is not available). |
GAS doesn't support much, either, only numeric constants. And for those numeric that the range is not enough, it bails. This is really only a simple hack in GAS's implementation. |
I'm not sure what GAS supports is really relevant - we're not bound to gas's decisions. This is surely about what is the best/correct user behaviour, yes? |
It is absolutely relevant, because what's going on right now is that people want an assembler that is GAS-compatible more than an assembler that helps them write code. If we wrote a 100% bug-for-bug compatible assembler, people would love Clang. But that also meant we'd be chasing GAS for its bugs and obscure implementations and choices that are exclusively related to its internal implementations. This is not a sustainable scenario. Every GAS-compatibility bug we received came with the same premise: "this is a simple trick that helps developers", and "why are you complaining that much for such a silly thing". Look at our assembler right now and tell me it's not a mess already? I'll repeat the consensus on new assembly features / compatibility layers I wrote in Bug #24350: Our rule of thumb for implementing things not documented in the ARM ARM is:
This specific LDR case is only changed if the constant is small enough to fix a MOV immediate, and does not work with expressions of any kind, therefore, using a MOV is obviously correct and will work on GAS, too. In this case, there is a simple and obvious alternative, the mapping is not documented anywhere, the implementation will add more special rules to a long list of hacks and the usage doesn't seem that extensive. However, the add-on hack should be small enough that I won't oppose to you implementing it, as long as other people that deal with the assembler a lot (like Saleem, Jonathan, Tim and Arnaud) agree with you. |
The last point here is critical. The change required is correct across all assemblers and seems more immediately (get it -- immediate value, immediately) obvious what the intention is. One argument which could be made here is that this is a larger missing feature in our assembler. I don't think that we do strength reduction over the assembly before writing out the object file. I wouldn't be too opposed to someone implementing a strength reduction pass for MCAssembler. That would be able to perform the requested transformation and others to improve the generated code. I think I agree with Renato that this doesn't belong in the assembler itself. |
That would be a great feature, I think, and one where this kind of transformation would easily belong. There are two major views of what an assembler should do:
The first is very conservative, but works better on generated and inline assembly. That's because we don't want the assembler to hide code generation errors, nor we want inline assembly being fiddled with and breaking users' expectations. The second is great for hand written assembly. Right now, we have no distinction, and we try to behave conservatively, but not blind to obvious patterns and tricks. I'm in favour of a more clear division of expectations. Such strength reduction pass should be enabled via a flag, and not by default. |
I'd personally find it surprising if the assembler behaved like #2. IMO, it should take what I've written, and produce the exact bits implied by it. If there are cases where I could have written something else that would be faster, and the assembler knows about it, then telling me with a performance warning is ok, but swapping it out automatically without a flag to tell it to do so is not. That being said, isn't the ARM ARM very clear what we must emit here?
Sounds to me that we must emit a |
I agree with "being strict here is good long term" in theory. In practice, it means (as shown by the openssl commit I linked to above) that people are like "oh the assembler on iOS is strange, let me put in a few #if APPLE statements" (which don't work with clang on linux, etc). I understand that there are things that really shouldn't be supported, but converting |
This is not the ARM ARM, but the ARMCC documentation, on the same section as ADRL is documented. We don't claim compatibility with ARM tools, more so than with GNU tools. However, this case is simpler than ADRL, so I'm now ok with this one, since it is documented [1], and it's simple enough to not be a terrible hack (2 of the 4 points above). [1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801d/pge1425890292360.html |
Philosophically, this is bad behaviour from the user, not the compiler. If the tools help educate the users to be better programmers, than I personally think it's a good tool. OTOH, if they allow users to write whatever and accept garbage (ultimately getting it wrong for someone), I think it's a bad tool. Bad users might not like it, but I still think we have to keep our side of the deal.
Jonathan's argument is stronger than yours: it is documented by ARM, and it is simple, and really GAS is doing what's in the document. That's a good technical argument that this small feature could very well be implemented. Arguments like "GAS does that" or "it works in my program" are often proffered by developers that are used to this or that tool and are weak arguments. I apologise for not checking the ARMCC docs, which have historically fallen into our grey area for implementation. The GNU docs are an even greyer area, especially for ARM, so whatever is in there needs to be even simpler or more important to get in. I'll take a look at this translation, it should be reasonably simple, although slightly hacky, to get this in. At least now I have a document that will help me write tests. :) |
I had a go at this today, and here are my findings... The instruction sequence "ldr reg, =expr" matches the instruction LDRi12, which refers to atomic loads of 32-bit values (with modified immediate format). The expression is automatically converted to a temporary label during the match, and once MatchInstruction returns, it already exists. In the ARM back-end, the sequence is to match instructions first, then pseudo instructions (including Asm-specific ones), and because there is a match to an instruction, when it gets to the validation step, the instruction is already way off base. Therefore, adding an AsmPseudo makes no difference. A number of things could be done to implement this transformation:
Overall, since the implementation is not as simple as I imagined, and the request can easily be solved in a way that works on both assemblers (ie. use a mov directly), I don't see an alternative other than ignore this request. I won't mark as "won't fix", because I'm still ok with someone having a go at this on their own. If someone finds a better way that is simple and quick, I'm ok with the change, but I see no reason to continue trying myself. |
I've put in a fix that has been committed upstream. ldr rt, =expr will be transformed into mov rt, expr when expr can be evaluated to a constant and the operands can be represented by an available mov instruction. It does not support fixups so if expr can't be evaluated to a constant at the time ldr rt, =expr is encountered, but can be later in the file then a constant pool will still be generated. http://reviews.llvm.org/D20153 committed with revisions: |
Extended Description
gas does that, it produces smaller code, and it prevents follow-on "out of range pc-relative fixup value" errors if the ldr is in a large-ish code block and the offset to the constant pool is too large.
Example (reduced from boringssl):
thakis@thakis:~/src/chrome/src$ head arm.S
.fpu neon
.text
.align 4
.global foo
.hidden foo
.type foo, %function
ldr r4, =0
mov r1, r2
mov r1, r2
/* repeat previous instruction 2000 times here */
thakis@thakis:
/src/chrome/src$ third_party/llvm-build/Release+Asserts/bin/clang -c arm.S -march=armv7-a -target arm-linux-androideab/src/chrome/src$ third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86_64/bin/arm-linux-androideabi-gcc -c arm.Sarm.S:8:1: error: out of range pc-relative fixup value
ldr r4, =0
^
thakis@thakis:
thakis@thakis:~/src/chrome/src$ third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86_64/bin/arm-linux-androideabi-objdump -D -marm arm.o | head
arm.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <.text>:
0: e3a04000 mov r4, #0
4: e1a01002 mov r1, r2
8: e1a01002 mov r1, r2
The text was updated successfully, but these errors were encountered: