Skip to content

ARM: integrated assembler should replace "ldr rX, =imm" with "mov rX, #imm" if imm can be encoded as immediate #26096

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nico opened this issue Dec 2, 2015 · 16 comments
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@nico
Copy link
Contributor

nico commented Dec 2, 2015

Bugzilla Link 25722
Resolution FIXED
Resolved on May 13, 2016 11:10
Version trunk
OS Linux
Blocks #20796
CC @Arnaud-de-Grandmaison-ARM,@compnerd,@jmolloy,@kbeyls,@rengolin,@smithp35,@TNorthover

Extended Description

gas does that, it produces smaller code, and it prevents follow-on "out of range pc-relative fixup value" errors if the ldr is in a large-ish code block and the offset to the constant pool is too large.

Example (reduced from boringssl):

thakis@thakis:~/src/chrome/src$ head arm.S
.fpu neon
.text
.align 4
.global foo
.hidden foo
.type foo, %function

ldr r4, =0
mov r1, r2
mov r1, r2
/* repeat previous instruction 2000 times here */

thakis@thakis:/src/chrome/src$ third_party/llvm-build/Release+Asserts/bin/clang -c arm.S -march=armv7-a -target arm-linux-androideab
arm.S:8:1: error: out of range pc-relative fixup value
ldr r4, =0
^
thakis@thakis:
/src/chrome/src$ third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86_64/bin/arm-linux-androideabi-gcc -c arm.S
thakis@thakis:~/src/chrome/src$ third_party/android_tools/ndk/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86_64/bin/arm-linux-androideabi-objdump -D -marm arm.o | head

arm.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <.text>:
0: e3a04000 mov r4, #​0
4: e1a01002 mov r1, r2
8: e1a01002 mov r1, r2

@rengolin
Copy link
Member

rengolin commented Dec 7, 2015

Right now, the implementation is to create a temporary symbol, assign a .long with that expression and create a load to that label:

ldr r4, .Ltmp0
...
.Ltmp0:
.long 0

Changing this to mov r4, #​0 will need more than a pseudo. We'll need to:

  1. Identify a constant expression
  2. Make sure that constant fits in the mov immediate
  3. Translate from ldr to mov

I can see the benefit in terms of code size, general complexity of the resulting code (less/reduced constant pools) and speed (constant folding in the assembler).

But I'm not convinced this is a necessary move. Why not just write:

mov r4, #​0

in the first place?

What's the use case that makes this transformation necessary? "ldr r4, =0" doesn't seem to be one.

Assuming there is one, and we decide to go for it, what's the preferred format?

Always transforming a constant expression will be easier to document and keep closer to the least surprise principle, but no implementation of this will keep the principle as it is now, that's why I'm reluctant to just do it without a stronger example.

@rengolin
Copy link
Member

rengolin commented Dec 9, 2015

Some additional points:

  1. Not even GAS likes symbols, so this transformation can't be too complex.

ldr r4, =.local
.local:
.word 1984
mov r0, r0
...
Error: invalid literal constant: pool needs to be closer

.extern ext0
.extern ext1
.hidden ext0
.hidden ext1
ldr r0, =.ext0
Error: invalid literal constant: pool needs to be closer
ldr r0, =.ext0-.ext1
Error: syntax error -- `ldr r4,=.ext0-.ext1'

  1. As expected, in GAS, LDRs with expressions that can be encoded as MOVs pass silently, but when they can't, the error is similar to LLVM's:

ldr r4, =10000
Error: invalid literal constant: pool needs to be closer

  1. In Thumb, the error messages are totally different and very confusing:

.thumb
ldr r4, =256
Error: invalid offset, value too big (0x00000FA4)

So the implementation here is simpler as it just needs to evaluateAsAsbsolute and be a constant MCExpr and fit into the move Immediate.

However, GAS seems to be too conservative on the immediate, since the ranges are 255 for Thumb1, 65535 or ModImm for ARM and Thumb2). With additional complication from the ARMARM: "When both 32-bit encodings are available for an instruction, encoding T2 or A1 {ModImm} is preferred to encoding T3 or A2 {65k}".

Questions that are left:

  1. Do we emulate GAS bug-for-bug? Or do we expand to cover all possible immediate encodings?
  2. If the former, do we keep an eye whenever GAS start to support more ranges? This doesn't scale, not I think we want to be in that situation.
  3. If the latter, we'll probably create other issues by doing this, and the axe will fall on us, not GAS, and we'll have to emulate GAS bug-for-bug.

Given the extent of the support needed (a few constant values), the upstream complexity of keeping it sane, and the breakage of the principle of least surprise, I'm strongly biased to close this bug as WON'T FIX.

@nico
Copy link
Contributor Author

nico commented Dec 12, 2015

I think it's fine to be more permissive than gas. Replace with an immediate load when possible, else don't.

If you WontFix bugs like this, people will hack around clang's lack of support with stuff like https://mta.openssl.org/pipermail/openssl-commits/2015-April/000914.html

This is an easy and obvious improvement. I wouldn't WontFix this.

@rengolin
Copy link
Member

I think it's fine to be more permissive than gas. Replace with an immediate
load when possible, else don't.

From a user point of view, "do what I want" is always preferred, and that's the reason GAS is so permissive and inconsistent. Linux kernel developers have done that to GCC for more than a decade, and at hindsight, neither kernel nor GCC developers like where things are now.

There is no excuse to use a load to a constant when a mov would be identical and more semantically and syntactically correct. It's like asking to change the C standard just because some compiler in the past happened to allow you to write non-standard C code.

Another example is MSVC, which was more than horrible in its implementation of C++, and in recent years have been moving towards full ISO C++ standard, investing millions of dollars to make it work as it should.

If you WontFix bugs like this, people will hack around clang's lack of
support with stuff like
https://mta.openssl.org/pipermail/openssl-commits/2015-April/000914.html

I can't control the decision that people take in their own compilers. We don't purposely support features that are not in the standard, not we purposely omit features that are in the ARM ARM.

So, unless the #ifdef APPLE sequences (and I'm assuming this is Clang) can't be executed in GAS, there is no point in having the ifdef at all. Just implement in the standard way, and everything works fine.

GAS is not magical. If you write an instruction that is not valid (because the immediate is too large, for example), the ARM core will NOT accept it, so GAS has to change that to a sequence of instructions, which will probably be similar to the #ifdef APPLE sequence anyway.

This is an easy and obvious improvement. I wouldn't WontFix this.

That's what some kernel developers used to say to GCC developers in the past, and got them where they are now.

@jmolloy
Copy link

jmolloy commented Dec 12, 2015

There is no excuse to use a load to a constant when a mov would be identical and more semantically and syntactically correct.

I'm not sure I agree with this. Surely having some syntax for "get me a constant, irrelevant of the range of that constant" is very useful - if the constant is hidden behind a macro or a user parameter for example (and movw/movt is not available).

@rengolin
Copy link
Member

I'm not sure I agree with this. Surely having some syntax for "get me a
constant, irrelevant of the range of that constant" is very useful - if the
constant is hidden behind a macro or a user parameter for example (and
movw/movt is not available).

GAS doesn't support much, either, only numeric constants. And for those numeric that the range is not enough, it bails. This is really only a simple hack in GAS's implementation.

@jmolloy
Copy link

jmolloy commented Dec 12, 2015

I'm not sure what GAS supports is really relevant - we're not bound to gas's decisions. This is surely about what is the best/correct user behaviour, yes?

@rengolin
Copy link
Member

I'm not sure what GAS supports is really relevant - we're not bound to gas's
decisions. This is surely about what is the best/correct user behaviour, yes?

It is absolutely relevant, because what's going on right now is that people want an assembler that is GAS-compatible more than an assembler that helps them write code.

If we wrote a 100% bug-for-bug compatible assembler, people would love Clang. But that also meant we'd be chasing GAS for its bugs and obscure implementations and choices that are exclusively related to its internal implementations. This is not a sustainable scenario.

Every GAS-compatibility bug we received came with the same premise: "this is a simple trick that helps developers", and "why are you complaining that much for such a silly thing". Look at our assembler right now and tell me it's not a mess already?

I'll repeat the consensus on new assembly features / compatibility layers I wrote in Bug #​24350:

Our rule of thumb for implementing things not documented in the ARM ARM is:

  • If there are no alternatives in UAL / ARM ARM.
  • If the mapping is simple and clear, documented elsewhere
  • Implementation won't break or make code harder to understand/maintain
  • If the usage in the wild is extremely extensive and well known

This specific LDR case is only changed if the constant is small enough to fix a MOV immediate, and does not work with expressions of any kind, therefore, using a MOV is obviously correct and will work on GAS, too.

In this case, there is a simple and obvious alternative, the mapping is not documented anywhere, the implementation will add more special rules to a long list of hacks and the usage doesn't seem that extensive.

However, the add-on hack should be small enough that I won't oppose to you implementing it, as long as other people that deal with the assembler a lot (like Saleem, Jonathan, Tim and Arnaud) agree with you.

@compnerd
Copy link
Member

This specific LDR case is only changed if the constant is small enough to
fix a MOV immediate, and does not work with expressions of any kind,
therefore, using a MOV is obviously correct and will work on GAS, too.

The last point here is critical. The change required is correct across all assemblers and seems more immediately (get it -- immediate value, immediately) obvious what the intention is.

One argument which could be made here is that this is a larger missing feature in our assembler. I don't think that we do strength reduction over the assembly before writing out the object file.

I wouldn't be too opposed to someone implementing a strength reduction pass for MCAssembler. That would be able to perform the requested transformation and others to improve the generated code.

I think I agree with Renato that this doesn't belong in the assembler itself.

@rengolin
Copy link
Member

One argument which could be made here is that this is a larger missing
feature in our assembler. I don't think that we do strength reduction over
the assembly before writing out the object file.

That would be a great feature, I think, and one where this kind of transformation would easily belong.

There are two major views of what an assembler should do:

  1. Understand, warn and correct user errors, but keeping the results as close to the original source as possible, as to strictly follow the principle of least surprise.

  2. Behave like a compiler, optimise user code and be "smart" about ISA tricks that can be applied to make the code faster, smaller, etc.

The first is very conservative, but works better on generated and inline assembly. That's because we don't want the assembler to hide code generation errors, nor we want inline assembly being fiddled with and breaking users' expectations.

The second is great for hand written assembly.

Right now, we have no distinction, and we try to behave conservatively, but not blind to obvious patterns and tricks. I'm in favour of a more clear division of expectations. Such strength reduction pass should be enabled via a flag, and not by default.

@llvmbot
Copy link
Member

llvmbot commented Dec 12, 2015

There are two major views of what an assembler should do:

  1. Understand, warn and correct user errors, but keeping the results as close to the original source as possible, as to strictly follow the principle of least surprise.

  2. Behave like a compiler, optimise user code and be "smart" about ISA tricks that can be applied to make the code faster, smaller, etc.

I'd personally find it surprising if the assembler behaved like #​2. IMO, it should take what I've written, and produce the exact bits implied by it. If there are cases where I could have written something else that would be faster, and the assembler knows about it, then telling me with a performance warning is ok, but swapping it out automatically without a flag to tell it to do so is not.

That being said, isn't the ARM ARM very clear what we must emit here?

When using the LDR pseudo-instruction:

  • If the value of expr can be loaded with a valid MOV or MVN instruction, the assembler uses that instruction.
  • If a valid MOV or MVN instruction cannot be used, or if the label_expr syntax is used, the assembler places the constant in a literal pool and generates a PC-relative LDR instruction that reads the constant from the literal pool.

Sounds to me that we must emit a mov in this case.

@nico
Copy link
Contributor Author

nico commented Dec 12, 2015

I agree with "being strict here is good long term" in theory. In practice, it means (as shown by the openssl commit I linked to above) that people are like "oh the assembler on iOS is strange, let me put in a few #if APPLE statements" (which don't work with clang on linux, etc). I understand that there are things that really shouldn't be supported, but converting ldr r4, =0 into a mov statement seems in line with other transformations that are already being done (say what happened in bug 25720).

@rengolin
Copy link
Member

That being said, isn't the ARM ARM very clear what we must emit here?

This is not the ARM ARM, but the ARMCC documentation, on the same section as ADRL is documented. We don't claim compatibility with ARM tools, more so than with GNU tools.

However, this case is simpler than ADRL, so I'm now ok with this one, since it is documented [1], and it's simple enough to not be a terrible hack (2 of the 4 points above).

[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0801d/pge1425890292360.html

@rengolin
Copy link
Member

I agree with "being strict here is good long term" in theory. In practice,
it means (as shown by the openssl commit I linked to above) that people are
like "oh the assembler on iOS is strange, let me put in a few #if APPLE
statements" (which don't work with clang on linux, etc).

Philosophically, this is bad behaviour from the user, not the compiler. If the tools help educate the users to be better programmers, than I personally think it's a good tool. OTOH, if they allow users to write whatever and accept garbage (ultimately getting it wrong for someone), I think it's a bad tool.

Bad users might not like it, but I still think we have to keep our side of the deal.

I understand that
there are things that really shouldn't be supported, but converting ldr r4, =0 into a mov statement seems in line with other transformations that are
already being done (say what happened in bug 25720).

Jonathan's argument is stronger than yours: it is documented by ARM, and it is simple, and really GAS is doing what's in the document. That's a good technical argument that this small feature could very well be implemented.

Arguments like "GAS does that" or "it works in my program" are often proffered by developers that are used to this or that tool and are weak arguments.

I apologise for not checking the ARMCC docs, which have historically fallen into our grey area for implementation. The GNU docs are an even greyer area, especially for ARM, so whatever is in there needs to be even simpler or more important to get in.

I'll take a look at this translation, it should be reasonably simple, although slightly hacky, to get this in. At least now I have a document that will help me write tests. :)

@rengolin
Copy link
Member

rengolin commented Jan 7, 2016

I had a go at this today, and here are my findings...

The instruction sequence "ldr reg, =expr" matches the instruction LDRi12, which refers to atomic loads of 32-bit values (with modified immediate format). The expression is automatically converted to a temporary label during the match, and once MatchInstruction returns, it already exists.

In the ARM back-end, the sequence is to match instructions first, then pseudo instructions (including Asm-specific ones), and because there is a match to an instruction, when it gets to the validation step, the instruction is already way off base. Therefore, adding an AsmPseudo makes no difference.

A number of things could be done to implement this transformation:

  1. Move the label creation from the instruction match to the processInstruction, but that would mean refactor all label-based instructions to processInstruction. Adding new cases to that function is the opposite of what we should be doing, really.

  2. Add a proper instruction that matches the pseudo first. This could break code generation if the back-end recognises it as an instruction and emits this, we'd have to teach the MC layer how to emit it, essentially duplicating the logic on both the assembler and the code emitter.

  3. Adding an extra step on the instruction processing of the assembler (similar to AArch64 I hear), that would process all AsmPseudos before matching instructions. That is obviously a hack and may have unintended side effects that are hard to test (we certainly don't test all variations of all instructions). This seems like a risky move, one that would have to be done for a real bug, not a hack.

Overall, since the implementation is not as simple as I imagined, and the request can easily be solved in a way that works on both assemblers (ie. use a mov directly), I don't see an alternative other than ignore this request.

I won't mark as "won't fix", because I'm still ok with someone having a go at this on their own. If someone finds a better way that is simple and quick, I'm ok with the change, but I see no reason to continue trying myself.

@smithp35
Copy link
Collaborator

I've put in a fix that has been committed upstream.

ldr rt, =expr will be transformed into mov rt, expr when expr can be evaluated to a constant and the operands can be represented by an available mov instruction.

It does not support fixups so if expr can't be evaluated to a constant at the time ldr rt, =expr is encountered, but can be later in the file then a constant pool will still be generated.

http://reviews.llvm.org/D20153
http://reviews.llvm.org/D20154
http://reviews.llvm.org/D20155

committed with revisions:
r269352
r269353
r269354

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

6 participants