Fix long-range (non-colocated) aarch64 calls to not use Arm64Call reloc, and fix simplejit to use new long-distance call. #1570

cfallin · 2020-04-21T19:32:39Z

Previously, every call was lowered on AArch64 to a call instruction, which
takes a signed 26-bit PC-relative offset. Including the 2-bit left shift, this
gives a range of +/- 128 MB. Longer-distance offsets would cause an impossible
relocation record to be emitted (or rather, a record that a more sophisticated
linker would fix up by inserting a shim/veneer).
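To make the range arithmetic concrete: `BL` stores a signed 26-bit count of 4-byte words, so the reachable byte displacement runs from -2^27 to 2^27 - 4 bytes (+/- 128 MiB). The following is a minimal, self-contained sketch of that range check and of the `BL` encoding. It is not code from this PR, and the constant and function names are hypothetical.

```rust
// Hypothetical helpers illustrating the reach of an AArch64 `BL` instruction.
// `BL` encodes a signed 26-bit immediate counted in 4-byte instruction words.
pub const BL_MIN_OFFSET: i64 = -(1i64 << 27); // -128 MiB
pub const BL_MAX_OFFSET: i64 = (1i64 << 27) - 4; // +128 MiB minus one instruction

/// Returns true if a PC-relative byte offset can be encoded directly in `BL`.
pub fn bl_offset_in_range(offset: i64) -> bool {
    // The offset must be word-aligned and fit in imm26 after the 2-bit shift.
    offset % 4 == 0 && (BL_MIN_OFFSET..=BL_MAX_OFFSET).contains(&offset)
}

/// Encodes `BL <offset>` as a 32-bit instruction word, or `None` if out of range.
pub fn encode_bl(offset: i64) -> Option<u32> {
    if !bl_offset_in_range(offset) {
        return None;
    }
    // Arithmetic shift keeps the sign; the mask keeps the low 26 bits (two's complement).
    let imm26 = ((offset >> 2) as u32) & 0x03ff_ffff;
    // 0b100101 in bits 31..26 is the BL opcode.
    Some(0x9400_0000 | imm26)
}
```

A target exactly 128 MiB ahead (offset 2^27) already falls outside the range, so a `BL` whose displacement is patched in later through an `Arm64Call` relocation cannot reach it.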

This commit adds a notion of "relocation distance" in the MachInst backends,
and provides this information for every call target and symbol reference. The
intent is that backends on architectures like AArch64, where there are different
offset sizes / addressing strategies to choose from, can either emit a regular
call or a load-64-bit-constant / call-indirect sequence, as necessary. This
avoids the need to implement complex linking behavior.
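A rough sketch of the choice described above, assuming a two-valued distance. The enum and `lower_call` are hypothetical stand-ins that only mirror the idea; a real backend emits machine instructions plus relocation records rather than assembly strings.

```rust
/// Hypothetical stand-in for a backend's notion of relocation distance.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RelocDistance {
    /// Target is known to be within direct-branch range (+/- 128 MiB on AArch64).
    Near,
    /// Target may be anywhere in the 64-bit address space.
    Far,
}

/// Returns an assembly-like listing for a call to `symbol`, using `scratch`
/// as the temporary register in the far case. Purely illustrative.
pub fn lower_call(symbol: &str, dist: RelocDistance, scratch: &str) -> Vec<String> {
    match dist {
        // One `bl`, whose 26-bit displacement is filled in by a PC-relative relocation.
        RelocDistance::Near => vec![format!("bl {}", symbol)],
        // Load the full 64-bit address (written here with the assembler's
        // literal-pool pseudo-op), then call indirectly; this works at any distance.
        RelocDistance::Far => vec![
            format!("ldr {}, ={}", scratch, symbol),
            format!("blr {}", scratch),
        ],
    }
}
```

The far form costs an extra instruction and an 8-byte literal, but it only needs an absolute 64-bit relocation, so no veneers or other link-time code insertion are required.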

The MachInst driver code provides this information based on the "colocated" bit
in the CLIF symbol references, which appears to have been designed for this
purpose, or at least a similar one. Combined with the use_colocated_libcalls
setting, this allows client code to ensure that library calls can link to
library code at any location in the address space.
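As a sketch of the mapping described above (assumed shape, not the actual driver code; `CallTarget` and `reloc_distance` are hypothetical names): user-visible functions take their distance from the CLIF `colocated` bit, while compiler-inserted library calls take it from a `use_colocated_libcalls`-style setting.

```rust
/// Hypothetical two-valued relocation distance.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RelocDistance {
    Near,
    Far,
}

/// Simplified, hypothetical view of what a call refers to.
pub enum CallTarget {
    /// A function referenced from CLIF, carrying its `colocated` bit.
    Function { colocated: bool },
    /// A library call inserted by the compiler (a runtime helper).
    LibCall,
}

/// Derives a relocation distance from the colocated bit; libcalls are
/// governed by the `use_colocated_libcalls`-style setting instead.
pub fn reloc_distance(target: &CallTarget, use_colocated_libcalls: bool) -> RelocDistance {
    let colocated = match target {
        CallTarget::Function { colocated } => *colocated,
        CallTarget::LibCall => use_colocated_libcalls,
    };
    if colocated {
        RelocDistance::Near
    } else {
        RelocDistance::Far
    }
}
```

With `use_colocated_libcalls` set to false, every libcall takes the far path. That is the setting the PR applies to simplejit below.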

Separately, the simplejit example did not handle Arm64Call; rather than doing
so, it appears all that is necessary to get its tests to pass is to set the
use_colocated_libcalls flag to false, to make use of the above change. This
fixes the libcall_function unit-test in this crate.

github-actions · 2020-04-21T19:49:44Z

Subscribe to Label Action

cc @bnjbvr

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:module"

Thus the following users have been cc'd because of the following labels:

bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

sunfishcode · 2020-04-21T19:51:34Z

At a quick glance, this does indeed look consistent with the intent of the colocated flag.

bjorn3 · 2020-04-21T20:40:33Z

As far as I know the colocated flag is meant for functions that will be at a fixed offset from the current function, so they could use PCREL relocations rather than going through a GOT. I don't think it necessarily means that it is "near". For example, a fully statically linked binary could always use colocated, even when it is 4 GB in size.
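This distinction matters on AArch64 because a fixed PC-relative offset can still be out of `BL` range. As a sketch (not Cranelift code; the names are hypothetical), the following classifies a known displacement. `BL` reaches about +/- 128 MiB. An `ADRP`-based sequence works on a signed 21-bit count of 4 KiB pages and reaches about +/- 4 GiB. Anything farther needs a full 64-bit address.

```rust
/// Hypothetical classification of how a fixed PC-relative reference could be reached.
#[derive(Debug, PartialEq, Eq)]
pub enum Reach {
    /// Direct `BL`: signed 26-bit word offset, about +/- 128 MiB.
    Bl,
    /// `ADRP` plus add, then `BLR`: signed 21-bit 4 KiB page offset, about +/- 4 GiB.
    AdrpPage,
    /// Farther than that: materialize an absolute 64-bit address.
    Absolute,
}

pub fn classify(pc: u64, target: u64) -> Reach {
    // Use i128 so the subtraction cannot overflow for any pair of u64 addresses.
    let byte_off = target as i128 - pc as i128;
    if byte_off % 4 == 0 && (-(1i128 << 27)..=(1i128 << 27) - 4).contains(&byte_off) {
        return Reach::Bl;
    }
    // ADRP works on 4 KiB page numbers, not byte offsets.
    let page_off = (target >> 12) as i128 - (pc >> 12) as i128;
    if (-(1i128 << 20)..=(1i128 << 20) - 1).contains(&page_off) {
        Reach::AdrpPage
    } else {
        Reach::Absolute
    }
}
```

In a statically linked binary of several GB, colocated references can therefore land in the second or even the third category, even though their offsets are fixed at link time.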

cfallin · 2020-04-21T20:57:56Z

the colocated flag is meant for functions that will be at a fixed offset

Hmm -- given that, it seems there isn't a CLIF-level notion of "in the same module"? Perhaps we can add a new bit to calls and symbol_values -- I'm not sure what to call it, but we somehow need to communicate the notion of "call into another function in the same Wasm module, which due to Wasm bytecode size limits should be in range of most RISC ISAs' call instructions" (EDIT: for the Wasm use-case, at least; for e.g. AOT-into-object-file use-cases, we just emit the right relocs and everything works).

It seems the effect of colocated is what we want, at least going off of the description: non-colocated implies indirection through a table, while colocated implies direct PC-rel reference.

Alternatively, we could consider just doing the right thing and implementing the veneer-insertion linker behavior; but that puts a larger burden on the client, and also breaks invariants around "this blob of machine code from the backend is a fixed-size blob that will never need to be extended with thunks at link time".
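For a sense of what veneer insertion would involve: a long-branch veneer is a small stub placed within `BL` range of the caller that jumps to the real target through a scratch register. The sketch below emits one common shape of such a stub (`ldr x16, #8; br x16; .quad target`). It uses x16 (IP0), which the AArch64 procedure call standard sets aside for linker-inserted code like this. It is illustrative only. This PR does not emit veneers, and `veneer_bytes` is a hypothetical name.

```rust
/// `LDR X16, #8`: load the 64-bit literal located 8 bytes after this instruction.
const LDR_X16_LIT_PLUS_8: u32 = 0x5800_0050;
/// `BR X16`: branch (without linking) to the address in X16.
const BR_X16: u32 = 0xd61f_0200;

/// Builds a 16-byte long-branch veneer that jumps to `target`.
/// An out-of-range `BL` would be retargeted to this stub; because the stub uses
/// `BR` rather than `BLR`, the callee returns directly to the original caller.
pub fn veneer_bytes(target: u64) -> [u8; 16] {
    let mut out = [0u8; 16];
    // AArch64 instructions are always encoded little-endian.
    out[0..4].copy_from_slice(&LDR_X16_LIT_PLUS_8.to_le_bytes());
    out[4..8].copy_from_slice(&BR_X16.to_le_bytes());
    // The absolute target address is stored inline as the literal.
    out[8..16].copy_from_slice(&target.to_le_bytes());
    out
}
```

Each veneer adds 16 bytes of code outside the function body. That extra code is what breaks the fixed-size-blob invariant mentioned above.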

sunfishcode · 2020-04-21T21:05:14Z

@bjorn3 Ah, that's true. That's what I get for jumping in without full context here :-}.

This change adds SourceLoc information per instruction in a `VCode<Inst>` container, and keeps this information up-to-date across register allocation and branch reordering. The information is initially collected during instruction lowering, eventually collected on the MachSection, and finally provided to the environment that wraps the codegen crate for wasmtime. This PR is based on top of bytecodealliance#1570 and bytecodealliance#1571 (part of a series fixing tests). This PR depends on wasmtime/regalloc.rs#50, a change to the register allocator to provide instruction-granularity info on the rewritten instruction stream (rather than block-granularity). With the prior PRs applied as well, quite a few more unit tests pass; the exclusion list in bytecodealliance#1526 should be updated if this PR lands first.

bnjbvr

I agree with the definition of colocated, at least to my understanding (I never had to interact with it in the past).

So if I understand correctly, Cranelift's users may still declare all function calls as calls to colocated functions, as long as they insert the veneers, right? (If so, this should work as is in Spidermonkey.)

It would be nice to have a way to signal to users that a call actually requires a veneer; but this is probably a job for the object/simplejit et al. crates, not for Cranelift itself.

(/me starts to think about reordering functions within sections so as to minimize the need for veneers)

LGTM in any case, thanks!

cranelift/codegen/src/isa/aarch64/lower.rs

cranelift/codegen/src/machinst/lower.rs

cfallin · 2020-04-22T15:57:44Z

@bnjbvr re:

I agree with the definition of colocated, at least to my understanding

To make sure I understand -- you're agreeing with the initial assertion that colocated seems in practice to mean "reference inside module that can use direct PC-rel references" vs. "reference to another module that needs to go through some sort of indirection / support arbitrary addresses"? Or the later definition clarified by @bjorn3 that it is just "constant PC-rel offset" (but arbitrarily far away)?

I think the basic question is whether we can nudge the definition toward the former -- more of a "same module" vs. "different module" bit, from which we can infer approximate relocation distance (given module size limits), or whether we need another bit / attribute for this.

Thoughts?

bnjbvr · 2020-04-22T16:01:35Z

The former, precisely (direct PCRel call vs load from table + indirect call); at least this seems to be the way we set it in Spidermonkey. We could imagine having a different flag for the latter, but I think that's out of scope for this PR.

cfallin · 2020-04-22T21:59:32Z

Updated -- just want to make sure we're all OK with the refined meaning of colocated here -- @sunfishcode, OK to do this (with extra doc-comment note on the bool flag definitions) or would you prefer something else?

cfallin · 2020-04-28T03:59:29Z

@sunfishcode -- friendly ping, could you verify whether you're OK with this interpretation of colocated?

cfallin · 2020-05-05T00:31:05Z

Rebased and added a more detailed doc comment to the colocated flag, as per a conversation with @sunfishcode just now. Will merge once the tests are green. In the longer term, we'll need to think a bit more about how to support different code models, beyond the simple RelocDistance here; I'll open an issue for that.

bnjbvr · 2020-05-05T08:28:52Z

as per a conversation with @sunfishcode just now.

Where did this conversation happen? I can't find any trace in all the public channels where I'm hanging out. Could the contents of this discussion be summarized somewhere? @sunfishcode @cfallin

cranelift/codegen/src/ir/globalvalue.rs

cfallin · 2020-05-05T14:29:02Z

Sorry, this was from a 1:1 IM conversation on Zulip, after I had pinged about the above; I should've asked for a comment here for the record!

Here's a transcript:

@sunfishcode: I must apologize, I don't have enough context to answer this, nor time at this moment to context switch and page it in.
@sunfishcode: Potentially related is the concept of "code models", https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#index-mcmodel_003dsmall
@sunfishcode: although judging by the comments, what Cranelift is doing here is different from any of GCC's defined code models
@sunfishcode: but eventually, I expect Cranelift will need to evolve in the direction of supporting the standard code models, possibly in addition to its own custom ones
@cfallin: OK, no worries. I had been waiting in case you had more to say, but perhaps it's best to merge it with Ben's existing approval, then address the code-model issue later (I can open an issue for it)?
@sunfishcode: Yeah. Ok, what if you added something like this to the comment on the colocated flag:
@sunfishcode: "The exact distance depends on the code model in use. Currently on AArch64 Cranelift uses a custom code model supporting up to +/- byte displacements"
@sunfishcode: or so

cfallin added the cranelift:area:aarch64 label (Issues related to AArch64 backend) Apr 21, 2020

cfallin requested review from bnjbvr and julian-seward1 April 21, 2020 19:32

cfallin force-pushed the fix-long-range-aarch64-call branch 2 times, most recently from 4d78b40 to 4d721d0 April 21, 2020 19:40

github-actions bot added the cranelift (Issues related to the Cranelift code generator) and cranelift:module labels Apr 21, 2020

cfallin mentioned this pull request Apr 22, 2020

MachInst backend: pass through SourceLoc information. #1575

Merged

bnjbvr approved these changes Apr 22, 2020

cranelift/codegen/src/isa/aarch64/lower.rs (outdated, resolved)

cranelift/codegen/src/machinst/lower.rs (outdated, resolved)

cranelift/codegen/src/machinst/lower.rs (resolved)

cfallin force-pushed the fix-long-range-aarch64-call branch from cca936d to fe35934 April 23, 2020 20:23

cfallin force-pushed the fix-long-range-aarch64-call branch from fe35934 to a369b7b May 5, 2020 00:29

cfallin mentioned this pull request May 5, 2020

Cranelift: support different code models (i.e., relocation strategies and displacement limits) #1657

Open

cfallin force-pushed the fix-long-range-aarch64-call branch from a369b7b to e06a50f May 5, 2020 00:47

bnjbvr reviewed May 5, 2020

cranelift/codegen/src/ir/globalvalue.rs (outdated, resolved)

cfallin force-pushed the fix-long-range-aarch64-call branch from e06a50f to 692f9e4 May 5, 2020 14:23

cfallin force-pushed the fix-long-range-aarch64-call branch from 692f9e4 to e39b4ab May 5, 2020 16:55

cfallin merged commit 59039df into bytecodealliance:master May 5, 2020

cfallin deleted the fix-long-range-aarch64-call branch May 6, 2020 17:31
