[AArch64] Assembly syntax for relocation specifier in data directives #132570

Open
MaskRay opened this issue Mar 22, 2025 · 3 comments
Labels
backend:AArch64 mc Machine (object) code

Comments

@MaskRay

MaskRay commented Mar 22, 2025

https://reviews.llvm.org/D81446 and #72584 introduced the assembly syntax `.word sym@plt-.` and `.word sym@gotpcrel` for the R_AARCH64_PLT32 and R_AARCH64_GOTPCREL32 relocations.
This syntax was chosen because of a quirk in LLVM's AsmParser: `@plt` was universally accepted until my recent update (commit a067175) made it an opt-in parsing feature.

(https://reviews.llvm.org/D156505 introduced `@AUTH(ib, 1234, addr)`.)

FYI: For RISC-V, I am proposing in #132569 to remove `@plt` and `@gotpcrel`, as there is no backward compatibility concern for assembly generated by `-fexperimental-relative-c++-abi-vtables`.

Using `@plt` as a specifier that can be applied to every subexpression is considered awkward.
Relying on `MCSymbolRefExpr::VariantKind` to represent these relocations introduces complexity and inconsistencies within LLVM's internal representation.
As detailed in https://maskray.me/blog/2025-03-16-relocation-generation-in-assemblers#relocation-specifier-flavors , this approach highlights an inelegance in how LLVM handles relocation specifiers.

As a more concrete example, https://reviews.llvm.org/D156505 , which introduced `@AUTH(ib, 1234, addr)`, had to reject many kinds of invalid syntax after parsing (e.g. `.quad _g9@AUTH(ia,42) - _g8@AUTH(ia,42)`), which is not ideal.

A more standard syntax like `.word :specifier: expr` would be more desirable (another specifier cannot appear in a subexpression), but there is a parsing ambiguity: `.word :plt:fun` would be parsed as labels, not as a relocation specifier.
(RISC-V could use `.word %specifier(expr)`, which has no ambiguity. I have proposed #132569.)
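
To make the label ambiguity concrete, here is a toy Python model of a label-first statement parser. It is not how GNU as or LLVM's AsmParser is implemented; `split_statement` and its token rules are hypothetical, and it assumes that whitespace before `:` is insignificant, so that `.word :plt: fun` tokenizes the same way as `.word: plt: fun`.

```python
import re

# Toy tokens: an identifier (may start with '.'), or any single non-space char.
IDENT = r"[A-Za-z_.][A-Za-z0-9_.$]*"
TOKEN = re.compile(IDENT + r"|\S")

def split_statement(line):
    """Peel leading `name :` pairs off as labels; return (labels, remaining tokens).

    Hypothetical model only: whitespace before ':' is assumed to be insignificant.
    """
    tokens = TOKEN.findall(line)
    labels = []
    while len(tokens) >= 2 and re.fullmatch(IDENT, tokens[0]) and tokens[1] == ":":
        labels.append(tokens[0])
        tokens = tokens[2:]
    return labels, tokens

# The colon form is swallowed as two labels followed by a statement named `fun`.
print(split_statement(".word :plt: fun"))   # (['.word', 'plt'], ['fun'])
# The percent form leaves `.word` as the directive, so nothing is misread as a label.
print(split_statement(".word %plt(fun)"))   # ([], ['.word', '%', 'plt', '(', 'fun', ')'])
```

Under these assumptions the `:specifier:` form is already a valid (and different) statement, while the `%specifier(expr)` form is not.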

This issue aims to record the current undesirable and inconsistent state. @smithp35

(AArch32's `expr(specifier)` is probably not a good choice either, as `(specifier)` could also appear in any subexpression, e.g. `.long f(got)+3`.)

@MaskRay MaskRay added backend:AArch64 mc Machine (object) code labels Mar 22, 2025
@llvmbot

llvmbot commented Mar 22, 2025

@llvm/issue-subscribers-backend-aarch64

Author: Fangrui Song (MaskRay)

@smithp35

Apologies for the delay in responding. Just to confirm I've understood: sysvabi64 documents the modifiers for instructions at https://github.com/ARM-software/abi-aa/blob/main/sysvabi64/sysvabi64.rst#64assembler-language-addressing-mode-conventions , which all use the `:modifier:` syntax. AIUI these all prefix the symbol, for example `adrp x0, :got: sym`.

So what is wanted is some kind of syntax for data directives that isn't `@`, with something like `:modifier(expr)` as a suggestion?

It is something we can look at for new modifiers that are proposed. I guess we're stuck with `@` for the existing use cases for backwards compatibility.

@MaskRay

MaskRay commented Mar 26, 2025

Before https://reviews.llvm.org/D81446 in 2020, which added the data directive syntax `sym@plt` for the clang `-fexperimental-relative-c++-abi-vtables` feature, the `sym@specifier` syntax was either rejected outright or triggered assertion failures in `AArch64ELFObjectWriter::getRelocType`. In non-assertion builds, it might simply be accepted and ignored (for the `MCSymbolRefExpr::VariantKind` constants that I am eliminating).

Indeed, we aim for a proper assembly syntax in data directives, rather than having `@` inadvertently accepted by the LLVM AsmParser (fixed by my a067175).

Because of the label ambiguity in `.word: specifier: expr` (accepted by existing assemblers, even those unaware of the new syntax), the `:specifier: expr` syntax used for code probably cannot be repurposed for data directives.

`.word %gotpcrel(sym), %gotpcrel(sym)` and `.word %auth(...)(expr)` (AUTH needs extra arguments) are one option.
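
As a rough illustration of why a prefix form is easy to validate at parse time, here is a minimal Python sketch of a parser for a single `%specifier(expr)` data operand. It is not an existing LLVM or GNU assembler API: `parse_data_operand` is a hypothetical helper, the expression is kept as an opaque string, and the `%auth(...)(expr)` form with extra arguments is deliberately not handled.

```python
import re

def parse_data_operand(text):
    """Return (specifier or None, expression text) for one data operand.

    Hypothetical sketch: the specifier, if present, must wrap the whole operand.
    """
    text = text.strip()
    m = re.match(r"%([A-Za-z_]\w*)\(", text)
    if m is None:
        spec, expr = None, text
    else:
        # Find the parenthesis that closes `%spec(` by tracking nesting depth.
        depth, pos = 1, m.end()
        while pos < len(text) and depth > 0:
            if text[pos] == "(":
                depth += 1
            elif text[pos] == ")":
                depth -= 1
            pos += 1
        if depth != 0 or pos != len(text):
            raise ValueError("specifier must wrap the entire operand")
        spec, expr = m.group(1), text[m.end():pos - 1].strip()
    # A second `%name(` inside the expression would be a nested specifier.
    if re.search(r"%[A-Za-z_]\w*\(", expr):
        raise ValueError("specifier inside a subexpression")
    return spec, expr

print(parse_data_operand("%gotpcrel(sym)"))   # ('gotpcrel', 'sym')
print(parse_data_operand("sym + 8"))          # (None, 'sym + 8')
```

In this sketch, inputs such as `%plt(a) - %plt(b)` or `%plt(a + %got(b))` raise `ValueError` while the operand is being parsed, rather than being rejected later.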

The only allowed ELF `@specifier`s are `@plt` and `@gotpcrel`, dedicated to the clang `-fexperimental-relative-c++-abi-vtables` feature.
The GNU assembler doesn't support `@specifier`. I believe there is no backward compatibility concern.

MaskRay added a commit to MaskRay/llvm-project that referenced this issue Apr 1, 2025
Following PR llvm#132569, which added `parseDataExpr` for parsing
expressions in data directives (e.g., `.word`), this PR migrates AArch64
`@plt`, `@gotpcrel`, and `@AUTH` from the `parsePrimaryExpr` workaround
to `parseDataExpr`. The goal is to align with the GNU assembler model,
where relocation specifiers apply to the entire operand rather than
individual terms. This reduces complexity, which is especially evident
in `@AUTH` parsing.

Note: AArch64 ELF lacks an official syntax for data directives
(llvm#132570). A prefix notation might be a preferable future direction.

In the test elf-reloc-ptrauth.s, many errors are now reported at parse
time.
MaskRay added a commit that referenced this issue Apr 8, 2025
Following PR #132569 (RISC-V), which added `parseDataExpr` for parsing
expressions in data directives (e.g., `.word`), this PR migrates AArch64
`@plt`, `@gotpcrel`, and `@AUTH` from the `parsePrimaryExpr` workaround
to `parseDataExpr`. The goal is to align with the GNU assembler model,
where relocation specifiers apply to the entire operand rather than
individual terms. This reduces complexity, which is especially evident
in `@AUTH` parsing.

Note: AArch64 ELF lacks an official syntax for data directives
(#132570). A prefix notation might be a preferable future direction.
I recommend `%specifier(expr)`.

AsmParser's `@specifier` parsing is suboptimal, necessitating lexer
workarounds. `@` might appear multiple times in an operand.
We should not use `@` beyond the existing AArch64 Mach-O instruction
operands.

In the test elf-reloc-ptrauth.s, many errors are now reported at parse
time.

Pull Request: #134202
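
The operand-level model described in the commit message can be illustrated with a toy Python parser. This is only a sketch under stated assumptions, not the actual `parseDataExpr` implementation: the function names, the tiny grammar (one symbol, an optional `@plt` or `@gotpcrel`, then `+`/`-` terms), and the error messages are all hypothetical, and `@AUTH` is left out.

```python
import re

IDENT = r"[A-Za-z_.$][A-Za-z0-9_.$]*"        # '.' alone stands for the current location
TOKEN = re.compile(IDENT + r"|\d+|[@+\-]")
ALLOWED = {"plt", "gotpcrel"}                 # specifiers named in this thread
OPERATORS = ("+", "-")

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r}")
        tokens.append(m.group())
        pos = m.end()
    return tokens

def parse_data_expr(text):
    """Parse `term [@spec] (('+'|'-') term)*`; return (spec or None, expression tokens)."""
    tokens = tokenize(text)
    if not tokens or tokens[0] in OPERATORS or tokens[0] == "@":
        raise ValueError("expected a symbol or number")
    body, rest, spec = [tokens[0]], tokens[1:], None
    # The specifier is consumed once, at operand level, right after the first term.
    if rest[:1] == ["@"]:
        if len(rest) < 2 or rest[1].lower() not in ALLOWED:
            raise ValueError("invalid relocation specifier")
        spec, rest = rest[1].lower(), rest[2:]
    while rest:
        if rest[0] == "@":
            raise ValueError("only one specifier is allowed per operand")
        if rest[0] not in OPERATORS or len(rest) < 2 or rest[1] in OPERATORS + ("@",):
            raise ValueError("expected '+' or '-' followed by a term")
        body += rest[:2]
        rest = rest[2:]
    return spec, body

print(parse_data_expr("sym@plt-."))      # ('plt', ['sym', '-', '.'])
print(parse_data_expr("sym@gotpcrel"))   # ('gotpcrel', ['sym'])
```

With this shape, an operand such as `a@plt - b@plt` fails inside the parser with `ValueError`, which mirrors the point that errors move to parse time once the specifier belongs to the whole operand.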