Commit c72c8ea

Standard Fixed-length Vector Calling Convention Variant
This proposal introduces a new variant of the calling convention specifically designed for fixed-length vectors. The primary aim is to facilitate passing fixed-length vectors through vector registers. The variant is derived from the standard vector calling convention, with the same register conventions and argument passing/return value rules.

Key features:

- Introduce ABI_VLEN parameter denoting the width of a vector register, constrained to be no wider than the ISA's VLEN. Default recommended to 128 bits, with flexibility for 32 or 64 bits as permitted by the ISA.
- Fixed-length vector argument passing rules based on size relative to ABI_VLEN: vectors no larger than ABI_VLEN pass in a single register, larger vectors pass in multiple registers following LMUL pattern (2, 4, 8).
- Handling rules for structs/unions containing fixed-length vectors:
  - Structs with all fixed-length vector members follow vector tuple type rules if conforming to size constraints
  - Unions with fixed-length vectors follow integer calling convention
  - Pass struct as tuple-type in register only when vector arg reg is enough
- Additional rules for:
  - Single fixed-length vector or fixed-length vector array with size 1
  - Zero-length fixed-length arrays
  - Non-power-of-2 vectors
  - Vector types with unsupported element types
- Name mangling specification for standard fixed-length vector calling convention and calling convention variants with ABI tag encoding
- Example layouts for int32x4_t on different VLEN configurations
1 parent 4129bb3 commit c72c8ea

2 files changed

+230
-0
lines changed

riscv-cc.adoc

Lines changed: 202 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -452,6 +452,208 @@ NOTE: `setjmp`/`longjmp` follow the standard calling convention, which clobbers
452452
all vector registers. Hence, the standard vector calling convention variant
453453
won't disrupt the `jmp_buf` ABI.
454454

455+
NOTE: Functions that use the standard vector calling convention
456+
variant follow an additional name mangling rule for {Cpp}.
457+
For more details, see <<Name Mangling for Standard Calling Convention Variant>>.
458+
459+
=== Standard Fixed-length Vector Calling Convention Variant
460+
461+
This section defines the calling convention variant for fixed-length vectors.
462+
The intention of this variant is to pass fixed-length vectors via the vector
463+
registers. For the definition of a fixed-length vector, see
464+
<<Fixed-Length Vector>>.
465+
466+
This variant is based on the standard vector calling convention variant:
467+
the register convention and the rules for passing arguments and return values
468+
are the same.
469+
470+
NOTE: The reason we define a separate calling convention variant is that we
471+
would like to define a flexible convention to utilize the variable length
472+
feature in the vector extension, also considering embedded vector extensions,
473+
such as `Zve32x`.
474+
475+
ABI_VLEN refers to the width of a vector register in the calling convention
476+
variant.
477+
478+
The ABI_VLEN must be no wider than the ISA's VLEN, meaning that the ISA may
479+
support wider vector registers than the ABI, but the ABI's VLEN cannot exceed
480+
the ISA's VLEN.
481+
482+
ABI_VLEN represents the width (in bits) of the vector registers available in the
483+
calling convention for fixed-length vectors. ABI_VLEN can vary from 32 bits
484+
(as in `Zve32x`) up to the maximum supported by the ISA. The flexibility of
485+
ABI_VLEN enables the convention to adapt to both low-end embedded systems and
486+
high-performance processors that utilize wider vector registers.
487+
488+
The ABI_VLEN is a parameter of this calling convention variant. It can be set
489+
by a command line option for the compiler or specified by a function
490+
attribute in the source code.
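
A minimal sketch of the function-attribute route, assuming a GCC/Clang-style compiler. The `riscv_vls_cc` attribute spelling is taken from the name mangling example in this commit; the `VLS_CC` macro, the `int32x4_t` typedef (built on the `vector_size` extension) and `sum_lanes` are hypothetical names used only for illustration. The attribute is applied only when the compiler reports support for it, so the code still builds on other targets.

```c
#include <stdint.h>

/* Hypothetical helper macro: apply riscv_vls_cc(N) only where the compiler
 * targets RISC-V with vectors and knows the attribute; otherwise expand to
 * nothing so the example remains portable. */
#if defined(__riscv_vector) && defined(__has_attribute)
#  if __has_attribute(riscv_vls_cc)
#    define VLS_CC(abi_vlen) __attribute__((riscv_vls_cc(abi_vlen)))
#  endif
#endif
#ifndef VLS_CC
#  define VLS_CC(abi_vlen)
#endif

/* 128-bit fixed-length vector of four 32-bit integers (GCC/Clang extension). */
typedef int32_t int32x4_t __attribute__((vector_size(16)));

/* With ABI_VLEN=128 this argument fits in one vector argument register. */
VLS_CC(128) int32_t sum_lanes(int32x4_t v);

VLS_CC(128) int32_t sum_lanes(int32x4_t v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

The caller and the callee must see the same declaration, including the attribute, so that both sides agree on ABI_VLEN.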
491+
492+
NOTE: We suggest the toolchain implementation set the default value of ABI_VLEN
493+
to 128, as it's the most common minimal requirement. However, it is not fixed
494+
to 128, since the ISA allows the VLEN to be only 32 bits or 64 bits. This
495+
also enables the utilization of the capacity of longer VLEN. Users can build
496+
with an optimized library with larger ABI_VLEN for better utilization of those
497+
cores with longer VLEN.
498+
499+
A fixed-length vector argument is passed in one vector argument register if the
500+
size of the vector is less than or equal to ABI_VLEN bits.
501+
502+
[NOTE]
503+
====
504+
Even in the absence of specific vector extension support for certain element
505+
types, such as `__bf16`, `_Float16`, `float`, or `double`, the standard
506+
fixed-length vector calling convention rules still apply. For example,
507+
even without the support of extensions like `Zvfbfmin`, `Zve32f`, or `Zve64d`,
508+
these element types will be passed according to the calling convention rules
509+
outlined here.
510+
511+
Additionally, data types such as `__int128_t`, which currently do not
512+
have direct support in any vector extension, will also follow these rules.
513+
This design ensures that the calling convention remains forward-compatible,
514+
minimizing the need for continuous adjustments as new extensions and data types
515+
are introduced in the future.
516+
517+
The consistency in applying these rules to unsupported element types guarantees
518+
a smooth transition when future vector extensions become available, allowing for
519+
seamless integration of new features without requiring significant changes to
520+
the calling convention.
521+
====
522+
523+
A fixed-length vector argument is passed in two vector argument registers,
524+
similar to vector data arguments with LMUL=2 and following the same register
525+
constraints, if the size of the vector is greater than ABI_VLEN bits and less
526+
than or equal to 2×ABI_VLEN bits.
527+
528+
A fixed-length vector argument is passed in four vector argument registers,
529+
similar to vector data arguments with LMUL=4 and following the same register
530+
constraints, if the size of the vector is greater than 2×ABI_VLEN bits and less
531+
than or equal to 4×ABI_VLEN bits.
532+
533+
A fixed-length vector argument is passed in eight vector argument registers,
534+
similar to vector data arguments with LMUL=8 and following the same register
535+
constraints, if the size of the vector is greater than 4×ABI_VLEN bits and less
536+
than or equal to 8×ABI_VLEN bits.
537+
538+
[NOTE]
539+
====
540+
Fixed-length vectors that are not a power-of-2 in size will be rounded up to
541+
the next power-of-2 length for the purpose of register allocation and handling.
542+
For instance, a vector type like `int32x3_t` (which contains three 32-bit
543+
integers) will be treated as an `int32x4_t` (a 128-bit vector, as LMUL=1 for
544+
ABI_VLEN=128) in the ABI, and passed accordingly. This ensures consistency in
545+
how vectors are handled and simplifies the process of argument passing.
546+
547+
Example: Consider an `int32x3_t` vector (three 32-bit integers):
548+
- The vector's total size is 96 bits, which is not a power of 2.
549+
- The ABI will round up the size to 128 bits (corresponding to `int32x4_t`),
550+
meaning the vector will be passed using one vector argument register when
551+
ABI_VLEN=128.
552+
553+
This rule applies to all non-power-of-2 fixed-length vectors, ensuring they
554+
are treated consistently across different ABI_VLEN settings.
555+
====
556+
557+
A fixed-length vector argument is passed by reference and is replaced in the
558+
argument list with the address if it is larger than 8×ABI_VLEN bits or if
559+
there is a shortage of vector argument registers.
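
The size rules above (one, two, four or eight registers, with non-power-of-2 sizes rounded up first) can be sketched as a small classifier. This is an illustration under stated assumptions, not part of the specification: `round_up_pow2` and `vls_arg_regs` are hypothetical names, ABI_VLEN is assumed to be a power of two of at least 32, and the "shortage of vector argument registers" case is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the next power of two (v >= 1). */
static uint64_t round_up_pow2(uint64_t v)
{
    uint64_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/*
 * Hypothetical helper: number of vector argument registers needed for a
 * fixed-length vector of `vec_bits` bits under a given ABI_VLEN.
 * Returns 1, 2, 4 or 8, or 0 when the vector is larger than 8*ABI_VLEN
 * and is therefore passed by reference.
 */
static unsigned vls_arg_regs(uint32_t vec_bits, uint32_t abi_vlen)
{
    assert(vec_bits > 0);
    /* Assumption: ABI_VLEN is a power of two and at least 32. */
    assert(abi_vlen >= 32 && (abi_vlen & (abi_vlen - 1)) == 0);

    uint64_t bits = round_up_pow2(vec_bits);      /* e.g. 96 -> 128 */
    if (bits <= abi_vlen)
        return 1;                                 /* like LMUL=1 */
    if (bits > 8 * (uint64_t)abi_vlen)
        return 0;                                 /* passed by reference */
    return (unsigned)(bits / abi_vlen);           /* 2, 4 or 8 */
}
```

For example, under these assumptions `vls_arg_regs(96, 128)` yields 1 (an `int32x3_t` treated as `int32x4_t`), and `vls_arg_regs(128, 32)` yields 4.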
560+
561+
A struct whose members are all fixed-length vectors is passed in
562+
vector argument registers like a vector tuple type if all members have the
563+
same length, the length is less than or equal to 4×ABI_VLEN bits, and the size of
564+
the whole struct is less than or equal to 8×ABI_VLEN bits.
565+
If there are not enough vector argument registers to pass the entire struct,
566+
it is passed by reference and is replaced in the argument list with the address.
567+
Otherwise, it will use the rule defined in the hardware floating-point calling
568+
convention.
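
The struct rule above can be read as a three-way decision. The sketch below follows the paragraph literally and is only an illustration: the enum, `classify_vls_struct`, and the `free_vregs` parameter (the number of vector argument registers still unallocated) are hypothetical, and the register count per member assumes each member occupies whole registers, as in a vector tuple.

```c
#include <assert.h>
#include <stdint.h>

enum vls_struct_class {
    VLS_TUPLE_IN_VREGS,   /* passed like a vector tuple type */
    VLS_BY_REFERENCE,     /* qualifies, but too few free registers */
    VLS_HARD_FLOAT_RULE   /* falls back to the hardware floating-point rule */
};

/*
 * Hypothetical classifier for a struct of `nmembers` fixed-length vectors
 * that all have the same size `member_bits` (already rounded up to a power
 * of two).
 */
static enum vls_struct_class
classify_vls_struct(uint32_t member_bits, uint32_t nmembers,
                    uint32_t abi_vlen, uint32_t free_vregs)
{
    assert(member_bits > 0 && nmembers > 0 && abi_vlen >= 32);

    uint64_t total_bits = (uint64_t)member_bits * nmembers;
    if (member_bits > 4 * (uint64_t)abi_vlen ||
        total_bits > 8 * (uint64_t)abi_vlen)
        return VLS_HARD_FLOAT_RULE;

    /* Assumption: each member occupies whole registers, as in a tuple. */
    uint32_t regs_per_member = (member_bits + abi_vlen - 1) / abi_vlen;
    uint32_t needed = regs_per_member * nmembers;
    return needed <= free_vregs ? VLS_TUPLE_IN_VREGS : VLS_BY_REFERENCE;
}
```

A struct with a single vector member is handled by the separate flattening rule and is not covered by this sketch.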
569+
570+
A struct containing just one fixed-length vector or a fixed-length vector
571+
array of length one will be flattened into a single fixed-length vector argument
572+
if the size of the vector is less than or equal to 8×ABI_VLEN bits.
573+
574+
Structs with zero-length fixed-length arrays use the rule defined in the hardware
575+
floating-point calling convention, which means they do not consume a vector
576+
argument register in either C or {Cpp}.
577+
578+
A struct containing just one fixed-length vector array is passed as though it
579+
were a vector tuple type if the size of the base element for the array is less than
580+
or equal to 8×ABI_VLEN bits, and the size of the array is less than 8×ABI_VLEN
581+
bits.
582+
If there are not enough vector argument registers to pass the entire struct,
583+
it is passed by reference and is replaced in the argument list with the address.
584+
Otherwise, it will use the rule defined in the hardware floating-point
585+
calling convention.
586+
587+
Unions with fixed-length vectors are always passed according to the integer
588+
calling convention.
589+
590+
The details of vector argument register rules are the same as the standard
591+
vector calling convention variant.
592+
593+
NOTE: Functions that use the standard fixed-length vector calling convention
594+
variant must be marked with STO_RISCV_VARIANT_CC. See <<Dynamic Linking>>
595+
for the meaning of STO_RISCV_VARIANT_CC.
596+
597+
NOTE: Functions that use the standard fixed-length vector calling convention
598+
variant follow an additional name mangling rule for {Cpp}.
599+
For more details, see <<Name Mangling for Standard Calling Convention Variant>>.
600+
601+
[NOTE]
602+
====
603+
When ABI_VLEN is smaller than the VLEN, the number of vector argument
604+
registers utilized remains unchanged. However, in such cases, values are only
605+
placed in a portion of these vector argument registers, corresponding to the
606+
size of ABI_VLEN. The remaining portion of the vector argument registers, which
607+
extends beyond the ABI_VLEN, will remain idle. This means that while the full
608+
capacity of the vector argument registers may not be used, the allocation of
609+
these registers does not change, ensuring consistency in register usage regardless
610+
of the ABI_VLEN to VLEN ratio.
611+
612+
Example: With ABI_VLEN at 32 bits and VLEN at 128 bits, consider passing an
613+
`int32x4_t` parameter (four 32-bit integers).
614+
615+
Allocation: Four vector argument registers are allocated for
616+
`int32x4_t`, based on LMUL=4.
617+
618+
Utilization: All four integers are placed in the first vector register,
619+
utilizing its full 128-bit capacity (VLEN), despite ABI_VLEN being 32 bits.
620+
621+
Remaining Registers: The other three allocated registers remain unused and idle.
622+
623+
.int32x4_t layout on different VLEN with ABI_VLEN at 32 bits:
624+
[cols="2,3,3,3,3"]
625+
[width=100%]
626+
|===
627+
| VLEN | v8 | v9 | v10 | v11
628+
629+
| 32 | a | b | c | d
630+
| 64 | a, b | c, d | -, - | -, -
631+
| 128 | a, b, c, d | -, -, -, - | -, -, -, - | -, -, -, -
632+
| 256 | a, b, c, d, -, -, -, - | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, -
633+
|===
634+
635+
.int64x8_t layout on different VLEN with ABI_VLEN at 128 bits:
636+
[cols="2,3,3,3,3"]
637+
[width=100%]
638+
|===
639+
| VLEN | v8 | v9 | v10 | v11
640+
641+
| 128 | a, b | c, d | e, f | g, h
642+
| 256 | a, b, c, d | e, f, g, h | -, -, -, - | -, -, -, -
643+
| 512 | a, b, c, d, e, f, g, h | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, - | -, -, -, -, -, -, -, -
644+
|===
645+
646+
`-` marks a part that is not used; its value can be anything.
647+
648+
====
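
The layout tables above are consistent with elements being packed contiguously across the register group according to the hardware VLEN, while the number of allocated registers depends only on ABI_VLEN. A minimal sketch of that mapping, inferred from the tables; `struct vls_slot` and `vls_element_slot` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

struct vls_slot {
    unsigned reg;    /* offset from the first register of the group */
    unsigned lane;   /* element index inside that register */
};

/*
 * Hypothetical helper: where element `index` of a fixed-length vector lands
 * when elements of `elem_bits` bits are packed contiguously across a
 * register group on hardware with `vlen` bits per register.
 */
static struct vls_slot vls_element_slot(unsigned index, unsigned elem_bits,
                                        unsigned vlen)
{
    assert(elem_bits > 0 && vlen >= elem_bits && vlen % elem_bits == 0);
    unsigned lanes_per_reg = vlen / elem_bits;
    struct vls_slot s = { index / lanes_per_reg, index % lanes_per_reg };
    return s;
}
```

For `int32x4_t` with VLEN=64, element `c` (index 2) maps to register offset 1, lane 0, which is `v9` when the group starts at `v8`, matching the first table.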
649+
650+
NOTE: In a single compilation unit, different functions may use different
651+
ABI_VLEN values. This means that ABI_VLEN is not uniform across the entire unit,
652+
allowing for function-specific optimization. However, this necessitates that
653+
users ensure consistency in ABI_VLEN between calling and called functions. It
654+
is the user's responsibility to verify that the ABI_VLEN matches on both sides
655+
of a function call to ensure correct operation and data handling.
656+
455657
=== ILP32E Calling Convention
456658

457659
IMPORTANT: RV32E is not a ratified base ISA and so we cannot guarantee the

riscv-elf.adoc

Lines changed: 28 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -204,6 +204,34 @@ See the "Type encodings" section in _Itanium {Cpp} ABI_
204204
for more detail on how to mangle types. Note that `__bf16` is mangled in the
205205
same way as `std::bfloat16_t`.
206206

207+
=== Name Mangling for Standard Calling Convention Variant
208+
209+
Functions using a standard calling convention variant must append an extra ABI tag to
210+
the mangled function name; the rule is the same as in the "ABI tags" section of
211+
_Itanium {Cpp} ABI_.
212+
213+
.ABI Tag name for calling convention variants
214+
[cols="5,2"]
215+
[width=80%]
216+
|===
217+
| Name | ABI tag name
218+
219+
| Standard fixed-length vector calling convention variant | riscv_vls_cc_<ABI_VLEN>
220+
|===
221+
222+
223+
For example:
224+
[,c]
225+
----
226+
__attribute__((riscv_vls_cc(128))) void foo();
227+
----
228+
229+
is mangled as
230+
[,c]
231+
----
232+
_Z3fooB16riscv_vls_cc_128v
233+
----
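
In the Itanium {Cpp} ABI an ABI tag is encoded as `B` followed by a length-prefixed identifier, so the number after `B` is the character count of the full tag name including the ABI_VLEN suffix. The sketch below builds that tag; `vls_cc_abi_tag` is a hypothetical helper for illustration, not a toolchain API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the ABI tag for a given ABI_VLEN into `out`,
 * i.e. "B" + <length of tag name> + <tag name>.
 * Returns the number of characters written, or -1 if `out` is too small.
 */
static int vls_cc_abi_tag(unsigned abi_vlen, char *out, size_t out_size)
{
    char name[32];
    int name_len = snprintf(name, sizeof name, "riscv_vls_cc_%u", abi_vlen);
    assert(name_len > 0 && (size_t)name_len < sizeof name);

    int n = snprintf(out, out_size, "B%d%s", name_len, name);
    return (n > 0 && (size_t)n < out_size) ? n : -1;
}
```

For ABI_VLEN=128 the tag name `riscv_vls_cc_128` has 16 characters, so the helper produces `B16riscv_vls_cc_128`, which sits between `_Z3foo` and the parameter encoding `v`.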
234+
207235
=== Name Mangling for Vector Data Types, Vector Mask Types and Vector Tuple Types.
208236

209237
The vector data types and vector mask types, as defined in the section
