Conversation

@kito-cheng
Collaborator

This proposal outlines a new variant of the calling convention specifically designed for fixed-length vectors. The primary aim of this variant is to facilitate the passing of fixed-length vectors through vector registers. The approach is derived from the standard vector calling convention and uses the same register conventions, argument-passing rules, and return-value rules.

A key aspect of this variant is the introduction of ABI_VLEN, which denotes the width of a vector register within this convention. The ABI_VLEN is constrained to be no wider than the ISA's VLEN (Vector Length), ensuring compatibility while allowing for flexibility in different implementations. This parameter can be configured via compiler command line options or through function attributes in source code.

The document recommends a default ABI_VLEN of 128 bits, acknowledging it as a common minimum, while still allowing lower values (32 or 64 bits) where the ISA permits. This configurability is also important for making better use of cores with longer VLENs.

The proposal specifies how fixed-length vector arguments are passed based on their size relative to ABI_VLEN. Vectors no larger than ABI_VLEN are passed in a single vector argument register, while larger vectors are passed in groups of multiple registers, following the LMUL (Length Multiplier) pattern of 2, 4, or 8, depending on their size.
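
For illustration, a minimal sketch of how this maps onto concrete types (assuming the `riscv_vls_cc` attribute syntax proposed in the companion riscv-c-api-doc change and ABI_VLEN = 128; the register numbers in the comments are illustrative, not normative):

```
typedef int int32x4_t __attribute__((vector_size(16)));  /* 128-bit vector */
typedef int int32x8_t __attribute__((vector_size(32)));  /* 256-bit vector */

/* With ABI_VLEN = 128:
 *   - a is 128 bits, so it fits a single vector argument register (LMUL=1), e.g. v8;
 *   - b is 256 bits, so it occupies an LMUL=2 register group, e.g. v10-v11;
 *   - the 256-bit return value likewise uses an LMUL=2 group, e.g. v8-v9. */
__attribute__((riscv_vls_cc(128)))
int32x8_t widen_add(int32x4_t a, int32x8_t b);
```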

Additionally, the proposal addresses the handling of structs and unions containing fixed-length vectors. Structs with members that are all fixed-length vectors follow the vector tuple type rules if they conform to size constraints. In contrast, unions with fixed-length vectors adhere to the integer calling convention.
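
As a hedged sketch of these aggregate rules (hypothetical types, again assuming the `riscv_vls_cc` attribute syntax and ABI_VLEN = 128):

```
typedef int int32x4_t __attribute__((vector_size(16)));

/* All members are fixed-length vectors, so the struct is treated like a
 * vector tuple type and passed in vector registers, provided it meets the
 * size constraints and enough argument registers remain. */
struct vec_pair {
    int32x4_t lo;
    int32x4_t hi;
};

/* A union containing a fixed-length vector follows the integer calling
 * convention instead (general-purpose registers or memory). */
union vec_or_bits {
    int32x4_t v;
    long long bits[2];
};

__attribute__((riscv_vls_cc(128)))
void consume(struct vec_pair p, union vec_or_bits u);
```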

@kito-cheng
Collaborator Author

@kito-cheng kito-cheng requested a review from lhtin January 4, 2024 09:57
Floating-point registers fs0-fs11 shall be preserved across procedure calls,
provided they hold values no more than ABI_FLEN bits wide.

=== Standard Fixed-length Vector Calling Convention Variant
Collaborator

The variant itself seems fine, modulo nits, but how are we planning to enable it?

If it's automatically used by -march=rva23 -mabi=ilp32d that will create major compatibility issues for binary distributions that use a fixed ABI and allow mixing packages at different architecture levels (either as an explicit user action, or as an implementation detail when rebuilding the distribution to change the architecture requirement).

If a new -mabi= value is required to enable use of the variant, it will be usable on closed systems where all packages are built at once, but not on binary distributions, since there is no expectation that binary code built with different -mabi= options is interoperable at all. This will include Debian and Alpine and might include Android and Fedora if their ABIs are finalized prior to the acceptance of this PR.

If it's enabled on a per-function basis using an attribute, or automatically for functions not visible across DSO boundaries, then it's effectively part of the definition of the attribute or a compiler implementation detail and may belong in riscv-c-api-doc or gccint, not here.

Collaborator Author

My expectation is that it should be enabled on a per-function basis via an attribute, and I think there should be a riscv-c-api-doc PR for that; I will send it in the next few days.
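
For context, the attribute later proposed in riscv-c-api-doc#68 and implemented in LLVM looks roughly like the sketch below (the exact spelling and semantics are defined by that PR, not here):

```
typedef int int32x4_t __attribute__((vector_size(16)));

/* Only functions carrying the attribute (and calls made through this
 * prototype) use the fixed-length vector calling convention variant;
 * everything else in the translation unit keeps the standard ABI. */
__attribute__((riscv_vls_cc(128)))
int32x4_t add4(int32x4_t a, int32x4_t b);
```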

@kito-cheng kito-cheng force-pushed the fixed-length-vector-cc branch from b9f0bc2 to 2386a73 Compare January 26, 2024 10:02
@kito-cheng
Collaborator Author

ChangeLog:

  • Reorder rule.
  • Pass structs as tuple types in registers only when enough vector argument
    registers are available; otherwise pass them by reference.
  • Add a NOTE describing what happens if ABI_VLEN is smaller than VLEN,
    together with an example.
  • Add a NOTE describing that different functions may use different
    ABI_VLEN values.

@kito-cheng
Collaborator Author

ChangeLog:

  • Add a rule for a single fixed-length vector or a fixed-length vector array of size 1.
  • Add a rule for zero-length fixed-length arrays.
  • Add an explicit rule for fixed-length vector structs passed as vector tuple types: pass by reference if there are not enough argument registers.

kito-cheng added a commit to kito-cheng/riscv-c-api-doc that referenced this pull request Feb 2, 2024
…n variant

Fixed-length vectors are passed via general-purpose registers or memory
under the current ABI design; we propose a standard fixed-length vector calling
convention variant for passing fixed-length vectors via vector registers.

This is the syntax part of the proposal; for further detail on the calling
convention variant see riscv-non-isa/riscv-elf-psabi-doc#418
@kito-cheng
Collaborator Author

Proposal for function attribute syntax: riscv-non-isa/riscv-c-api-doc#68

@kito-cheng kito-cheng force-pushed the fixed-length-vector-cc branch from c84ebb1 to 7e9d68c Compare September 5, 2024 09:42
4vtomat added a commit to llvm/llvm-project that referenced this pull request Mar 3, 2025
This patch adds a function attribute `riscv_vls_cc` for the RISC-V VLS calling
convention. It takes 0 or 1 arguments; the argument is the `ABI_VLEN`, which is
the `VLEN` used for passing fixed-length vector arguments. The attribute wraps
each argument as a scalable vector (VLA) using the `ABI_VLEN` and reuses the
corresponding mechanism to handle it. The valid range of `ABI_VLEN` is
[32, 65536]; if not specified, the default value is 128.

Here is an example of VLS argument passing:
Non-VLS call:
```
  void original_call(__attribute__((vector_size(16))) int arg) {}
=>
  define void @original_call(i128 noundef %arg) {
  entry:
    ...
    ret void
  }
```
VLS call:
```
  void __attribute__((riscv_vls_cc(256))) vls_call(__attribute__((vector_size(16))) int arg) {}
=>
  define riscv_vls_cc void @vls_call(<vscale x 1 x i32> %arg) {
  entry:
    ...
    ret void
  }
```

The first, non-VLS call passes the generic 16-byte vector argument as a
flattened integer. In contrast, the VLS call uses `ABI_VLEN=256`, which wraps
the vector into <vscale x 1 x i32>, where the number of scalable vector
elements is calculated as `ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN`.
Note: ORIG_ELTS = Vector Size / Type Size = 128 / 32 = 4.

PsABI PR: riscv-non-isa/riscv-elf-psabi-doc#418
C-API PR: riscv-non-isa/riscv-c-api-doc#68
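
Applying the same formula with the default ABI_VLEN of 128 would give
<vscale x 2 x i32> for the same 16-byte argument, since 4 * 64 / 128 = 2
(RVV_BITS_PER_BLOCK being 64). A hypothetical sketch of that case:

```
/* Hypothetical counterpart of the example above, using the default
   ABI_VLEN of 128 instead of 256. */
void __attribute__((riscv_vls_cc(128))) vls_call_128(__attribute__((vector_size(16))) int arg) {}
/* Expected lowering (sketch):
     define riscv_vls_cc void @vls_call_128(<vscale x 2 x i32> %arg)
   because ORIG_ELTS * RVV_BITS_PER_BLOCK / ABI_VLEN = 4 * 64 / 128 = 2. */
```
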
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 3, 2025
@kito-cheng
Collaborator Author

Changes:

  • Rebase
  • Fix typos
  • Drop non-mandatory note about default ABI_VLEN setting

@kito-cheng
Collaborator Author

LLVM now supports the VLS CC: llvm/llvm-project@c804e86

@kito-cheng
Collaborator Author

Ping, the LLVM part landed a while ago; is anyone from the LLVM community interested in reviewing this again?

@kito-cheng kito-cheng force-pushed the fixed-length-vector-cc branch from 02af284 to e6f1a34 Compare June 17, 2025 06:20
@markschimmel

If you change anything regarding structs then how can it be specified per function? Perhaps leave anything related to structs out of the proposal.

@topperc
Contributor

topperc commented Jul 18, 2025

If you change anything regarding structs then how can it be specified per function? Perhaps leave anything related to structs out of the proposal.

It only changes how structs are passed and returned for that function. The attribute must be on the prototype so the caller will know the ABI for the call. What's your concern?

@kito-cheng
Collaborator Author

The GCC PoC is here; going to submit it upstream for review in the next few days:
https://github.com/kito-cheng/gcc/tree/kitoc/vls-cc

@kito-cheng
Collaborator Author

Proposed patchset for upstream GCC:
https://patchwork.sourceware.org/project/gcc/list/?series=52829

@xypron

xypron commented Oct 9, 2025

This suggestion seems to leave a lot of performance on the table for systems that have vlen >= ABI_VLEN.

A vlen=512 system that holds a result r1,r2,r3,r4,r5,r6,r7,r8 in register v1 would have to rearrange the data to

v1: r1, r2, 0, 0, 0, 0, 0, 0
v2: r3, r4, 0, 0, 0, 0, 0, 0
v3: r5, r6, 0, 0, 0, 0, 0, 0
v4: r7, r8, 0, 0, 0, 0, 0, 0

before calling a function, and the called function would have to rearrange it back to

v1: r1,r2,r3,r4,r5,r6,r7,r8

These rearrangements should be avoided.

Please leave the data in compact form when calling into other functions.

@kito-cheng kito-cheng force-pushed the fixed-length-vector-cc branch from 7dbeb4b to 664c5d5 Compare October 13, 2025 06:36
====
When ABI_VLEN is smaller than the VLEN, the number of vector argument
registers utilized remains unchanged. However, in such cases, values are only
placed in a portion of these vector argument registers, corresponding to the

I don't understand why you would only use a portion of the vector registers.
This will require rearranging data by both the caller and the callee.
Instead, you could leave vector registers unused, which would be much more efficient.

I don't think that we should keep the current design.


The ABI_VLEN should simply define the maximum number of bits that can be exchanged between the caller and the callee via registers.

Arguments smaller than or equal to ABI_VLEN should be passed in up to, e.g., 4 registers.
Arguments larger than ABI_VLEN should be passed on the stack.

But data in vector registers should always be placed as compactly as possible.

Collaborator Author

@kito-cheng kito-cheng Oct 13, 2025


I am not sure why that would trigger data rearranging. Do you mind giving an example?

Here is a practical example, so that we can discuss a concrete case:

```
typedef signed long long __attribute__( ( vector_size( 64 ) ) ) int64x8_t;

__attribute__((riscv_vls_cc(128)))
int64x8_t foo (int64x8_t a, int64x8_t b);
// Return value in v8-v11, since 512 bits use LMUL=4 and occupy 4 registers
// Pass a in v8-v11, since 512 bits use LMUL=4 and occupy 4 registers
// Pass b in v12-v15, since 512 bits use LMUL=4 and occupy 4 registers

// Compile with -march=rv64gcv_zvl512b
void bar()
{
  // Assume a is assigned to v8
  int64x8_t a = {1, 2, 3, 4, 5, 6, 7, 8};
  // Assume b is assigned to v9
  int64x8_t b = {1, 2, 3, 4, 5, 6, 7, 8};
  // Pass a to foo: although it occupies 4 registers according to the ABI_VLEN,
  // we can still pass it without rearranging, so v9-v11 are left unset.
  // Move b to v12 due to the ABI requirement; this can generally be optimized
  // by the register allocator. v13-v15 are left unset.
  a = foo (a, b);
}

// Compile with -march=rv64gcv (VLEN=128)
void bar()
{
  // Assume a is assigned to v8-v11
  int64x8_t a = {1, 2, 3, 4, 5, 6, 7, 8};
  // Assume b is assigned to v12-v15
  int64x8_t b = {1, 2, 3, 4, 5, 6, 7, 8};
  // Pass a to foo in v8-v11
  // Pass b to foo in v12-v15
  a = foo (a, b);
}
```

In foo, the vector operations will use VL=4 and LMUL=4, which ensures the same result is obtained on machines with different VLEN.


Looking at the assembler code, everything looks fine.

"the number of vector argument registers utilized remains unchanged" is a bit misleading, as the table below essentially indicates that some registers may remain unused ("-,-,-,-") depending on the machine size.

Collaborator Author


Do you think using "occupancy" would be better?

e.g. "the number of vector argument registers occupancy remains unchanged"

hubot pushed a commit to gcc-mirror/gcc that referenced this pull request Oct 27, 2025
…iant

This patch implements the standard fixed-length vector calling convention
variant as specified in the RISC-V ELF psABI document. The implementation
introduces ABI_VLEN to serve as the minimal VLEN for fixed-length vectors.

For example, int32x8_t is a 256-bit vector type. If ABI_VLEN is 128, it
will be passed in two vector registers as LMUL 2. If ABI_VLEN is larger
than 256, it will be passed in one vector register as LMUL 1.

This differs from the minimal VLEN (defined by the ZVL*B extension) to ensure
ABI stability when the program is compiled with different VLEN/ZVL*B settings.

Changes since v1:
- Add a check_only parameter to several functions to make sure we
  don't emit warnings while checking the function ABI.

Ref: riscv-non-isa/riscv-elf-psabi-doc#418

gcc/ChangeLog:

	* config/riscv/riscv-protos.h (riscv_init_cumulative_args): Add
	bool parameter.
	* config/riscv/riscv.h (enum riscv_cc): Add RISCV_CC_VLS_V_32,
	RISCV_CC_VLS_V_64, RISCV_CC_VLS_V_128, RISCV_CC_VLS_V_256,
	RISCV_CC_VLS_V_512, RISCV_CC_VLS_V_1024, RISCV_CC_VLS_V_2048,
	RISCV_CC_VLS_V_4096, RISCV_CC_VLS_V_8192, RISCV_CC_VLS_V_16384.
	(CUMULATIVE_ARGS): Add abi_vlen field.
	* config/riscv/riscv.cc (riscv_handle_rvv_vls_cc_attribute): New
	function.
	(riscv_gnu_attributes): Add vls_cc attribute entry.
	(riscv_attributes): Add riscv_vls_cc attribute entry.
	(riscv_flatten_aggregate_field): Add vls_p and abi_vlen parameters
	to handle VLS vector types.
	(riscv_flatten_aggregate_argument): Update call to
	riscv_flatten_aggregate_field.
	(riscv_get_vector_arg): Add vls_p parameter for VLS handling.
	(riscv_vls_cc_p): New function.
	(riscv_get_cc_abi_vlen): New function.
	(riscv_valid_abi_vlen_vls_cc_p): New function.
	(riscv_get_riscv_cc_by_abi_vlen): New function.
	(riscv_get_vls_container_type): New function.
	(riscv_pass_vls_in_vr): New function.
	(riscv_pass_aggregate_in_vr): New function.
	(riscv_get_arg_info): Add VLS calling convention handling.
	(riscv_function_arg_advance): Update for VLS calling convention.
	(riscv_return_in_memory): Add fntype parameter and initialize
	cumulative args properly.
	(riscv_v_abi): Add abi parameter.
	(riscv_get_vls_cc_attr): New function.
	(riscv_vls_cc_function_abi): New function.
	(riscv_fntype_abi): Add VLS calling convention detection.
	(riscv_asm_output_variant_cc): Update for VLS calling convention.
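
A minimal sketch of the int32x8_t example described in the commit message above (hypothetical type and function, assuming the riscv_vls_cc attribute syntax):

```
typedef int int32x8_t __attribute__((vector_size(32)));   /* 256-bit vector */

/* With ABI_VLEN = 128, the 256-bit argument is passed in an LMUL=2 group of
 * two vector registers; with an ABI_VLEN large enough to hold all 256 bits,
 * it fits in a single register (LMUL=1). */
__attribute__((riscv_vls_cc(128)))
int32x8_t negate8(int32x8_t v);
```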
@kito-cheng
Collaborator Author

Going to merge this since we have it in both LLVM and GCC and it has been reviewed for a while. There is still some room to improve the wording, but that is relatively minor and can be updated later.

This proposal introduces a new variant of the calling convention specifically
designed for fixed-length vectors. The primary aim is to facilitate passing
fixed-length vectors through vector registers, derived from the standard
vector calling convention with the same register conventions and argument
passing/return value rules.

Key features:

- Introduce ABI_VLEN parameter denoting the width of a vector register,
  constrained to be no wider than the ISA's VLEN. Default recommended
  to 128 bits, with flexibility for 32 or 64 bits as permitted by the ISA.

- Fixed-length vector argument passing rules based on size relative to
  ABI_VLEN: vectors smaller than ABI_VLEN pass in a single register,
  larger vectors pass in multiple registers following LMUL pattern (2, 4, 8).

- Handling rules for structs/unions containing fixed-length vectors:
  - Structs with all fixed-length vector members follow vector tuple type
    rules if conforming to size constraints
  - Unions with fixed-length vectors follow integer calling convention
  - Pass structs as tuple types in registers only when enough vector
    argument registers are available

- Additional rules for:
  - Single fixed-length vector or fixed-length vector array with size 1
  - Zero-length fixed-length arrays
  - Non-power-of-2 vectors
  - Vector types with unsupported element types

- Name mangling specification for standard fixed-length vector calling
  convention and calling convention variants with ABI tag encoding

- Example layouts for int32x4_t on different VLEN configurations
@kito-cheng kito-cheng force-pushed the fixed-length-vector-cc branch from 834023c to c72c8ea Compare November 26, 2025 10:03
@kito-cheng kito-cheng merged commit b17fc7b into master Nov 26, 2025
2 checks passed
@kito-cheng kito-cheng deleted the fixed-length-vector-cc branch November 26, 2025 10:04