
Commit c4388fa

[fixup] Misc small fixes
1 parent 73a5b75 commit c4388fa


3 files changed: +26, -20 lines changed


main/acle.md

Lines changed: 7 additions & 5 deletions
```diff
@@ -2366,7 +2366,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_IDIV`](#hardware-integer-divide) | Hardware Integer Divide | 1 |
 | [`__ARM_FEATURE_JCVT`](#javascript-floating-point-conversion) | Javascript conversion (ARMv8.3-A) | 1 |
 | [`__ARM_FEATURE_LDREX`](#ldrexstrex) *(Deprecated)* | Load/store exclusive instructions | 0x0F |
-| [`__ARM_FEATURE_LUT`](#lookup-table-extensions) | Lookup table extensions | 1 |
+| [`__ARM_FEATURE_LUT`](#lookup-table-extensions) | Lookup table extensions (FEAT_LUT) | 1 |
 | [`__ARM_FEATURE_MATMUL_INT8`](#availability-of-armv8.6-a-integer-matrix-multiply-intrinsics) | Integer Matrix Multiply extension (Armv8.6-A, optional Armv8.2-A, Armv8.3-A, Armv8.4-A, Armv8.5-A) | 1 |
 | [`__ARM_FEATURE_MEMORY_TAGGING`](#memory-tagging) | Memory Tagging (Armv8.5-A) | 1 |
 | [`__ARM_FEATURE_MOPS`](#memcpy-family-of-memory-operations-standarization-instructions---mops) | `memcpy`, `memset`, and `memmove` family of operations standardization instructions | 1 |
```
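The `__ARM_FEATURE_LUT` row documents the macro that guards the lookup-table intrinsics. The sketch below shows one way code might use it: call `vluti4q_lane_u8` (signature as listed in the Advanced SIMD table changed by this commit) when the macro is defined, and otherwise fall back to a portable scalar loop. The helpers `luti4_u8_scalar` and `lookup_u8` are hypothetical names, and the scalar loop assumes that 4-bit index number *k* of `vm` occupies bits [4k+3:4k], that is, the low nibble of each byte comes first. Check that assumption against the `LUTI4` pseudocode in the Arm ARM before relying on the fallback.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_LUT)
#include <arm_neon.h>
#endif

/* Portable model of LUTI4 with 8-bit elements (assumed semantics, see above).
 * vm holds vm_bytes bytes (8 for the _lane form, 16 for the _laneq form);
 * segment plays the role of the intrinsic's `index` argument. */
static void luti4_u8_scalar(const uint8_t table[16], const uint8_t *vm,
                            int vm_bytes, int segment, uint8_t result[16]) {
    /* 16 results x 4 bits = 8 bytes per segment: index 0 only for an 8-byte vm,
     * index 0..1 for a 16-byte vm, matching the ranges in the intrinsics table. */
    assert(vm_bytes == 8 || vm_bytes == 16);
    assert(segment >= 0 && (segment + 1) * 8 <= vm_bytes);
    for (int e = 0; e < 16; e++) {
        int nibble = segment * 16 + e;   /* 4-bit index number within vm */
        uint8_t byte = vm[nibble / 2];
        uint8_t idx = (nibble % 2 == 0) ? (uint8_t)(byte & 0x0F) : (uint8_t)(byte >> 4);
        result[e] = table[idx];
    }
}

/* Hypothetical wrapper: look up 16 bytes using the 16 indices packed in an 8-byte vm. */
static void lookup_u8(const uint8_t table[16], const uint8_t vm[8], uint8_t result[16]) {
#if defined(__ARM_FEATURE_LUT)
    /* With a 64-bit vm, vluti4q_lane_u8 only accepts index 0. */
    vst1q_u8(result, vluti4q_lane_u8(vld1q_u8(table), vld1_u8(vm), 0));
#else
    luti4_u8_scalar(table, vm, 8, 0, result);
#endif
}
```

Keeping the scalar path as a separate function makes it usable as a reference when testing the intrinsic path on hardware that does implement FEAT_LUT.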
```diff
@@ -2391,7 +2391,7 @@ be found in [[BA]](#BA).
 | [`__ARM_FEATURE_SME_F64F64`](#double-precision-floating-point-outer-product-intrinsics) | Double precision floating-point outer product intrinsics (FEAT_SME_F64F64) | 1 |
 | [`__ARM_FEATURE_SME_I16I64`](#16-bit-to-64-bit-integer-widening-outer-product-intrinsics) | 16-bit to 64-bit integer widening outer product intrinsics (FEAT_SME_I16I64) | 1 |
 | [`__ARM_FEATURE_SME_LOCALLY_STREAMING`](#scalable-matrix-extension-sme) | Support for the `arm_locally_streaming` attribute | 1 |
-| [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions | 1 |
+| [`__ARM_FEATURE_SME_LUTv2`](#lookup-table-extensions) | Lookup table extensions (FEAT_SME_LUTv2) | 1 |
 | [`__ARM_FEATURE_SVE`](#scalable-vector-extension-sve) | Scalable Vector Extension (FEAT_SVE) | 1 |
 | [`__ARM_FEATURE_SVE_BF16`](#brain-16-bit-floating-point-support) | SVE support for the 16-bit brain floating-point extension (FEAT_BF16) | 1 |
 | [`__ARM_FEATURE_SVE_BITS`](#scalable-vector-extension-sve) | The number of bits in an SVE vector, when known in advance | 256 |
```
```diff
@@ -9089,6 +9089,7 @@ Floating-point absolute maximum (predicated).
 svfloat16_t svamax[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
 svfloat16_t svamax[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
 
+// Variants are also available for: _f32 and _f64
 svfloat16_t svamax[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
 svfloat16_t svamax[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
 svfloat16_t svamax[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
```
```diff
@@ -9103,6 +9104,7 @@ Floating-point absolute minimum (predicated).
 svfloat16_t svamin[_f16]_x(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
 svfloat16_t svamin[_f16]_z(svbool_t pg, svfloat16_t zn, svfloat16_t zm);
 
+// Variants are also available for: _f32 and _f64
 svfloat16_t svamin[_n_f16]_m(svbool_t pg, svfloat16_t zn, float16_t zm);
 svfloat16_t svamin[_n_f16]_x(svbool_t pg, svfloat16_t zn, float16_t zm);
 svfloat16_t svamin[_n_f16]_z(svbool_t pg, svfloat16_t zn, float16_t zm);
```
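The two hunks above note that `svamax` and `svamin` also come in `_f32` and `_f64` forms. As a rough illustration of what the `_f32` forms compute, here is a portable scalar model, not an ACLE implementation, assuming the usual ACLE predication conventions: each active lane yields the larger (or smaller) of the two absolute values, `_m` keeps `zn` in inactive lanes, and `_z` zeroes them. `_x` leaves inactive lanes unspecified, so either behaviour is a valid model of it. The names `abs_minmax_f32`, `pred_form`, `PRED_M` and `PRED_Z` are hypothetical, NaN handling is not modelled, and the `_n` forms would behave as if the scalar `zm` were broadcast to every lane.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two predication forms modelled here. */
typedef enum { PRED_M, PRED_Z } pred_form;

/* Scalar model of svamax[_f32]_m/_z (is_max = true) and svamin[_f32]_m/_z
 * (is_max = false) over n lanes. NaN inputs are not modelled. */
static void abs_minmax_f32(const bool *pg, const float *zn, const float *zm,
                           float *result, size_t n, bool is_max, pred_form form) {
    assert(form == PRED_M || form == PRED_Z);
    for (size_t i = 0; i < n; i++) {
        if (pg[i]) {
            float a = fabsf(zn[i]);
            float b = fabsf(zm[i]);
            result[i] = is_max ? (a > b ? a : b) : (a < b ? a : b);
        } else {
            /* _m keeps the first operand; _z writes zero. */
            result[i] = (form == PRED_M) ? zn[i] : 0.0f;
        }
    }
}
```

Note that the result of an active lane is always non-negative, because both operands are made absolute before the comparison.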
````diff
@@ -9135,7 +9137,7 @@ Lookup table read with 4-bit indices.
 
 // Variant are also available for: _u16, _f16, _bf16
 svint16_t svluti4_lane[_s16](svint16_t table, svuint8_t indices, uint64_t imm_idx);
-svint16_t svluti4_lane[_s16]_x2(svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
+svint16_t svluti4_lane[_s16_x2](svint16x2_t table, svuint8_t indices, uint64_t imm_idx);
 ```
 
 # SME language extensions and intrinsics
````
````diff
@@ -12517,8 +12519,8 @@ Move vector register to ZT0.
 
 Lookup table read with 4-bit indexes and 8-bit elements.
 ``` c
-// Variants are also available for: _s8
-svuint8x4_t svluti4_zt_u8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0");
+// Variants are also available for: _u8
+svint8x4_t svluti4_zt_s8_x4(uint64_t zt0, svuint8x2_t zn) __arm_streaming __arm_in("zt0");
 ```
 
 # M-profile Vector Extension (MVE) intrinsics
````

neon_intrinsics/advsimd.md

Lines changed: 7 additions & 7 deletions
@@ -4543,21 +4543,21 @@ The intrinsics in this section are guarded by the macro ``__ARM_NEON``.
45434543
| Intrinsic | Argument preparation | AArch64 Instruction | Result | Supported architectures |
45444544
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|--------------------------------------------|--------------------|---------------------------|
45454545
| <code>uint8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_u8" target="_blank">vluti4q_lane_u8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4546-
| <code>int8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_s8" target="_blank">vluti4q_lane_s8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4547-
| <code>poly8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_p8" target="_blank">vluti4q_lane_p8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
45484546
| <code>uint8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_u8" target="_blank">vluti4q_laneq_u8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4547+
| <code>int8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_s8" target="_blank">vluti4q_lane_s8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
45494548
| <code>int8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_s8" target="_blank">vluti4q_laneq_s8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4549+
| <code>poly8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_p8" target="_blank">vluti4q_lane_p8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 0` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
45504550
| <code>poly8x16_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_p8" target="_blank">vluti4q_laneq_p8</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly8x16_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn -> Vn.16B`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.16B, {Vn.16B}, Vm[index]` | `Vd.16B -> result` | `A64` |
4551-
| <code>uint16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_u16_x2" target="_blank">vluti4q_laneq_u16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4552-
| <code>int16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_s16_x2" target="_blank">vluti4q_laneq_s16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4553-
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_f16_x2" target="_blank">vluti4q_laneq_f16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4554-
| <code>bfloat16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_bf16_x2" target="_blank">vluti4q_laneq_bf16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; bfloat16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4555-
| <code>poly16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_p16_x2" target="_blank">vluti4q_laneq_p16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45564551
| <code>uint16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_u16_x2" target="_blank">vluti4q_lane_u16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4552+
| <code>uint16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_u16_x2" target="_blank">vluti4q_laneq_u16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; uint16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45574553
| <code>int16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_s16_x2" target="_blank">vluti4q_lane_s16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4554+
| <code>int16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_s16_x2" target="_blank">vluti4q_laneq_s16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; int16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45584555
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_f16_x2" target="_blank">vluti4q_lane_f16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4556+
| <code>float16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_f16_x2" target="_blank">vluti4q_laneq_f16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; float16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45594557
| <code>bfloat16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_bf16_x2" target="_blank">vluti4q_lane_bf16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; bfloat16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4558+
| <code>bfloat16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_bf16_x2" target="_blank">vluti4q_laneq_bf16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; bfloat16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45604559
| <code>poly16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_lane_p16_x2" target="_blank">vluti4q_lane_p16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x8_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 1` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
4560+
| <code>poly16x8_t <a href="https://developer.arm.com/architectures/instruction-sets/intrinsics/vluti4q_laneq_p16_x2" target="_blank">vluti4q_laneq_p16_x2</a>(<br>&nbsp;&nbsp;&nbsp;&nbsp; poly16x8x2_t vn,<br>&nbsp;&nbsp;&nbsp;&nbsp; uint8x16_t vm,<br>&nbsp;&nbsp;&nbsp;&nbsp; const int index)</code> | `vn.val[0] -> Vn1.8H`<br>`vn.val[1] -> Vn2.8H`<br>`vm -> Vm`<br>`0 <= index <= 3` | `LUTI4 Vd.8H, {Vn1.8H, Vn2.8H}, Vm[index]` | `Vd.8H -> result` | `A64` |
45614561

45624562
## Crypto
45634563

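The reordered rows pair each `_lane` intrinsic with its `_laneq` counterpart, which makes the `index` ranges easy to compare: 0..1 for `_lane` and 0..3 for `_laneq` in the 16-bit `_x2` forms. Those ranges follow from how the indices are packed: eight results need eight 4-bit indices (32 bits), so a 64-bit `uint8x8_t vm` holds two such segments and a 128-bit `uint8x16_t vm` holds four. The scalar model below is a sketch under two assumptions that should be checked against the Arm ARM `LUTI4` pseudocode: indices are packed low nibble first, and table entries 0..7 come from `vn.val[0]` and 8..15 from `vn.val[1]`. `luti4_u16_x2_scalar` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vluti4q_lane_u16_x2 / vluti4q_laneq_u16_x2 (assumed semantics).
 * vn0 and vn1 stand for vn.val[0] and vn.val[1]; vm holds vm_bytes bytes
 * (8 for _lane, 16 for _laneq); segment plays the role of `index`. */
static void luti4_u16_x2_scalar(const uint16_t vn0[8], const uint16_t vn1[8],
                                const uint8_t *vm, int vm_bytes, int segment,
                                uint16_t result[8]) {
    /* 8 results x 4 bits = 4 bytes per segment: index 0..1 for an 8-byte vm,
     * index 0..3 for a 16-byte vm, matching the ranges in the table. */
    assert(vm_bytes == 8 || vm_bytes == 16);
    assert(segment >= 0 && (segment + 1) * 4 <= vm_bytes);
    for (int e = 0; e < 8; e++) {
        int nibble = segment * 8 + e;    /* 4-bit index number within vm */
        uint8_t byte = vm[nibble / 2];
        uint8_t idx = (nibble % 2 == 0) ? (uint8_t)(byte & 0x0F) : (uint8_t)(byte >> 4);
        /* The two registers form one 16-entry table of 16-bit elements. */
        result[e] = (idx < 8) ? vn0[idx] : vn1[idx - 8];
    }
}
```

The same packing argument explains the 8-bit rows: sixteen results need 64 bits of indices, so `_lane` allows only index 0 and `_laneq` allows 0..1.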