Integrated matrix extension #28
joseemoreira wants to merge 24 commits into riscv:integrated-matrix-extension from
Signed-off-by: Jose Moreira <jmoreira@us.ibm.com>
ptomsich left a comment
Please split into individual PRs for individual topics/semantic changes:
- the editorial changes can be reviewed separately — and are mostly non-controversial
- profiles belong in a different document
- G, W, and psm are controversial (at least G and W cross red lines for me) and I'd prefer to discuss those separately.
| |Zvvfp64mm ^| Zve64d | IEEE binary64, IEEE binary64 | IEEE binary64 | ||
| |=== | ||
|
|
||
| === Recommended Profiles |
As part of the standards development and ratification process, we submit a PR for the entire chapter to the ISA manual. As a direct consequence, we can't include text that is outside our scope (the Profiles SIG and the TSC are in charge of profiles).
We need to split this section out of the .adoc — how about a separate file that we can send to Profiles SIG as a proposal?
@joseemoreira If it helps, I can have you put on the agenda at TSC to give a presentation on "The need to define profiles for IME — with suggestions."?
| |=== | ||
|
|
||
| [#integrated-matrix-psm] | ||
| ==== Partial-sum mode (`psm`) |
This feels odd as a vtype-field, as only one of the modes will typically be supported.
I wonder if this is rather an ABI issue: tag ELF objects as THIS, THAT, DONT_CARE ... and leave it to the linker and runtime linker to figure out?
My recommendation: don't have this as a vtype-field. Given that the vtype is "a split-out part of the opcode", having a vtype-field is equivalent to having two different opcodes/mnemonics. I don't think this is the intent?
| The `psm` field affects only floating-point Zvvm instructions. It has no effect on integer matrix multiply-accumulate instructions or on tile load/store instructions. | ||
|
|
||
| [#integrated-matrix-W] | ||
| ==== Arithmetic widening factor (`W`) |
NAK.
The widening factor is part of what the instruction does, so it's part of the mnemonic.
We have the bits available in the opcode, so there's no need to make this indirect (by forcing extra vsetvl instructions).
| | 1 | 1 | 8 | ||
| |=== | ||
|
|
||
| The `W` field is located at `vtype[XLEN-9:XLEN-10]`. |
We were asked to stay out of the immediate vtype-bits and consequently kept the widening inside the explicit opcode bits.
Unless VME uses the same bits, there is no reason why this could make sense.
Given that VME decided they want their own mtype: NAK.
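For readers following the bit-position argument, the location quoted in the diff line above, `vtype[XLEN-9:XLEN-10]`, can be sketched as a plain bit extraction. This is only an illustration: the function name `vtype_w_field` and the `xlen` parameter are hypothetical, and the mapping from the raw 2-bit value to a widening factor is deliberately left out because only one row of the encoding table is visible in this thread.

```python
def vtype_w_field(vtype: int, xlen: int = 64) -> int:
    """Return the raw 2-bit value stored at vtype[XLEN-9:XLEN-10].

    Hypothetical helper for illustration; it does not decode the value
    into a widening factor.
    """
    if xlen not in (32, 64):
        raise ValueError("xlen must be 32 or 64")
    low_bit = xlen - 10  # lowest bit of the two-bit field
    return (vtype >> low_bit) & 0b11


if __name__ == "__main__":
    # On RV64 the field occupies bits 55:54; on RV32, bits 23:22.
    print(vtype_w_field(0b11 << 54, xlen=64))
    print(vtype_w_field(0b10 << 22, xlen=32))
```

Because the field sits at the top of `vtype`, it would be written through the `vset{i}vl{i}` instructions like any other `vtype` field, which is the indirection the comment above objects to.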
| [#integrated-matrix-G] | ||
| ==== Partial-sum grouping factor (`G`) | ||
|
|
||
| The `G[2:0]` field is a 3-bit WARL read/write field in the `vtype` CSR that selects the grouping factor used to form floating-point partial sums in Zvvm floating-point matrix multiply-accumulate instructions. |
Does this really make sense? Will G be selectable?
Or will it rather be a consequence of the (λ × SEW × W × LMUL) combination for a given hardware?
My guess is that it will be the latter … so it's not something you select, but something you acknowledge.
In that case, instead of a WARL-field, we might want to define a standard way (as part of the ABI) how to convey this information to user-software?
| if (2 ^ get_sew_pow()) < 32 then return Illegal_Instruction(); | ||
|
|
||
| let g : gemm_geom = decode_gemm_geometry(8, false); | ||
| let g : gemm_geom = decode_gemm_geometry(read_vtype_W(), false); |
NAK. And independently of the NAK: if you change the unrolling factor in one line, you'd have to change it everywhere (e.g., 3 lines below, we call check_microscaling_legality with W).
Process all 28 items from the IME TG internal review feedback tracker.

Subextension dependencies (riscv#3): Replace blanket Zve64d dependency with the minimum Zve subset per subextension: Zve32x for integer accumulators ≤ 32-bit, Zve64x for Int64 accumulators, Zve32f for FP accumulators ≤ 32-bit, and Zve64d only for FP64 accumulators.

8× widening instructions (riscv#7, riscv#8, riscv#9, riscv#24): Add v8wmmacc.vv (funct6=0x3b, OPIVV), vf8wmmacc.vv (funct6=0x17, OPFVV), and vf8wimmacc.vv (integer-input MX variant, vm=0 of v8wmmacc) with full instruction definitions, SAIL pseudocode, encoding diagrams, and exception tables. Update encoding maps (FP, integer, integer MX) with W=8 entries. Add Zvvxi4fp32mm and Zvvxni4fp32mm to the MX subextension table. Replace the informative NOTE about reserved W=8 encoding space with normative text. Remove the undefined term "octal-widening".

MXINT4 clarification and OCP citation (riscv#14): Define MXINT4 as analogous to OCP MX's MXINT8 but with 4-bit signed elements. Add proper citation of the OCP Microscaling Formats (MX) v1.0 Specification with URL. Update microscaling applicability to include vf8wmmacc.vv.

vfmmacc.vv vm=0 cleanup (riscv#13, riscv#28): Remove contradictory "When vm=0" exception bullets (vm=0 is reserved for non-widening FP). Replace dead microscaling SAIL code with a straightforward non-widening FP GEMM loop. Add explicit note that microscaling is not supported for non-widening multiply-accumulate.

Terminology fixes (riscv#15, riscv#21): Add forward cross-reference at first use of altfmt_A/altfmt_B. Correct two occurrences where λ was described as "the K dimension" to "tile-layout parameter", clarifying that K_eff = λ × W × LMUL is the derived effective K dimension.
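The dependency rule and the K_eff formula in the commit message above can be sketched in a few lines of Python. This is an illustration under assumptions, not text from the specification: the function names `min_zve_subset` and `effective_k` are hypothetical, accumulator widths are assumed to be at most 64 bits, and LMUL is modeled as a `Fraction` on the assumption that fractional LMUL values may occur as they do in the base vector extension.

```python
from fractions import Fraction


def min_zve_subset(accumulator_is_fp: bool, accumulator_bits: int) -> str:
    """Minimum Zve subset for a subextension, per the rule stated above.

    Hypothetical helper: integer accumulators need Zve32x (<= 32-bit) or
    Zve64x (64-bit); FP accumulators need Zve32f (<= 32-bit) or Zve64d
    (64-bit).
    """
    if not 0 < accumulator_bits <= 64:
        raise ValueError("accumulator width must be between 1 and 64 bits")
    wide = accumulator_bits > 32
    if accumulator_is_fp:
        return "Zve64d" if wide else "Zve32f"
    return "Zve64x" if wide else "Zve32x"


def effective_k(lam: int, w: int, lmul: Fraction) -> Fraction:
    """Derived effective K dimension: K_eff = lambda * W * LMUL."""
    return lam * w * lmul


if __name__ == "__main__":
    print(min_zve_subset(accumulator_is_fp=True, accumulator_bits=64))
    print(min_zve_subset(accumulator_is_fp=False, accumulator_bits=32))
    print(effective_k(lam=4, w=2, lmul=Fraction(1, 2)))
```

Keeping λ as an input to `effective_k` rather than calling it "K" mirrors the terminology fix: λ is the tile-layout parameter, and K_eff is derived from it.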
Philipp, Erich, and Bing: Please take a look at the changes I am proposing. They are meant to allow (but not mandate) compatibility of floating-point results with Zvt (https://github.com/aswaterman/riscv-misc/blob/main/isa/zvt/zvt.adoc#matrix-arithmetic-instructions).
I am just looking for feedback at this point. I suspect we will need some discussions before we are all OK with any of these changes. When we get there, I can update the SAIL.
I have not found a good way to have G as a WARL field that one can discover. It is not something that can be set in vtype, because it depends on non-vtype parameters, such as W.