Commit 513f3fa

committed
Added figures for tile load/store.
Added text on optimization of cache access through LMUL usage. Removed some confusing text on resulting tile dimensions.
1 parent b38ff2b commit 513f3fa

File tree

3 files changed

+17
-1
lines changed
193 KB

src/images/png/ime-vmtls-lmul.png

256 KB

src/integrated-matrix.adoc

Lines changed: 17 additions & 1 deletion
@@ -2,6 +2,7 @@
 == Zvvm Family of Integrated Matrix Extensions
 
 :stem: latexmath
+:imagesdir: ../docs-resources/images
 :imagesdir: images
 
 === Introduction
@@ -1116,8 +1117,14 @@ The tile load and store instructions make use of the following parameters from t
 * LMUL — vector length multiplier
 * λ — selected lambda, read from `lambda[2:0]` in `vtype`
 
-The resulting tile dimensions are μ = ν = VL/λ, with the accumulator tile C occupying MUL = LMUL/λ² vector registers.
 When loading A or B input tiles, `vmtl.v` and `vmttl.v` shall be used with SEW equal to the element width of the C accumulator tile.
+<<#ime-load-store-geometry>> illustrates the memory-to-VR load in both row-major and column-major order for a tile with LMUL=1.
+Physically, both transfers are identical: they move contiguous segments of length _linesize_ = λ × LMUL with a stride of LD between them.
+The tile load/store instructions interpret the memory layout according to the specified leading dimension, but the resulting data layout in the VR is the same regardless of whether the source/destination matrix is stored in row-major or column-major order.
+
+[#ime-load-store-geometry]
+.Loading a matrix tile from memory for LMUL=1. The matrix is laid out linearly in memory; the leading dimension LD specifies its row size (a) or column size (b). Element indices represent the offset of the elements in memory. Blue arrows indicate the data ordering in memory/VR.
+image::png/ime-load-store-geometry.png[align="center"]
 
 If (rs2) = 0, then the leading dimension LD is set to the _natural dimension_ of λ × LMUL.
 That is, the memory layout, with elements contiguous to each other, matches the layout of the register group being loaded/stored.
@@ -1188,6 +1195,14 @@ For each element index `i` in the body `[vstart, VL)` where the mask is enabled:
 
 M[rs1 + (SEW ÷ 8) × ((i / linesize) × LD + (i % linesize))] = VS[i]
 
+[NOTE]
+====
+Order-preserving tile load/store with LMUL > 1 offers optimization opportunities. While vmtl/vmts are very similar to vector constant-stride segment operations, the segment sizes are potentially larger. Matching the cache line size with λ × LMUL × SEW allows for full cache-line transfers.
+====
+[#ime-vmtls-lmul]
+.Order-preserving tile load/store with LMUL > 1 for row-major (a) and column-major (b) ordering in memory.
+image::png/ime-vmtls-lmul.png[align="center", width="90%"]
+
 ===== `vmttl.v` — Transposing Tile Load
 
 vmttl.v vd, (rs1), rs2 [, Lλ] [, vm]
@@ -3410,3 +3425,4 @@ Included in::
 |0.1
 |Draft
 |===
+
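Reviewer sketch (not part of the commit): the address formula in the third hunk and the cache-line note can be checked with a few lines of Python. This is a minimal model under stated assumptions: `vstart` = 0, no masking, an integer LMUL, and LD measured in elements (as the `(SEW ÷ 8)` scaling in the formula suggests). The function names and the numeric values are hypothetical and chosen only for illustration; they are not claimed to be legal `vtype` encodings.

```python
def tile_store_offsets(vl, sew_bits, lam, lmul, ld=0):
    """Byte offsets relative to rs1 written by an order-preserving tile store.

    Models M[rs1 + (SEW / 8) * ((i / linesize) * LD + (i % linesize))] = VS[i].
    ld == 0 selects the natural dimension lam * lmul, mirroring (rs2) = 0.
    Assumptions: vstart = 0, all elements active, LD given in elements.
    """
    linesize = lam * lmul              # contiguous segment length, in elements
    if ld == 0:
        ld = linesize                  # natural dimension: segments are back to back
    sew_bytes = sew_bits // 8
    return [sew_bytes * ((i // linesize) * ld + (i % linesize)) for i in range(vl)]


def segment_bytes(sew_bits, lam, lmul):
    """Size in bytes of one contiguous segment (lam * lmul elements of SEW bits)."""
    return lam * lmul * sew_bits // 8


# Natural dimension: the eight elements land contiguously in memory.
print(tile_store_offsets(vl=8, sew_bits=32, lam=4, lmul=1))
# Explicit LD = 10 elements: two 4-element segments, 40 bytes apart.
print(tile_store_offsets(vl=8, sew_bits=32, lam=4, lmul=1, ld=10))
# Illustrative values only: this segment would fill a 64-byte cache line.
print(segment_bytes(sew_bits=32, lam=8, lmul=2))
```

The model makes the row-major/column-major point concrete: the offsets depend only on `linesize` and LD, so the same transfer serves either storage order, and only the interpretation of LD as row size or column size differs.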