This repository was archived by the owner on Dec 22, 2021. It is now read-only.

simdsignunsignedextendedload.md #77

Closed
wants to merge 8 commits into from
8 changes: 7 additions & 1 deletion proposals/simd/BinarySIMD.md
@@ -166,5 +166,11 @@ The `v8x16.shuffle2_imm` instruction has 16 bytes after `simdop`.
| `f32x4.convert_u/i32x4` | `0xb0`| - |
| `f64x2.convert_s/i64x2` | `0xb1`| - |
| `f64x2.convert_u/i64x2` | `0xb2`| - |
| `i8x8.zxload` | `0xb3`| m:memarg |
| `i8x8.sxload` | `0xb4`| m:memarg |
| `i16x4.zxload` | `0xb5`| m:memarg |
| `i16x4.sxload` | `0xb6`| m:memarg |
| `i32x2.zxload` | `0xb7`| m:memarg |
| `i32x2.sxload` | `0xb8`| m:memarg |
| `v8x16.shuffle1` | `0xc0`| - |
| `v8x16.shuffle2_imm` | `0xc1`| s:LaneIdx32[16] |
6 changes: 6 additions & 0 deletions proposals/simd/ImplementationStatus.md
@@ -139,6 +139,12 @@
| `f32x4.convert_u/i32x4` | `-msimd128` | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| `f64x2.convert_s/i64x2` | `-munimplemented-simd128` | | :heavy_check_mark: | :heavy_check_mark: |
| `f64x2.convert_u/i64x2` | `-munimplemented-simd128` | | :heavy_check_mark: | :heavy_check_mark: |
| `i8x8.zxload` | | | | |
| `i8x8.sxload` | | | | |
| `i16x4.zxload` | | | | |
| `i16x4.sxload` | | | | |
| `i32x2.zxload` | | | | |
| `i32x2.sxload` | | | | |
| `v8x16.shuffle1` | | | :heavy_check_mark: | |
| `v8x16.shuffle2_imm` | | | :heavy_check_mark: | :heavy_check_mark: |

9 changes: 9 additions & 0 deletions proposals/simd/SIMD.md
@@ -666,6 +666,15 @@ natural alignment.

Load a `v128` vector from the given heap address.

Extended loads:
Member

Can this be made clear here that this is not operating on 64-bit widths widening to a 128-bit vector? A concise description here would be useful.

Contributor

Do you mean to clarify that we are extending every lane, rather than widening a 64-bit value?


* `i8x8.zxload(memarg) -> v128`: load eight 8-bit integers and zero-extend each one to a 16-bit lane
* `i8x8.sxload(memarg) -> v128`: load eight 8-bit integers and sign-extend each one to a 16-bit lane
* `i16x4.zxload(memarg) -> v128`: load four 16-bit integers and zero-extend each one to a 32-bit lane
* `i16x4.sxload(memarg) -> v128`: load four 16-bit integers and sign-extend each one to a 32-bit lane
* `i32x2.zxload(memarg) -> v128`: load two 32-bit integers and zero-extend each one to a 64-bit lane
* `i32x2.sxload(memarg) -> v128`: load two 32-bit integers and sign-extend each one to a 64-bit lane
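To make the lane-by-lane semantics concrete, here is a minimal scalar reference model of the six loads, written as a sketch under stated assumptions. Every function reads exactly 8 bytes (64 bits) and widens *each lane* independently. It does not widen one 64-bit value. The typed result structs, the function names and the `avail` parameter are hypothetical. `p` is assumed to already be the effective address (dynamic address plus memarg offset). Bytes are read in little-endian order, as WebAssembly memory requires. Where an engine would trap on an out-of-bounds access, this sketch asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane-typed views of the 128-bit result. A real v128 is
   untyped; the typed structs only make the lane widths explicit. */
typedef struct { uint16_t lane[8]; } u16x8;
typedef struct { int16_t  lane[8]; } s16x8;
typedef struct { uint32_t lane[4]; } u32x4;
typedef struct { int32_t  lane[4]; } s32x4;
typedef struct { uint64_t lane[2]; } u64x2;
typedef struct { int64_t  lane[2]; } s64x2;

/* WebAssembly memory is little-endian, so multi-byte values are assembled
   byte by byte instead of relying on the host's byte order. */
static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Portable two's-complement sign extension (no implementation-defined casts). */
static int16_t sext8(uint8_t v)   { return (int16_t)((int)(v ^ 0x80u) - 0x80); }
static int32_t sext16(uint16_t v) { return (int32_t)(v ^ 0x8000u) - 0x8000; }
static int64_t sext32(uint32_t v) { return (int64_t)(v ^ 0x80000000u) - 0x80000000LL; }

/* Each load reads 8 bytes starting at p. avail is the number of in-bounds
   bytes from p; a real engine traps when it is < 8, this sketch asserts. */
u16x8 i8x8_zxload(const uint8_t *p, size_t avail) {
    assert(avail >= 8);
    u16x8 r;
    for (int i = 0; i < 8; i++) r.lane[i] = p[i];                 /* zero-extend */
    return r;
}
s16x8 i8x8_sxload(const uint8_t *p, size_t avail) {
    assert(avail >= 8);
    s16x8 r;
    for (int i = 0; i < 8; i++) r.lane[i] = sext8(p[i]);          /* sign-extend */
    return r;
}
u32x4 i16x4_zxload(const uint8_t *p, size_t avail) {
    assert(avail >= 8);
    u32x4 r;
    for (int i = 0; i < 4; i++) r.lane[i] = read_le16(p + 2 * i);
    return r;
}
s32x4 i16x4_sxload(const uint8_t *p, size_t avail) {
    assert(avail >= 8);
    s32x4 r;
    for (int i = 0; i < 4; i++) r.lane[i] = sext16(read_le16(p + 2 * i));
    return r;
}
u64x2 i32x2_zxload(const uint8_t *p, size_t avail) {
    assert(avail >= 8);
    u64x2 r;
    for (int i = 0; i < 2; i++) r.lane[i] = read_le32(p + 4 * i);
    return r;
}
s64x2 i32x2_sxload(const uint8_t *p, size_t avail) {
    assert(avail >= 8);
    s64x2 r;
    for (int i = 0; i < 2; i++) r.lane[i] = sext32(read_le32(p + 4 * i));
    return r;
}
```

For example, the byte `0xF0` becomes the lane value 240 under `i8x8_zxload` and -16 under `i8x8_sxload`.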

### Store

* `v128.store(memarg, data: v128)`
Expand Down
64 changes: 64 additions & 0 deletions proposals/simd/docs/SIMD-sign-and-zero-extended-loads.md
@@ -0,0 +1,64 @@
### **Proposal WebAssembly SIMD Modification**
Member

Can this document be removed? While this is useful information, I don't see a need for this to be included here.

Contributor

Do you want it to be included in the body of the PR, or simply removed? @rrwinterton, what do you think?


As currently proposed, the WASM SIMD ISA defines the following instruction:

**i8x16.mul**, a register-to-register operation that multiplies two vectors of sixteen 8-bit integers lane by lane and keeps only the low 8 bits of each product. If the integers are uniformly distributed, a large fraction of these products overflow. This is a problem for many applications.
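To make the overflow concrete, here is a minimal scalar sketch of one lane of `i8x16.mul`, which keeps only the low 8 bits of the product. It also counts how many unsigned operand pairs would wrap. The function names are hypothetical. For brevity, only the unsigned interpretation is counted.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of i8x16.mul: the full product is truncated to
   its low 8 bits (wrap-around). */
static uint8_t i8x16_mul_lane(uint8_t a, uint8_t b) {
    return (uint8_t)(a * b);
}

/* Counts how many of the 256 x 256 unsigned operand pairs produce a product
   that does not fit in 8 bits, i.e. that i8x16.mul would silently wrap. */
static long count_unsigned_overflows(void) {
    long overflows = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (a * b > UINT8_MAX)
                overflows++;
    return overflows;
}
```

For example, 200 * 3 = 600 wraps to 88, and 16 * 16 = 256 wraps to 0. Across all 65,536 unsigned pairs, well over 90% of the products wrap.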


### Proposed new instructions

Six new load instructions are proposed to make integer multiplies easier: i8x8.zxload, i8x8.sxload, i16x4.zxload, i16x4.sxload, i32x2.zxload and i32x2.sxload. They would make i8, i16 and i32 multiplies useful and more practical for applications such as machine learning, image compression, video, and rendering data processing. Each instruction loads consecutive integers of the corresponding size (bytes, words or dwords) and zero-extends (unsigned data) or sign-extends (signed data) each one to the next wider size. Intel and ARM both already provide this capability:

Intel Instructions:



* movzxbw
Member

Did you mean to include the packed versions of these instructions (ex: pmovzxbw/pmovsxbw)? Please link the exact instruction you mean if that's not the one.

* movzxwd
* movzxdq
* movsxbw
* movsxwd
* movsxdq

ARM Instructions:



* `LDR X0, [X1]`: load from the address in X1
* `LDR X0, [X1, #8]`: load from address X1 + 8
* `LDR X0, [X1, X2]`: load from address X1 + X2
* `LDR X0, [X1, X2, LSL #3]`: load from address X1 + (X2 << 3)
* `LDR X0, [X1, W2, SXTW]`: load from address X1 + sign_extend(W2)
* `LDR X0, [X1, W2, SXTW #3]`: load from address X1 + (sign_extend(W2) << 3)
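For comparison with the scalar forms listed above, here is one plausible way an engine could lower each proposed instruction to *packed* widening operations. On x86 it would use the SSE4.1 `pmovzx*`/`pmovsx*` instructions, which accept a 64-bit memory operand. On AArch64 it would use a 64-bit `ldr` followed by `uxtl`/`sxtl`. This is a hypothetical sketch, not taken from any engine. The struct, `LOWERINGS` and `find_lowering` are illustrative names, and the register choices are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One possible lowering per proposed instruction (illustrative only). */
typedef struct {
    const char *wasm;      /* proposed WebAssembly instruction */
    const char *x86_sse41; /* packed SSE4.1 widening load, 64-bit memory source */
    const char *aarch64;   /* 64-bit load, then widen each lane */
} lowering;

static const lowering LOWERINGS[] = {
    {"i8x8.zxload",  "pmovzxbw xmm, m64", "ldr d0, [x0]; uxtl v0.8h, v0.8b"},
    {"i8x8.sxload",  "pmovsxbw xmm, m64", "ldr d0, [x0]; sxtl v0.8h, v0.8b"},
    {"i16x4.zxload", "pmovzxwd xmm, m64", "ldr d0, [x0]; uxtl v0.4s, v0.4h"},
    {"i16x4.sxload", "pmovsxwd xmm, m64", "ldr d0, [x0]; sxtl v0.4s, v0.4h"},
    {"i32x2.zxload", "pmovzxdq xmm, m64", "ldr d0, [x0]; uxtl v0.2d, v0.2s"},
    {"i32x2.sxload", "pmovsxdq xmm, m64", "ldr d0, [x0]; sxtl v0.2d, v0.2s"},
};

/* Returns the table entry for a WebAssembly mnemonic, or NULL if unknown. */
static const lowering *find_lowering(const char *wasm) {
    for (size_t i = 0; i < sizeof LOWERINGS / sizeof LOWERINGS[0]; i++)
        if (strcmp(LOWERINGS[i].wasm, wasm) == 0)
            return &LOWERINGS[i];
    return NULL;
}
```

Each proposed instruction maps to a single x86 instruction, or a short two-instruction sequence on AArch64. This suggests the proposal is cheap to implement on both targets.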

So the new instructions for WASM would be defined as follows:



* i8x8.zxload
* i16x4.zxload
* i32x2.zxload
* i8x8.sxload
* i16x4.sxload
* i32x2.sxload

As a result of these new instructions, a multiply can operate on the widened data without signed or unsigned overflow.
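The claim can be checked exhaustively for the 8-to-16-bit case. After an extended load, each lane holds a full 16-bit value. A wrapping 16-bit multiply, modeled here as one lane of `i16x8.mul`, therefore returns the exact product of the original 8-bit inputs, because the extreme products are -128 * -128 = 16384 (signed) and 255 * 255 = 65025 (unsigned). The helper names below are hypothetical. The check covers a single multiply only. Summing many products, as in a matrix multiply, still needs a wider accumulator.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of i16x8.mul: the product is truncated to 16 bits (wrap-around). */
static uint16_t i16x8_mul_lane(uint16_t a, uint16_t b) {
    return (uint16_t)((uint32_t)a * b);
}

/* Reinterprets 16 lane bits as a two's-complement signed value. */
static int32_t as_signed16(uint16_t bits) {
    return bits >= 0x8000u ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Returns 1 if, for every pair of signed 8-bit inputs, sign-extending to 16
   bits and multiplying with wrap-around yields the exact product. */
static int widened_signed_mul_is_exact(void) {
    for (int a = INT8_MIN; a <= INT8_MAX; a++)
        for (int b = INT8_MIN; b <= INT8_MAX; b++) {
            uint16_t wa = (uint16_t)a; /* sign-extended lane bits (mod 2^16) */
            uint16_t wb = (uint16_t)b;
            if (as_signed16(i16x8_mul_lane(wa, wb)) != a * b)
                return 0;
        }
    return 1;
}

/* Same check for unsigned 8-bit inputs that have been zero-extended. */
static int widened_unsigned_mul_is_exact(void) {
    for (unsigned a = 0; a <= UINT8_MAX; a++)
        for (unsigned b = 0; b <= UINT8_MAX; b++)
            if (i16x8_mul_lane((uint16_t)a, (uint16_t)b) != a * b)
                return 0;
    return 1;
}
```

The same argument applies to the wider forms. For example, (-32768) * (-32768) = 2^30 and 65535 * 65535 = 4294836225 both fit in a 32-bit lane.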

The following partial example shows how extended loads can be used in a matrix multiply of 8-bit integers (here with the zero-extending `pmovzxbw`):


```
"pmovzxbw 0x00(%[mem]), %%xmm0\n\t"
"pshufd $0x00,%%xmm1,%%xmm2 \n\t"
"pshufd $0x55,%%xmm1,%%xmm3 \n\t"
"pmaddwd %%xmm0, %%xmm2 \n\t"
"pmaddwd %%xmm0, %%xmm3 \n\t"
"paddd %%xmm2, %%xmm4 \n\t"
"paddd %%xmm3, %%xmm5 \n\t"