Commit 82be3c0

New segment encoding (WebAssembly#130)
* Fix outdated stuff
* Update Overview.md
* Update Overview.md
* Update Overview.md
* Clarify zero table index
* Address review comments
1 parent 1e29660 commit 82be3c0

1 file changed

+97
-105
lines changed

proposals/bulk-memory-operations/Overview.md

Lines changed: 97 additions & 105 deletions
@@ -45,11 +45,11 @@ the following contents:
4545

4646
```wasm
4747
(func (param $dst i32) (param $src i32) (param $size i32) (result i32)
48-
get_local $dst
49-
get_local $src
50-
get_local $size
48+
local.get $dst
49+
local.get $src
50+
local.get $size
5151
memory.copy
52-
get_local $dst)
52+
local.get $dst)
5353
```
5454

5555
Here are the results on my machine (x86_64, 2.9GHz, L1 32k, L2 256k, L3 256k):
@@ -160,8 +160,6 @@ Filling a memory region can be accomplished with `memory.fill`:
160160

161161
* `memory.fill`: fill a region of linear memory with a given byte value
162162

163-
TODO: should we provide `memory.clear` and `table.clear` instead?
164-
165163
The [binary format for the data
166164
section](https://webassembly.github.io/spec/core/binary/modules.html#binary-datasec)
167165
currently has a collection of segments, each of which has a memory index, an
@@ -171,93 +169,111 @@ Since WebAssembly currently does not allow for multiple memories, the memory
171169
index of each segment must be zero. We can repurpose this 32-bit integer as a
172170
flags field where new meaning is attached to nonzero values.
173171

174-
When the new flags field is `1`, this segment is _passive_. A passive segment
175-
will not be automatically copied into the memory or table on instantiation, and
176-
must instead be applied manually using the following new instructions:
172+
When the low bit of the new flags field is `1`, this segment is _passive_. A
173+
passive segment will not be automatically copied into the memory or table on
174+
instantiation, and must instead be applied manually using the following new
175+
instructions:
177176

178177
* `memory.init`: copy a region from a data segment
179178
* `table.init`: copy a region from an element segment
180179

181180
A passive segment has no initializer expression, since it will be specified
182181
as an operand to `memory.init` or `table.init`.
183182

184-
Segments can also be discarded by using the following new instructions:
183+
Segments can also be shrunk to size zero by using the following new instructions:
185184

186-
* `data.drop`: prevent further use of a data segment
187-
* `elem.drop`: prevent further use of an element segment
185+
* `data.drop`: discard the data in a data segment
186+
* `elem.drop`: discard the data in an element segment
188187

189188
An active segment is equivalent to a passive segment, but with an implicit
190189
`memory.init` followed by a `data.drop` (or `table.init` followed by a
191190
`elem.drop`) that is prepended to the module's start function.
192191

193-
The new encoding of a data segment is now:
192+
Additionally, the reference-types proposal introduces the notion of a function
193+
reference (a function whose address is a program value). To support this,
194+
element segments can have several encodings, and can also be used to
195+
forward-declare functions whose address will be taken; see below.
196+
197+
The reference-types proposal also introduces the bulk instructions `table.fill`
198+
and `table.grow`, both of which take a function reference as an initializer
199+
argument.
200+
201+
### Data segments
202+
203+
The meaning of the bits of the flag field (a `varuint32`) for data segments is:
204+
205+
| Bit | Meaning |
206+
| - | - |
207+
| 0 | 0=is active, 1=is passive |
208+
| 1 | if bit 0 clear: 0=memory 0, 1=has memory index |
209+
210+
which yields this view, with the fields carried by each flag value:
194211

195-
| Field | Type | Present? | Description |
196-
| - | - | - | - |
197-
| flags | `varuint32` | always | Flags for passive and presence of fields below, only values of 0, 1, and 2 are valid |
198-
| index | `varuint32`? | flags = 2 | Memory index; 0 if the field is not present |
199-
| offset | `init_expr`? | flags != 1 | an `i32` initializer expression for offset |
200-
| size | `varuint32` | always | size of `data` (in bytes) |
201-
| data | `bytes` | always | sequence of `size` bytes |
212+
| Flags | Meaning | Memory index | Offset in memory | Count | Payload |
213+
| - | - | - | - | - | - |
214+
| 0 | Active | | `init_expr` | `varuint32` | `u8`* |
215+
| 1 | Passive | | | `varuint32` | `u8`* |
216+
| 2 | Active with memory index | `varuint32` | `init_expr` | `varuint32` | `u8`* |
202217

203-
Another way of looking at it:
218+
All other flag values are illegal. At present the memory index must be zero,
219+
but the upcoming multi-memory proposal changes that.
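As a reading aid only, the data segment table above can be sketched as a lookup from the flags value to the fields that follow it. The helper name `describe_data_segment` and the field labels are hypothetical, chosen for illustration; they are not part of the proposal.

```python
# Hypothetical helper (not from the proposal): maps a data segment's flags
# value to its meaning and the fields that follow it, per the table above.
DATA_SEGMENT_FIELDS = {
    0: ("Active", ["offset", "count", "payload"]),
    1: ("Passive", ["count", "payload"]),
    2: ("Active with memory index",
        ["memory_index", "offset", "count", "payload"]),
}


def describe_data_segment(flags: int):
    """Return (meaning, field list) for a flags value, or raise if illegal."""
    if flags not in DATA_SEGMENT_FIELDS:
        # The text states that all other flag values are illegal.
        raise ValueError(f"illegal data segment flags: {flags}")
    return DATA_SEGMENT_FIELDS[flags]
```

Note that flags value 3 is rejected: bit 1 is only meaningful when bit 0 is clear.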
204220

205-
| Flags | Active? | index | offset |
206-
| - | - | - | - |
207-
| 0 | Active | Always 0 | Present |
208-
| 1 | Passive | - | - |
209-
| 2 | Active | Present | Present |
210221

211222
### Element segments
212223

213-
The new binary format for element segments is similar to the new format for data segments, but
214-
also includes an element type when the segment is passive. A passive segment also has a sequence
215-
of `expr`s instead of function indices.
224+
The meaning of the bits of the flag field (a `varuint32`) for element segments is:
216225

217-
| Field | Type | Present? | Description |
218-
| - | - | - | - |
219-
| flags | `varuint32` | always | Flags for passive and presence of fields below, only values of 0, 1, and 2 are valid |
220-
| index | `varuint32`? | flags = 2 | Table index; 0 if the field is not present |
221-
| element_type | `elem_type`? | flags = 1 | element type of this segment; `anyfunc` if not present |
222-
| offset | `init_expr`? | flags != 1 | an `i32` initializer expression for offset |
223-
| count | `varuint32` | always | number of elements |
224-
| elems | `varuint32*` | flags != 1 | sequence of function indices |
225-
| elems | `elem_expr*` | flags = 1 | sequence of element expressions |
226+
| Bit | Meaning |
227+
| - | - |
228+
| 0 | 0=is active, 1=is passive |
229+
| 1 | if bit 0 clear: 0=table 0, 1=has table index |
230+
| | if bit 0 set: 0=passive, 1=declared |
231+
| 2 | 0=carries indices; 1=carries elemexprs |
226232

227-
Another way of looking at it:
233+
which yields this view, with the fields carried by each flag value:
228234

229-
| Flags | Active? | index | element_type | offset |
230-
| - | - | - | - | - |
231-
| 0 | Active | Always 0 | Always `anyfunc` | Present |
232-
| 1 | Passive | - | Present | - |
233-
| 2 | Active | Present | Always `anyfunc` | Present |
235+
| Flag | Meaning | Table index | Offset in table | Encoding | Count | Payload |
236+
| - | - | - | - | - | - | - |
237+
| 0 | Legacy active, funcref externval | | `init_expr` | | `varuint32` | `idx`* |
238+
| 1 | Passive, externval | | | `extern_kind` | `varuint32` | `idx`* |
239+
| 2 | Active, externval | `varuint32` | `init_expr` | `extern_kind` | `varuint32` | `idx`* |
240+
| 3 | Declared, externval | | | `extern_kind` | `varuint32` | `idx`* |
241+
| 4 | Legacy active, funcref elemexpr | | `init_expr` | | `varuint32` | `elem_expr`* |
242+
| 5 | Passive, elemexpr | | | `elem_type` | `varuint32` | `elem_expr`* |
243+
| 6 | Active, elemexpr | `varuint32` | `init_expr` | `elem_type` | `varuint32` | `elem_expr`* |
244+
| 7 | Declared, elemexpr | | | `elem_type` | `varuint32` | `elem_expr`* |
245+
246+
All other flag values are illegal. Note that the "declared" attribute
247+
is not used by this proposal, but is used by the reference-types
248+
proposal.
249+
250+
The `extern_kind` must be zero, signifying a function definition. An `idx` is a
251+
`varuint32` that references an entity in the module, currently only its function
252+
table.
253+
254+
At present the table index must be zero, but the reference-types
255+
proposal introduces a notion of multiple tables.
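The element segment flag values can likewise be decoded bit by bit. The sketch below follows the per-flag table above (flags 1 and 5 are passive, 3 and 7 are declared); the function name `describe_elem_segment` and the dictionary keys are hypothetical and exist only for illustration.

```python
# Hypothetical decoder (not from the proposal) for the element segment
# flags field, following the per-flag table above.
def describe_elem_segment(flags: int) -> dict:
    if not 0 <= flags <= 7:
        # The text states that all other flag values are illegal.
        raise ValueError(f"illegal element segment flags: {flags}")

    bit0 = bool(flags & 1)        # clear: active; set: passive or declared
    bit1 = bool(flags & 2)        # meaning depends on bit 0
    uses_exprs = bool(flags & 4)  # clear: indices; set: elem_exprs

    if bit0:
        mode = "declared" if bit1 else "passive"
    else:
        mode = "active"

    has_table_index = (not bit0) and bit1
    has_offset = mode == "active"

    # Legacy active segments (flags 0 and 4) carry no encoding field.
    if mode == "active" and not has_table_index:
        encoding = None
    else:
        encoding = "elem_type" if uses_exprs else "extern_kind"

    return {
        "mode": mode,
        "table_index": has_table_index,
        "offset": has_offset,
        "encoding": encoding,
        "payload": "elem_expr" if uses_exprs else "idx",
    }
```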
234256

235257
An `elem_expr` is like an `init_expr`, but can only contain expressions of the following sequences:
236258

237-
| Binary | Text | Description |
238-
| - | - | - |
239-
| `0xd0 0x0b` | `ref.null end` | Returns a null reference |
259+
| Binary | Text | Description |
260+
| - | - | - |
261+
| `0xd0 0x0b` | `ref.null end` | Returns a null reference |
240262
| `0xd2 varuint32 0x0b` | `ref.func $funcidx end` | Returns a reference to function `$funcidx` |
241263

242-
TODO: coordinate with other proposals to determine the binary encoding for `ref.null` and `ref.func`.
243-
244264
### Segment Initialization
245265

246266
In the MVP, segments are initialized during module instantiation. If any segment
247267
would be initialized out-of-bounds, then the memory or table instance is not
248268
modified.
249269

250-
This behavior is changed in the bulk memory proposal.
251-
252-
Each active segment is initialized in module-definition order. For each
253-
segment, each byte in the data segment is copied into the memory, in order of
254-
lowest to highest addresses. If, for a given byte, the copy is out-of-bounds,
255-
instantiation fails and no further bytes in this segment nor further segments
256-
are copied. Bytes written before this point stay written.
270+
This behavior is changed in the bulk memory proposal:
257271

258-
The behavior of element segment initialization is changed similarly, with the
259-
difference that elements are copied from element segments into tables, instead
260-
of bytes being copied from data segments into memories.
272+
Each active segment is initialized in module-definition order. For
273+
each segment, if reading the source or writing the destination would
274+
go out of bounds, then instantiation fails at that point. Data that
275+
had already been written for previous (in-bounds) segments stays
276+
written.
261277

262278
### `memory.init` instruction
263279

@@ -273,41 +289,27 @@ The instruction has the signature `[i32 i32 i32] -> []`. The parameters are, in
273289
It is a validation error to use `memory.init` with an out-of-bounds segment index.
274290

275291
A trap occurs if:
292+
276293
* the source offset plus size is greater than the length of the source data segment;
277294
this includes the case that the segment has been dropped via `data.drop`
278295
* the destination offset plus size is greater than the length of the target memory
279296

297+
The order of writing is unspecified, though this is currently unobservable.
298+
280299
Note that it is allowed to use `memory.init` on the same data segment more than
281300
once.
282301

283-
Initialization takes place bytewise from lower addresses toward higher
284-
addresses. A trap resulting from an access outside the source data
285-
segment or target memory only occurs once the first byte that is
286-
outside the source or target is reached. Bytes written before the
287-
trap stay written.
288-
289-
(Data are read and written as-if individual bytes were read and
290-
written, but various optimizations are possible that avoid reading and
291-
writing only individual bytes.)
292-
293-
Note that the semantics require bytewise accesses, so a trap that
294-
might result from, say, reading a sequence of several words before
295-
writing any, will have to be handled carefully: the reads that
296-
succeeded will have to be written, if possible.
297-
298302
### `data.drop` instruction
299303

300-
The `data.drop` instruction prevents further use of a given segment. After a
301-
data segment has been dropped, it is no longer valid to use it in a `memory.init`
302-
instruction. This instruction is intended to be used as an optimization hint to
303-
the WebAssembly implementation. After a memory segment is dropped its data can
304-
no longer be retrieved, so the memory used by this segment may be freed.
304+
The `data.drop` instruction shrinks the size of the segment to zero. After a
305+
data segment has been dropped, it can still be used in a `memory.init`
306+
instruction, but only a zero-length access at offset zero will not trap. This
307+
instruction is intended to be used as an optimization hint to the WebAssembly
308+
implementation. After a memory segment is dropped its data can no longer be
309+
retrieved, so the memory used by this segment may be freed.
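The interaction between `data.drop` and `memory.init` described above can be modelled roughly as follows. This is a toy model under stated assumptions, not an implementation: the names `DataSegment`, `Trap`, and `memory_init` are hypothetical, linear memory is represented as a `bytearray`, and the bounds check is assumed to happen before any write.

```python
# Toy model (hypothetical names, not from the proposal) of a data segment
# whose drop operation shrinks it to size zero.
class Trap(Exception):
    """Stands in for a WebAssembly trap."""


class DataSegment:
    def __init__(self, data: bytes):
        self.data = bytes(data)

    def drop(self) -> None:
        # data.drop shrinks the segment to size zero; dropping again is
        # harmless in this model.
        self.data = b""


def memory_init(memory: bytearray, seg: DataSegment,
                dst: int, src: int, size: int) -> None:
    # Trap conditions from the memory.init section, checked up front
    # (an assumption of this sketch).
    if src + size > len(seg.data) or dst + size > len(memory):
        raise Trap("out of bounds")
    memory[dst:dst + size] = seg.data[src:src + size]
```

In this model, after `seg.drop()` a call such as `memory_init(mem, seg, 0, 0, 0)` succeeds, while any nonzero size or nonzero source offset traps.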
305310

306311
It is a validation error to use `data.drop` with an out-of-bounds segment index.
307312

308-
A trap occurs if the segment was already dropped. This includes active segments
309-
that were dropped after being copied into memory during module instantiation.
310-
311313
### `memory.copy` instruction
312314

313315
Copy data from a source memory region to destination region. The
@@ -336,17 +338,11 @@ The instruction has the signature `[i32 i32 i32] -> []`. The parameters are, in
336338
- top-0: size of memory region in bytes
337339

338340
A trap occurs if:
341+
339342
* the source offset plus size is greater than the length of the source memory
340343
* the destination offset plus size is greater than the length of the target memory
341344

342-
A trap resulting from an access outside the source or target region
343-
only occurs once the first byte that is outside the source or target
344-
is reached (in the defined copy order). Bytes written before the trap
345-
stay written.
346-
347-
(Data are read and written as-if individual bytes were read and
348-
written, but various optimizations are possible that avoid reading and
349-
writing only individual bytes.)
345+
The bounds check is performed before any data are written.
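A minimal sketch of the up-front bounds check for `memory.copy`, assuming linear memory is modelled as a Python `bytearray`. The names `Trap` and `memory_copy` are hypothetical. The point illustrated is that a failing copy leaves memory untouched.

```python
# Toy model (hypothetical names, not from the proposal) of memory.copy with
# the bounds check performed before any data are written.
class Trap(Exception):
    """Stands in for a WebAssembly trap."""


def memory_copy(memory: bytearray, dst: int, src: int, size: int) -> None:
    # Both trap conditions are checked first, so nothing is written on failure.
    if src + size > len(memory) or dst + size > len(memory):
        raise Trap("out of bounds")
    # Slicing copies the source bytes first, so overlapping regions are safe.
    memory[dst:dst + size] = memory[src:src + size]
```

Under the previous semantics shown in the removed lines, bytes written before the trap would have stayed written; in this model a trapping call changes nothing.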
350346

351347
### `memory.fill` instruction
352348

@@ -360,15 +356,11 @@ The instruction has the signature `[i32 i32 i32] -> []`. The parameters are, in
360356
- top-0: size of memory region in bytes
361357

362358
A trap occurs if:
363-
* the destination offset plus size is greater than the length of the target memory
364359

365-
Filling takes place bytewise from lower addresses toward higher
366-
addresses. A trap resulting from an access outside the target memory
367-
only occurs once the first byte that is outside the target is reached.
368-
Bytes written before the trap stay written.
360+
* the destination offset plus size is greater than the length of the target memory
361+
362+
The bounds check is performed before any data are written.
369363

370-
(Data are written as-if individual bytes were written, but various
371-
optimizations are possible that avoid writing only individual bytes.)
372364

373365
### `table.init`, `elem.drop`, and `table.copy` instructions
374366

@@ -390,7 +382,7 @@ implemented as follows:
390382
(data passive "goodbye") ;; data segment 1, is passive
391383
392384
(func $start
393-
(if (get_global 0)
385+
(if (global.get 0)
394386
395387
;; copy data segment 1 into memory 0 (the 0 is implicit)
396388
(memory.init 1
@@ -416,13 +408,13 @@ instr ::= ...
416408

417409
| Name | Opcode | Immediate | Description |
418410
| ---- | ---- | ---- | ---- |
419-
| `memory.init` | `0xfc 0x08` | `segment:varuint32`, `memory:0x00` | :thinking: copy from a passive data segment to linear memory |
420-
| `data.drop` | `0xfc 0x09` | `segment:varuint32` | :thinking: prevent further use of passive data segment |
421-
| `memory.copy` | `0xfc 0x0a` | `memory_dst:0x00` `memory_src:0x00` | :thinking: copy from one region of linear memory to another region |
422-
| `memory.fill` | `0xfc 0x0b` | `memory:0x00` | :thinking: fill a region of linear memory with a given byte value |
423-
| `table.init` | `0xfc 0x0c` | `segment:varuint32`, `table:0x00` | :thinking: copy from a passive element segment to a table |
424-
| `elem.drop` | `0xfc 0x0d` | `segment:varuint32` | :thinking: prevent further use of a passive element segment |
425-
| `table.copy` | `0xfc 0x0e` | `table_dst:0x00` `table_src:0x00` | :thinking: copy from one region of a table to another region |
411+
| `memory.init` | `0xfc 0x08` | `segment:varuint32`, `memory:0x00` | copy from a passive data segment to linear memory |
412+
| `data.drop` | `0xfc 0x09` | `segment:varuint32` | prevent further use of passive data segment |
413+
| `memory.copy` | `0xfc 0x0a` | `memory_dst:0x00` `memory_src:0x00` | copy from one region of linear memory to another region |
414+
| `memory.fill` | `0xfc 0x0b` | `memory:0x00` | fill a region of linear memory with a given byte value |
415+
| `table.init` | `0xfc 0x0c` | `segment:varuint32`, `table:0x00` | copy from a passive element segment to a table |
416+
| `elem.drop` | `0xfc 0x0d` | `segment:varuint32` | prevent further use of a passive element segment |
417+
| `table.copy` | `0xfc 0x0e` | `table_dst:0x00` `table_src:0x00` | copy from one region of a table to another region |
426418

427419
### `DataCount` section
428420
