From 3c00340fba48005f70c5aefec73367aaa26d6c97 Mon Sep 17 00:00:00 2001 From: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com> Date: Wed, 19 Mar 2025 16:36:45 -0700 Subject: [PATCH 1/6] docs: document riscv execution environment requirements --- docs/specs/RISCV.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/docs/specs/RISCV.md b/docs/specs/RISCV.md index ef35db8152..9067841dd9 100644 --- a/docs/specs/RISCV.md +++ b/docs/specs/RISCV.md @@ -34,6 +34,8 @@ We now specify the custom instructions for the default set of VM extensions. | ----------- | --- | ----------- | ------ | --------- | ------------------------------- | | terminate | I | 0001011 | 000 | `code` | terminate with exit code `code` | +The `terminate` instruction is a requested trap that will cause an orderly termination to guest execution with the specified exit code. + ## RV32IM Extension The RV32IM extension supports the RV32I Base Integer Instruction Set, Version 2.1 with `XLEN=32` @@ -47,13 +49,22 @@ execution environment interface (EEI). The OpenVM execution environment does not loads and stores. More specifically, guest execution considers misaligned accesses invalid and host execution will raise an exception resulting in a fatal trap. -In addition to the standard RV32IM opcodes, we support the following additional instructions to handle system interactions +### IO +In addition to the standard RV32IM opcodes, we support the following additional intrinsic instructions to handle interactions between the guest and host environments. +These instructions require the host execution environment to maintain the following data structures +as part of its state: + +- `input_stream`: a non-interactive queue of byte vectors which is provided at the start of + execution. This may be considered as the non-interactive input to the guest program. +- `hint_stream`: a queue of bytes populated during execution + via instructions such as `hintinput`. +- user IO space: a fixed length array of bytes, with length `num_public_values` which is a configuration constant of the execution environment. The length must equal `8` times a power of two. The IO space can be overwritten, and the final state of the IO space is persisted after execution halts. | RISC-V Inst | FMT | opcode[6:0] | funct3 | imm[0:11] | RISC-V description and notes | | ----------- | --- | ----------- | ------ | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | hintstorew | I | 0001011 | 001 | 0x0 | Stores next 4-byte word from hint stream in user memory at `[rd]_2`. Only valid if next 4 values in hint stream are bytes. | | hintbuffer | I | 0001011 | 001 | 0x1 | Stores next `4 * rs1` bytes from hint stream in user memory at `[rd..rd + 4 * rs1]_2`. Only valid if next `4 * rs1` values in hint stream are bytes and `rs1` is non-zero. | -| reveal | I | 0001011 | 010 | | Stores the 4-byte word `rs1` at address `rd + imm` in user IO space. | +| reveal | I | 0001011 | 010 | | Stores the 4-byte word `rs1` at address `rd + imm` in user IO space. The address `rd + imm` must be aligned to a 4-byte boundary. | | hintinput | I | 0001011 | 011 | 0x0 | Pop next vector from input stream and reset hint stream to the vector. | | printstr | I | 0001011 | 011 | 0x1 | Tries to convert `[rd..rd + rs1]_2` to UTF-8 string and print to host stdout. Will print error message if conversion fails. | | hintrandom | I | 0001011 | 011 | 0x2 | Resets the hint stream to `4 * rd` random bytes from `rand::rngs::OsRng` on the host. | From bad000bcdd54d8b85d07046deb0fc5ed35ca866e Mon Sep 17 00:00:00 2001 From: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com> Date: Wed, 19 Mar 2025 20:49:23 -0700 Subject: [PATCH 2/6] change implementation of read_vec_by_len for rust safety --- crates/toolchain/openvm/src/io/mod.rs | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/crates/toolchain/openvm/src/io/mod.rs b/crates/toolchain/openvm/src/io/mod.rs index e7cf9bacb3..0a65e4316d 100644 --- a/crates/toolchain/openvm/src/io/mod.rs +++ b/crates/toolchain/openvm/src/io/mod.rs @@ -63,15 +63,21 @@ pub(crate) fn read_vec_by_len(len: usize) -> Vec { #[cfg(target_os = "zkvm")] { - // Allocate a buffer of the required length that is 4 byte aligned - // Note: this expect message doesn't matter until our panic handler actually cares about it - let layout = Layout::from_size_align(capacity, 4).expect("vec is too large"); - // SAFETY: We populate a `Vec` by hintstore-ing `num_words` 4 byte words. We set the length to `len` and don't care about the extra `capacity - len` bytes stored. - let ptr_start = unsafe { alloc::alloc::alloc(layout) }; - - // Note: if len % 4 != 0, this will discard some last bytes + // Allocate a buffer of the required length + // We prefer that the allocator should allocate this buffer to a 4-byte boundary, + // but we do not specify it here because `Vec` safety requires the alignment to + // exactly equal the alignment of `u8`, which is 1. See `Vec::from_raw_parts` for more details. + // + // Note: the bump allocator we use by default has minimum alignment of 4 bytes. + // The heap-embedded-alloc uses linked list allocator, which has a minimum alignment of + // `sizeof(usize) * 2 = 8` on 32-bit architectures: https://github.com/rust-osdev/linked-list-allocator/blob/b5caf3271259ddda60927752fa26527e0ccd2d56/src/hole.rs#L429 + let mut bytes = Vec::with_capacity(capacity); hint_buffer_u32!(ptr_start, num_words); - unsafe { Vec::from_raw_parts(ptr_start, len, capacity) } + // SAFETY: We populate a `Vec` by hintstore-ing `num_words` 4 byte words. We set the length to `len` and don't care about the extra `capacity - len` bytes stored. + unsafe { + bytes.set_len(len); + } + bytes } #[cfg(not(target_os = "zkvm"))] { From bf088330bf0903a6fc7f7e1a0d8d4f2867701faf Mon Sep 17 00:00:00 2001 From: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com> Date: Wed, 19 Mar 2025 21:10:09 -0700 Subject: [PATCH 3/6] docs: update that hintbuffer has no alignment reqs --- docs/specs/ISA.md | 2 +- docs/specs/RISCV.md | 7 +++++-- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/specs/ISA.md b/docs/specs/ISA.md index 30a46f8d2c..1bc3d0c4a0 100644 --- a/docs/specs/ISA.md +++ b/docs/specs/ISA.md @@ -419,7 +419,7 @@ with user input-output. | Name | Operands | Description | | ---------------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | HINT_STOREW_RV32 | `_,b,_,1,2` | `[r32{0}(b):4]_2 = next 4 bytes from hint stream`. Only valid if next 4 values in hint stream are bytes. | -| HINT_BUFFER_RV32 | `a,b,_,1,2` | `[r32{0}(b):4 * l]_2 = next 4 * l bytes from hint stream` where `l = r32{0}(a)`. Only valid if next `4 * l` values in hint stream are bytes. Very important: `l` should not be 0. | +| HINT_BUFFER_RV32 | `a,b,_,1,2` | `[r32{0}(b):4 * l]_2 = next 4 * l bytes from hint stream` where `l = r32{0}(a)`. Only valid if next `4 * l` values in hint stream are bytes. Very important: `l` should not be 0. The pointer address `r32{0}(b)` does not need to be a multiple of `4`. | | REVEAL_RV32 | `a,b,c,1,3,_,g` | Pseudo-instruction for `STOREW_RV32 a,b,c,1,3,_,g` writing to the user IO address space `3`. Only valid when continuations are enabled. | #### Phantom Sub-Instructions diff --git a/docs/specs/RISCV.md b/docs/specs/RISCV.md index 9067841dd9..7948272c66 100644 --- a/docs/specs/RISCV.md +++ b/docs/specs/RISCV.md @@ -60,10 +60,13 @@ as part of its state: via instructions such as `hintinput`. - user IO space: a fixed length array of bytes, with length `num_public_values` which is a configuration constant of the execution environment. The length must equal `8` times a power of two. The IO space can be overwritten, and the final state of the IO space is persisted after execution halts. +The guest execution has no control over what the host provides in `input_stream` and `hint_stream`, so +the guest must take care to validate all data and account for behavior in cases of untrusted input. + | RISC-V Inst | FMT | opcode[6:0] | funct3 | imm[0:11] | RISC-V description and notes | | ----------- | --- | ----------- | ------ | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| hintstorew | I | 0001011 | 001 | 0x0 | Stores next 4-byte word from hint stream in user memory at `[rd]_2`. Only valid if next 4 values in hint stream are bytes. | -| hintbuffer | I | 0001011 | 001 | 0x1 | Stores next `4 * rs1` bytes from hint stream in user memory at `[rd..rd + 4 * rs1]_2`. Only valid if next `4 * rs1` values in hint stream are bytes and `rs1` is non-zero. | +| hintstorew | I | 0001011 | 001 | 0x0 | Stores next 4-byte word from hint stream in user memory at `[rd]_2`. The address `rd` does not have any alignment requirements. | +| hintbuffer | I | 0001011 | 001 | 0x1 | Stores next `4 * rs1` bytes from hint stream in user memory at `[rd..rd + 4 * rs1]_2`. Only valid if next `4 * rs1` values in hint stream are bytes and `rs1` is non-zero. The address `rd` does not have any alignment requirements. | | reveal | I | 0001011 | 010 | | Stores the 4-byte word `rs1` at address `rd + imm` in user IO space. The address `rd + imm` must be aligned to a 4-byte boundary. | | hintinput | I | 0001011 | 011 | 0x0 | Pop next vector from input stream and reset hint stream to the vector. | | printstr | I | 0001011 | 011 | 0x1 | Tries to convert `[rd..rd + rs1]_2` to UTF-8 string and print to host stdout. Will print error message if conversion fails. | From 60236efd3cc22a7769d3d01513a72bc79ab21da3 Mon Sep 17 00:00:00 2001 From: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com> Date: Wed, 19 Mar 2025 21:48:25 -0700 Subject: [PATCH 4/6] fix --- crates/toolchain/openvm/src/io/mod.rs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crates/toolchain/openvm/src/io/mod.rs b/crates/toolchain/openvm/src/io/mod.rs index 0a65e4316d..d2cd67718f 100644 --- a/crates/toolchain/openvm/src/io/mod.rs +++ b/crates/toolchain/openvm/src/io/mod.rs @@ -72,7 +72,7 @@ pub(crate) fn read_vec_by_len(len: usize) -> Vec { // The heap-embedded-alloc uses linked list allocator, which has a minimum alignment of // `sizeof(usize) * 2 = 8` on 32-bit architectures: https://github.com/rust-osdev/linked-list-allocator/blob/b5caf3271259ddda60927752fa26527e0ccd2d56/src/hole.rs#L429 let mut bytes = Vec::with_capacity(capacity); - hint_buffer_u32!(ptr_start, num_words); + hint_buffer_u32!(bytes.as_mut_ptr(), num_words); // SAFETY: We populate a `Vec` by hintstore-ing `num_words` 4 byte words. We set the length to `len` and don't care about the extra `capacity - len` bytes stored. unsafe { bytes.set_len(len); From e489c15f507d28350a32af8d9203b955e9078617 Mon Sep 17 00:00:00 2001 From: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com> Date: Thu, 20 Mar 2025 01:04:16 -0700 Subject: [PATCH 5/6] docs: add readme on soundness of hintbuffer --- .../rv32im/circuit/src/hintstore/README.md | 26 +++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 extensions/rv32im/circuit/src/hintstore/README.md diff --git a/extensions/rv32im/circuit/src/hintstore/README.md b/extensions/rv32im/circuit/src/hintstore/README.md new file mode 100644 index 0000000000..455f4d785e --- /dev/null +++ b/extensions/rv32im/circuit/src/hintstore/README.md @@ -0,0 +1,26 @@ +# RV32 Hint Store Chip + +The chip is an instruction executor for the HINT_STORE_RV32 and HINT_BUFFER_RV32 instructions. + +Trace rows are exactly one of 3 types: +- `is_single = 1, is_buffer = 0`: to handle HINT_STORE_RV32 +- `is_buffer = 1, is_single = 0`: rows for HINT_BUFFER_RV32 +- `is_single = 0, is_buffer = 0`: dummy padding rows + +A single HINT_BUFFER_RV32 instruction may use multiple contiguous rows. The first row, +which is also the row that will send messages to the program and execution buses with non-zero +multiplicities, is marked with `is_buffer_start = 1` (and it is the only row within the rows for that +instruction with `is_buffer_start = 1`). + +On the starting row, a memory address `mem_ptr` is read from memory in the form of `4` limbs `mem_ptr_limbs`. The highest limb is range checked to be `pointer_max_bits - 24` bits, which ensures that calculating `mem_ptr` by composing the limbs `mem_ptr_limbs` will not overflow the field. + +On each row in the same HINT_BUFFER_RV32 instruction, the chip does a write to `[mem_ptr:4]_2` and increments `mem_ptr += 4`. +Under the invariant that timestamp is always increasing and the memory bus does not contain any invalid writes at previous timestamps, an attempted memory write access to `mem_ptr > 2^{pointer_max_bits} - 4` will not be able to balance the memory bus: it would require a send at an earlier timestamp to an out of bounds memory address, which the invariant prevents. +Only the starting `mem_ptr` is range checked: since each row will increment `mem_ptr` by `4`, an out +of bounds memory access will occur, causing an imbalance in the memory bus, before `mem_ptr` overflows the field. + +On the starting row, the `rem_words` is also read from memory in the form of `rem_words_limbs` limbs. +We also range check the highest limb to be `pointer_max_bits - 24` bits to ensure that calculating `rem_words` by composing the limbs `rem_words_limbs` will not overflow the field. Note that the bound `rem_words < 2^pointer_max_bits` is not tight, since `rem_words` refers to 4-byte words, not bytes. +On each row with `is_buffer = 1`, the `rem_words` is decremented by `1`. + +Note: we constrain that when the current instruction ends then `rem_words` is 1. However, we don't constrain that when `rem_words` is 1 then we have to end the current instruction. The only way to exploit this if we to do some multiple of `p` number of additional illegal `is_buffer = 1` rows where `p` is the prime modulus of `F`. However, when doing `p` additional rows we will always reach an illegal `mem_ptr` at some point which prevents this exploit. From bf8348131d8c4bd2c9b93052cdc84e318adb54d9 Mon Sep 17 00:00:00 2001 From: Jonathan Wang <31040440+jonathanpwang@users.noreply.github.com> Date: Thu, 20 Mar 2025 01:06:24 -0700 Subject: [PATCH 6/6] chore --- extensions/rv32im/circuit/src/hintstore/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/extensions/rv32im/circuit/src/hintstore/README.md b/extensions/rv32im/circuit/src/hintstore/README.md index 455f4d785e..aa196a73d7 100644 --- a/extensions/rv32im/circuit/src/hintstore/README.md +++ b/extensions/rv32im/circuit/src/hintstore/README.md @@ -4,7 +4,7 @@ The chip is an instruction executor for the HINT_STORE_RV32 and HINT_BUFFER_RV32 Trace rows are exactly one of 3 types: - `is_single = 1, is_buffer = 0`: to handle HINT_STORE_RV32 -- `is_buffer = 1, is_single = 0`: rows for HINT_BUFFER_RV32 +- `is_single = 0, is_buffer = 1`: rows for HINT_BUFFER_RV32 - `is_single = 0, is_buffer = 0`: dummy padding rows A single HINT_BUFFER_RV32 instruction may use multiple contiguous rows. The first row,