Skip to content

docs: update specs on RISC-V execution environment #1489

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 20, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 14 additions & 8 deletions crates/toolchain/openvm/src/io/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -63,15 +63,21 @@ pub(crate) fn read_vec_by_len(len: usize) -> Vec<u8> {

#[cfg(target_os = "zkvm")]
{
// Allocate a buffer of the required length that is 4 byte aligned
// Note: this expect message doesn't matter until our panic handler actually cares about it
let layout = Layout::from_size_align(capacity, 4).expect("vec is too large");
// Allocate a buffer of the required length
// We prefer that the allocator should allocate this buffer to a 4-byte boundary,
// but we do not specify it here because `Vec<u8>` safety requires the alignment to
// exactly equal the alignment of `u8`, which is 1. See `Vec::from_raw_parts` for more details.
//
// Note: the bump allocator we use by default has minimum alignment of 4 bytes.
// The heap-embedded-alloc uses linked list allocator, which has a minimum alignment of
// `sizeof(usize) * 2 = 8` on 32-bit architectures: https://github.com/rust-osdev/linked-list-allocator/blob/b5caf3271259ddda60927752fa26527e0ccd2d56/src/hole.rs#L429
let mut bytes = Vec::with_capacity(capacity);
hint_buffer_u32!(bytes.as_mut_ptr(), num_words);
// SAFETY: We populate a `Vec<u8>` by hintstore-ing `num_words` 4 byte words. We set the length to `len` and don't care about the extra `capacity - len` bytes stored.
let ptr_start = unsafe { alloc::alloc::alloc(layout) };

// Note: if len % 4 != 0, this will discard some last bytes
hint_buffer_u32!(ptr_start, num_words);
unsafe { Vec::from_raw_parts(ptr_start, len, capacity) }
unsafe {
bytes.set_len(len);
}
bytes
}
#[cfg(not(target_os = "zkvm"))]
{
Expand Down
2 changes: 1 addition & 1 deletion docs/specs/ISA.md
Original file line number Diff line number Diff line change
Expand Up @@ -419,7 +419,7 @@ with user input-output.
| Name | Operands | Description |
| ---------------- | --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| HINT_STOREW_RV32 | `_,b,_,1,2` | `[r32{0}(b):4]_2 = next 4 bytes from hint stream`. Only valid if next 4 values in hint stream are bytes. |
| HINT_BUFFER_RV32 | `a,b,_,1,2` | `[r32{0}(b):4 * l]_2 = next 4 * l bytes from hint stream` where `l = r32{0}(a)`. Only valid if next `4 * l` values in hint stream are bytes. Very important: `l` should not be 0. |
| HINT_BUFFER_RV32 | `a,b,_,1,2` | `[r32{0}(b):4 * l]_2 = next 4 * l bytes from hint stream` where `l = r32{0}(a)`. Only valid if next `4 * l` values in hint stream are bytes. Very important: `l` should not be 0. The pointer address `r32{0}(b)` does not need to be a multiple of `4`. |
| REVEAL_RV32 | `a,b,c,1,3,_,g` | Pseudo-instruction for `STOREW_RV32 a,b,c,1,3,_,g` writing to the user IO address space `3`. Only valid when continuations are enabled. |

#### Phantom Sub-Instructions
Expand Down
22 changes: 18 additions & 4 deletions docs/specs/RISCV.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,8 @@ We now specify the custom instructions for the default set of VM extensions.
| ----------- | --- | ----------- | ------ | --------- | ------------------------------- |
| terminate | I | 0001011 | 000 | `code` | terminate with exit code `code` |

The `terminate` instruction is a requested trap that will cause an orderly termination to guest execution with the specified exit code.

## RV32IM Extension

The RV32IM extension supports the RV32I Base Integer Instruction Set, Version 2.1 with `XLEN=32`
Expand All @@ -47,13 +49,25 @@ execution environment interface (EEI). The OpenVM execution environment does not
loads and stores. More specifically, guest execution considers misaligned accesses invalid
and host execution will raise an exception resulting in a fatal trap.

In addition to the standard RV32IM opcodes, we support the following additional instructions to handle system interactions
### IO
In addition to the standard RV32IM opcodes, we support the following additional intrinsic instructions to handle interactions between the guest and host environments.
These instructions require the host execution environment to maintain the following data structures
as part of its state:

- `input_stream`: a non-interactive queue of byte vectors which is provided at the start of
execution. This may be considered as the non-interactive input to the guest program.
- `hint_stream`: a queue of bytes populated during execution
via instructions such as `hintinput`.
- user IO space: a fixed length array of bytes, with length `num_public_values` which is a configuration constant of the execution environment. The length must equal `8` times a power of two. The IO space can be overwritten, and the final state of the IO space is persisted after execution halts.

The guest execution has no control over what the host provides in `input_stream` and `hint_stream`, so
the guest must take care to validate all data and account for behavior in cases of untrusted input.

| RISC-V Inst | FMT | opcode[6:0] | funct3 | imm[0:11] | RISC-V description and notes |
| ----------- | --- | ----------- | ------ | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| hintstorew | I | 0001011 | 001 | 0x0 | Stores next 4-byte word from hint stream in user memory at `[rd]_2`. Only valid if next 4 values in hint stream are bytes. |
| hintbuffer | I | 0001011 | 001 | 0x1 | Stores next `4 * rs1` bytes from hint stream in user memory at `[rd..rd + 4 * rs1]_2`. Only valid if next `4 * rs1` values in hint stream are bytes and `rs1` is non-zero. |
| reveal | I | 0001011 | 010 | | Stores the 4-byte word `rs1` at address `rd + imm` in user IO space. |
| hintstorew | I | 0001011 | 001 | 0x0 | Stores next 4-byte word from hint stream in user memory at `[rd]_2`. The address `rd` does not have any alignment requirements. |
| hintbuffer | I | 0001011 | 001 | 0x1 | Stores next `4 * rs1` bytes from hint stream in user memory at `[rd..rd + 4 * rs1]_2`. Only valid if next `4 * rs1` values in hint stream are bytes and `rs1` is non-zero. The address `rd` does not have any alignment requirements. |
| reveal | I | 0001011 | 010 | | Stores the 4-byte word `rs1` at address `rd + imm` in user IO space. The address `rd + imm` must be aligned to a 4-byte boundary. |
| hintinput | I | 0001011 | 011 | 0x0 | Pop next vector from input stream and reset hint stream to the vector. |
| printstr | I | 0001011 | 011 | 0x1 | Tries to convert `[rd..rd + rs1]_2` to UTF-8 string and print to host stdout. Will print error message if conversion fails. |
| hintrandom | I | 0001011 | 011 | 0x2 | Resets the hint stream to `4 * rd` random bytes from `rand::rngs::OsRng` on the host. |
Expand Down
26 changes: 26 additions & 0 deletions extensions/rv32im/circuit/src/hintstore/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# RV32 Hint Store Chip

The chip is an instruction executor for the HINT_STORE_RV32 and HINT_BUFFER_RV32 instructions.

Trace rows are exactly one of 3 types:
- `is_single = 1, is_buffer = 0`: to handle HINT_STORE_RV32
- `is_single = 0, is_buffer = 1`: rows for HINT_BUFFER_RV32
- `is_single = 0, is_buffer = 0`: dummy padding rows

A single HINT_BUFFER_RV32 instruction may use multiple contiguous rows. The first row,
which is also the row that will send messages to the program and execution buses with non-zero
multiplicities, is marked with `is_buffer_start = 1` (and it is the only row within the rows for that
instruction with `is_buffer_start = 1`).

On the starting row, a memory address `mem_ptr` is read from memory in the form of `4` limbs `mem_ptr_limbs`. The highest limb is range checked to be `pointer_max_bits - 24` bits, which ensures that calculating `mem_ptr` by composing the limbs `mem_ptr_limbs` will not overflow the field.

On each row in the same HINT_BUFFER_RV32 instruction, the chip does a write to `[mem_ptr:4]_2` and increments `mem_ptr += 4`.
Under the invariant that timestamp is always increasing and the memory bus does not contain any invalid writes at previous timestamps, an attempted memory write access to `mem_ptr > 2^{pointer_max_bits} - 4` will not be able to balance the memory bus: it would require a send at an earlier timestamp to an out of bounds memory address, which the invariant prevents.
Only the starting `mem_ptr` is range checked: since each row will increment `mem_ptr` by `4`, an out
of bounds memory access will occur, causing an imbalance in the memory bus, before `mem_ptr` overflows the field.

On the starting row, the `rem_words` is also read from memory in the form of `rem_words_limbs` limbs.
We also range check the highest limb to be `pointer_max_bits - 24` bits to ensure that calculating `rem_words` by composing the limbs `rem_words_limbs` will not overflow the field. Note that the bound `rem_words < 2^pointer_max_bits` is not tight, since `rem_words` refers to 4-byte words, not bytes.
On each row with `is_buffer = 1`, the `rem_words` is decremented by `1`.

Note: we constrain that when the current instruction ends then `rem_words` is 1. However, we don't constrain that when `rem_words` is 1 then we have to end the current instruction. The only way to exploit this if we to do some multiple of `p` number of additional illegal `is_buffer = 1` rows where `p` is the prime modulus of `F`. However, when doing `p` additional rows we will always reach an illegal `mem_ptr` at some point which prevents this exploit.
Loading