Replace the IR tree-walking evaluator with a bytecode VM. #128
philipaconrad merged 4 commits into open-policy-agent:main
ℹ️ Self-assigning for review.
philipaconrad left a comment
This is a massive PR, so I focused most of my review efforts on the join points and interfaces between parts. I read closely through the major conversion logic, and skimmed only lightly over most of the tests. I assume that the 30+ instructions and the stack management are all ported over correctly; otherwise, the basic tests we already had (like the compliance tests) simply would not pass.
        return buffer
    }

    /// Get the current bytecode buffer as Data (allocates a copy; use for serialisation only)
[comment]: I appreciate the usage hints here!
    /// Decode operand without bounds checking (for validated bytecode, little-endian)
    /// SAFETY: Caller must ensure offset is valid and bytes contain enough data
    @inline(__always)
    public static func decodeUnchecked(from bytes: ContiguousArray<UInt8>, at offset: Int) -> (operand: EncodedOperand, size: Int) {
[comment]: This looks like a good use of the @inline(__always) marker.
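For readers unfamiliar with the pattern behind this snippet, here is a minimal sketch of a checked and an unchecked little-endian read over a ContiguousArray<UInt8>. The function names and the fixed 32-bit width are assumptions made for illustration; the PR's actual EncodedOperand layout is not shown in this conversation.

```swift
/// Hypothetical helper (not the PR's API): reads a 32-bit little-endian word
/// with no bounds checking.
/// SAFETY: caller must guarantee that offset ..< offset + 4 lies inside `bytes`.
@inline(__always)
func readUInt32LEUnchecked(_ bytes: ContiguousArray<UInt8>, at offset: Int) -> UInt32 {
    return bytes.withUnsafeBufferPointer { buf -> UInt32 in
        // Raw pointer subscripts perform no bounds checks.
        let p = buf.baseAddress! + offset
        let b0 = UInt32(p[0])
        let b1 = UInt32(p[1]) << 8
        let b2 = UInt32(p[2]) << 16
        let b3 = UInt32(p[3]) << 24
        return b0 | b1 | b2 | b3
    }
}

/// Hypothetical checked variant, suitable for a one-time validation pass.
/// Returns nil when the four bytes would run past the end of the buffer.
func readUInt32LEChecked(_ bytes: ContiguousArray<UInt8>, at offset: Int) -> UInt32? {
    guard offset >= 0, offset <= bytes.count - 4 else { return nil }
    return readUInt32LEUnchecked(bytes, at: offset)
}
```

The idea is that a validation pass uses the checked form once, and the hot loop then uses the unchecked form on offsets already known to be valid.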
    /// CallKey is a key for memoizing a bytecode user-function (rule) call.
    /// Arguments are captured as raw-encoded operand values rather than resolved RegoValues,
    /// as hashing resolved values is expensive. This relies on the invariant that the plan
    /// will not modify a local after it has been initially set.
    /// Only 2-arg calls (OPA rules: input + data) are memoized.
    struct CallKey: Hashable {
[question]: How is that invariant preserved when dealing with "in-place modifying" instruction types, like ArrayAppendStmt? Or is this only for Call*Stmt instructions?
Indeed, it's only for the call statements. This same assumption was actually made by the replaced code (see comments of InvocationKey for similar content).
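To make the memoization idea in this thread concrete, here is a rough sketch of a cache keyed on raw operand encodings instead of resolved values. SketchCallKey, CallMemo, and the UInt32 field types are hypothetical and are not taken from the PR. The sketch is only sound under the invariant stated in the doc comment: a local is never modified after it is first set, so identical raw operands imply identical argument values.

```swift
/// Hypothetical key: a function index plus two raw (unresolved) operand encodings.
/// Hashing three integers is cheap compared with hashing resolved values.
struct SketchCallKey: Hashable {
    let funcIndex: UInt32
    let arg0: UInt32  // raw encoded operand, e.g. a local slot reference
    let arg1: UInt32
}

/// Hypothetical memo table for 2-argument calls.
struct CallMemo<Value> {
    private var cache: [SketchCallKey: Value] = [:]
    private(set) var misses = 0

    init() {}

    /// Returns the cached result for `key`, computing and storing it on a miss.
    mutating func call(_ key: SketchCallKey, compute: () -> Value) -> Value {
        if let hit = cache[key] { return hit }
        misses += 1
        let value = compute()
        cache[key] = value
        return value
    }
}
```

If an instruction could rewrite a local in place between two calls, the same key would map to stale results, which is why the question about in-place modifying statements matters.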
philipaconrad left a comment
Marking Approve here preemptively, as I didn't see anything wrong or worth delaying the PR further over. I'd love an answer to my earlier review question (just to make sure I'm understanding that part correctly), but it's not a blocker.
Thanks again @koponen for your hard work on bringing this big optimization to life! 😄
Introduce a Bytecode package that compiles IR into a compact instruction stream and executes it in a tight PC loop, replacing the recursive tree-walking IR evaluator.
Bytecode is more compact than the IR, more cache-friendly, and eliminates per-evaluation pointer chasing. The tree-walking evaluator copies Statement enum associated values on every dispatch; the VM decodes operands directly from the byte stream instead.
The IR-to-bytecode converter runs once at load time. Compact opcode variants pack their operand into the 24-bit header length field, saving a payload word for the most frequent single-operand statements. Validation bounds-checks all string/number/function table references before execution so the VM hot loop can skip those checks entirely.
The bytecode VM is up to 25% faster in benchmarks, with the biggest gains on iteration and call-heavy workloads.
Signed-off-by: Teemu Koponen <tkoponen@apple.com>
…ime. Signed-off-by: Teemu Koponen <tkoponen@apple.com>
Signed-off-by: Philip Conrad <philip_conrad@apple.com>
Signed-off-by: Philip Conrad <philip_conrad@apple.com>
ℹ️ Rebasing to catch up with
Introduce a Bytecode package that compiles IR into a compact instruction stream and executes it in a tight PC loop, replacing the recursive tree-walking IR evaluator.
Bytecode is more compact than the IR, more cache-friendly, and eliminates per-evaluation pointer chasing. The tree-walking evaluator copies Statement enum associated values on every dispatch; the VM decodes operands directly from the byte stream instead.
The IR-to-bytecode converter runs once at load time. Compact opcode variants pack their operand into the 24-bit header length field, saving a payload word for the most frequent single-operand statements. Validation bounds-checks all string/number/function table references before execution so the VM hot loop can skip those checks entirely.
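As an illustration of the compact-variant idea, the sketch below packs an 8-bit opcode and a 24-bit field into one 32-bit header word. The type name, the choice of the top byte for the opcode, and the 32-bit header size are all assumptions for this example; the description only states that the length field is 24 bits wide.

```swift
/// Hypothetical header layout: opcode in the top 8 bits, a 24-bit field below.
/// For regular instructions the field would hold a payload length; for compact
/// variants it would hold the single operand itself, so no payload word follows.
struct SketchHeader {
    static let maxField: UInt32 = (1 << 24) - 1

    /// Packs opcode and field into one word; nil when the field needs more than 24 bits.
    static func pack(opcode: UInt8, field: UInt32) -> UInt32? {
        guard field <= maxField else { return nil }
        return (UInt32(opcode) << 24) | field
    }

    /// Splits a header word back into its opcode and 24-bit field.
    static func unpack(_ word: UInt32) -> (opcode: UInt8, field: UInt32) {
        return (opcode: UInt8(word >> 24), field: word & maxField)
    }
}
```

Under this layout a converter would presumably fall back to the regular encoding whenever pack returns nil, which keeps the compact form an optimization and not a requirement.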
The bytecode VM is up to 25% faster in benchmarks, with the biggest gains on iteration and call-heavy workloads.