Fix system emulation reboot #638
base: master
7 issues found across 10 files
Prompt for AI agents (all 7 issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/syscall.c">
<violation number="1" location="src/syscall.c:541">
P2: Passing `NULL` to `%s` format specifier is undefined behavior. If `a0` or `a1` contains an unrecognized value, the helper functions return `NULL` which is then passed to `rv_log_info`. Consider returning a fallback string like `"unknown"` instead of `NULL`.</violation>
<violation number="2" location="src/syscall.c:554">
P1: Missing error handling for unknown reset types. If `a0` is not SHUTDOWN, COLD_REBOOT, or WARM_REBOOT, the code silently falls through without setting return values. Add an `else` clause to return `SBI_ERR_NOT_SUPPORTED`.</violation>
</file>
<file name="src/riscv.c">
<violation number="1" location="src/riscv.c:1070">
P2: Missing validation for empty vblk device path. The original code checked `if (!vblk_device_str[0])` to prevent empty disk paths. Without this check, an empty path would result in `vblk_device` being NULL after strtok processing, potentially causing issues in `virtio_blk_init()`.</violation>
<violation number="2" location="src/riscv.c:1082">
P2: Regression: Tilde (`~`) expansion for vblk device paths was removed. Users who specify paths like `~/disk.img` will now have those paths treated literally instead of expanding to their home directory, breaking existing configurations.</violation>
<violation number="3" location="src/riscv.c:1121">
P1: Memory leak on reboot: memory pools `block_mp` and `block_ir_mp` are created unconditionally without checking if they already exist. Unlike other resources that have `/* check for reboot */` guards, these pools are always recreated, leaking the previous allocations on reboot.</violation>
<violation number="4" location="src/riscv.c:1133">
P1: Memory leak on reboot (JIT mode): `jit_state` and `block_cache` are created unconditionally without reboot checks. This could cause memory leaks on reboot and may be related to the JIT segfault mentioned in the PR description.</violation>
</file>
<file name="src/riscv.h">
<violation number="1" location="src/riscv.h:285">
P2: These SBI reset type/reason macros use very generic names that may collide with system headers or other code. Consider using the established `SBI_RST_` prefix for consistency with existing naming conventions (e.g., `SBI_RST_TYPE_SHUTDOWN`, `SBI_RST_TYPE_COLD_REBOOT`, `SBI_RST_REASON_NONE`).</violation>
</file>
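The first `src/syscall.c` finding can be illustrated with a minimal sketch. The numeric values follow the SBI System Reset extension; the macro names use the `SBI_RST_` prefix suggested above, and the function body is an assumption for illustration, not the PR's actual helper.

```c
#include <stdint.h>

/* Reset types from the SBI System Reset extension; macro names are
 * hypothetical and only follow the SBI_RST_ prefix suggested in the review. */
#define SBI_RST_TYPE_SHUTDOWN 0x0
#define SBI_RST_TYPE_COLD_REBOOT 0x1
#define SBI_RST_TYPE_WARM_REBOOT 0x2

/* Always returns a valid string, so the result is safe to pass to "%s". */
static const char *sbi_rst_type_str(uint32_t type)
{
    switch (type) {
    case SBI_RST_TYPE_SHUTDOWN:
        return "shutdown";
    case SBI_RST_TYPE_COLD_REBOOT:
        return "cold reboot";
    case SBI_RST_TYPE_WARM_REBOOT:
        return "warm reboot";
    default:
        return "unknown"; /* fallback instead of NULL */
    }
}
```

With a `default` branch that returns a literal, the logging call never sees `NULL`, whatever the guest places in `a0`.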
Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
Benchmarks
| Benchmark suite | Current: 3641924 | Previous: 674f9d9 | Ratio |
|---|---|---|---|
| Dhrystone | 1619.667 DMIPS | 1626 DMIPS | 1.00 |
| CoreMark | 949.751 iterations/sec | 971.664 iterations/sec | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
ba3011f to 6566538
a6a97c8 to 313a542
313a542 to 0890602
eea79da to 3623c9d
3623c9d to 781797a
45c1ac9 to 2dc3457
6 issues found across 16 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/syscall.c">
<violation number="1" location="src/syscall.c:499">
P2: Unsupported SBI reset types fall through without setting rv_reg_a0/rv_reg_a1, so the SBI call returns a stale status even though no reset occurred. Explicitly return an error (e.g., SBI_ERR_INVALID_PARAM) when the type is not handled.</violation>
</file>
<file name=".ci/reboot.sh">
<violation number="1" location=".ci/reboot.sh:11">
P1: `TEST_OPTIONS` should contain only executable flags. Appending display labels like ` (cold reboot)` / ` (warm reboot)` causes invalid arguments to be passed to rv32emu, so the tests never run. Keep labels in a separate array or add them only when printing, not when building the command line.</violation>
</file>
<file name=".ci/virtio-blk.sh">
<violation number="1" location=".ci/virtio-blk.sh:243">
P2: Host-side virtio-blk failures are ignored because `RET` is incremented before `ret` can be set to 4 during the 7z verification, so the script exits successfully even when the file check fails.</violation>
</file>
<file name="src/devices/plic.c">
<violation number="1" location="src/devices/plic.c:95">
P2: `plic_reset()` zeroes the struct and wipes the `rv` pointer, so the PLIC loses its connection to the hart after a reboot. Preserve `plic->rv` (or only clear the register fields) when resetting.</violation>
</file>
<file name="src/riscv.c">
<violation number="1" location="src/riscv.c:1257">
P0: Missing `fuse_mp` memory pool creation. The `fuse_mp` is used for macro-operation fusion (in emulate.c) but is never allocated, causing NULL pointer dereference when fusion is enabled.</violation>
<violation number="2" location="src/riscv.c:1305">
P0: Missing `pthread_cond_init` for `wait_queue_cond`. The condition variable is used in `t2c_runloop` but never initialized, causing undefined behavior with T2C enabled.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
/* prepare wait queue. */
pthread_mutex_init(&rv->wait_queue_lock, NULL);
pthread_mutex_init(&rv->cache_lock, NULL);
INIT_LIST_HEAD(&rv->wait_queue);
P0: Missing pthread_cond_init for wait_queue_cond. The condition variable is used in t2c_runloop but never initialized, causing undefined behavior with T2C enabled.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1305:
<comment>Missing `pthread_cond_init` for `wait_queue_cond`. The condition variable is used in `t2c_runloop` but never initialized, causing undefined behavior with T2C enabled.</comment>
<file context>
@@ -589,645 +589,743 @@ riscv_t *rv_create(riscv_user_t rv_attr)
+ /* prepare wait queue. */
+ pthread_mutex_init(&rv->wait_queue_lock, NULL);
+ pthread_mutex_init(&rv->cache_lock, NULL);
+ INIT_LIST_HEAD(&rv->wait_queue);
+ /* activate the background compilation thread. */
+ pthread_create(&t2c_thread, NULL, t2c_runloop, rv);
</file context>
if (!rv->block_ir_mp) { /* check for reboot */
    rv->block_ir_mp = mpool_create(
        sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t));
P0: Missing fuse_mp memory pool creation. The fuse_mp is used for macro-operation fusion (in emulate.c) but is never allocated, causing NULL pointer dereference when fusion is enabled.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1257:
<comment>Missing `fuse_mp` memory pool creation. The `fuse_mp` is used for macro-operation fusion (in emulate.c) but is never allocated, causing NULL pointer dereference when fusion is enabled.</comment>
<file context>
@@ -589,645 +589,743 @@ riscv_t *rv_create(riscv_user_t rv_attr)
+ rv->block_mp = mpool_create(sizeof(block_t) << BLOCK_MAP_CAPACITY_BITS,
+ sizeof(block_t));
+ }
+ if (!rv->block_ir_mp) { /* check for reboot */
+ rv->block_ir_mp = mpool_create(
+ sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t));
</file context>
- if (!rv->block_ir_mp) { /* check for reboot */
-     rv->block_ir_mp = mpool_create(
-         sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t));
+ if (!rv->block_ir_mp) { /* check for reboot */
+     rv->block_ir_mp = mpool_create(
+         sizeof(rv_insn_t) << BLOCK_IR_MAP_CAPACITY_BITS, sizeof(rv_insn_t));
+ }
+ /* Fuse pool: fixed-size slots for macro-op fusion arrays. */
+ if (!rv->fuse_mp) { /* check for reboot */
+     rv->fuse_mp = mpool_create(FUSE_SLOT_SIZE << BLOCK_IR_MAP_CAPACITY_BITS,
+                                FUSE_SLOT_SIZE);
+ }
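The "check for reboot" guard discussed in this thread can be sketched in isolation. The struct, field sizes, and function names below are hypothetical stand-ins (plain `malloc`/`calloc` replace `mpool_create()`); the point is only that a second initialization pass reuses existing allocations instead of leaking them.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the emulator state; not the real riscv_t. */
typedef struct {
    void *block_ir_mp; /* IR pool, kept across reboots */
    void *fuse_mp;     /* fusion pool, kept across reboots */
    void **disks;      /* per-disk handles, zero-filled by calloc */
    int allocations;   /* counts real allocations, for demonstration only */
} emu_state_t;

/* Allocate only what is still missing; returns 0 on success, -1 on failure. */
static int emu_init_resources(emu_state_t *emu, size_t n_disks)
{
    if (!emu->block_ir_mp) { /* check for reboot */
        emu->block_ir_mp = malloc(4096);
        if (!emu->block_ir_mp)
            return -1;
        emu->allocations++;
    }
    if (!emu->fuse_mp) { /* check for reboot */
        emu->fuse_mp = malloc(4096);
        if (!emu->fuse_mp)
            return -1;
        emu->allocations++;
    }
    if (!emu->disks) { /* check for reboot */
        /* calloc zero-fills the array, so each slot starts out empty. */
        emu->disks = calloc(n_disks, sizeof(*emu->disks));
        if (!emu->disks)
            return -1;
        emu->allocations++;
    }
    return 0;
}

/* Release everything once, at final shutdown. */
static void emu_release_resources(emu_state_t *emu)
{
    free(emu->block_ir_mp);
    free(emu->fuse_mp);
    free(emu->disks);
    emu->block_ir_mp = emu->fuse_mp = NULL;
    emu->disks = NULL;
}
```

Calling `emu_init_resources()` once at power-on and again on each reboot leaves the allocation count unchanged after the first call, which is the property the guards are meant to provide.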
# Reboot Tests
# cold reboot
TEST_OPTIONS=("${OPTS_BASE} (cold reboot)")
P1: TEST_OPTIONS should contain only executable flags. Appending display labels like (cold reboot) / (warm reboot) causes invalid arguments to be passed to rv32emu, so the tests never run. Keep labels in a separate array or add them only when printing, not when building the command line.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .ci/reboot.sh, line 11:
<comment>`TEST_OPTIONS` should contain only executable flags. Appending display labels like ` (cold reboot)` / ` (warm reboot)` causes invalid arguments to be passed to rv32emu, so the tests never run. Keep labels in a separate array or add them only when printing, not when building the command line.</comment>
<file context>
@@ -0,0 +1,51 @@
+
+# Reboot Tests
+# cold reboot
+TEST_OPTIONS=("${OPTS_BASE} (cold reboot)")
+EXPECT_CMDS=('
+ expect "buildroot login:" { send "root\n" } timeout { exit 1 }
</file context>
DONE

ret=$?
RET=$((${RET} + ${ret}))
P2: Host-side virtio-blk failures are ignored because RET is incremented before ret can be set to 4 during the 7z verification, so the script exits successfully even when the file check fails.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .ci/virtio-blk.sh, line 243:
<comment>Host-side virtio-blk failures are ignored because `RET` is incremented before `ret` can be set to 4 during the 7z verification, so the script exits successfully even when the file check fails.</comment>
<file context>
@@ -0,0 +1,263 @@
+ DONE
+
+ ret=$?
+ RET=$((${RET} + ${ret}))
+ cleanup
+
</file context>
    memset(plic, 0, sizeof(plic_t));
}
P2: plic_reset() zeroes the struct and wipes the rv pointer, so the PLIC loses its connection to the hart after a reboot. Preserve plic->rv (or only clear the register fields) when resetting.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/devices/plic.c, line 95:
<comment>`plic_reset()` zeroes the struct and wipes the `rv` pointer, so the PLIC loses its connection to the hart after a reboot. Preserve `plic->rv` (or only clear the register fields) when resetting.</comment>
<file context>
@@ -89,3 +89,8 @@ void plic_delete(plic_t *plic)
+
+void plic_reset(plic_t *plic)
+{
+ memset(plic, 0, sizeof(plic_t));
+}
</file context>
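One way to act on this finding is to save the back-pointer across the `memset`. The struct below is a hypothetical, reduced stand-in for `plic_t` (field names are assumptions), and `plic_reset_sketch()` is an illustrative name, not the function in the PR.

```c
#include <stdint.h>
#include <string.h>

typedef struct riscv riscv_t; /* opaque hart handle; never dereferenced here */

/* Hypothetical, reduced PLIC state: a few registers plus the back-pointer. */
typedef struct {
    uint32_t masked;
    uint32_t ip;
    uint32_t ie;
    uint32_t active;
    riscv_t *rv; /* link to the hart; must survive a reset */
} plic_sketch_t;

/* Clear the register state while keeping the connection to the hart. */
static void plic_reset_sketch(plic_sketch_t *plic)
{
    riscv_t *rv = plic->rv; /* save before wiping */
    memset(plic, 0, sizeof(*plic));
    plic->rv = rv; /* restore after wiping */
}
```

The alternative mentioned in the review, clearing only the register fields one by one, avoids the save and restore but has to be updated whenever a field is added.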
5 issues found across 16 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/syscall.c">
<violation number="1" location="src/syscall.c:531">
P2: The `SBI_RST_SYSTEM_RESET` case never sets an error code when the reset type isn’t handled, so unsupported reset types (including warm reboot on builds where it’s disabled) return garbage instead of `SBI_ERR_NOT_SUPPORTED`. Add a final `else` to set the proper error before breaking.</violation>
</file>
<file name=".ci/reboot.sh">
<violation number="1" location=".ci/reboot.sh:26">
P1: The warm reboot expect script embeds `'warm'` inside a single-quoted string, which breaks the shell literal and makes the script fail to parse. Remove or escape the inner quotes so the array element is valid shell syntax.</violation>
</file>
<file name="src/riscv.c">
<violation number="1" location="src/riscv.c:901">
P1: Missing `pthread_cond_signal` to wake up the T2C thread before `pthread_join`. The thread may be blocked on `pthread_cond_wait` and will deadlock. Also missing `pthread_cond_destroy` for proper cleanup.</violation>
<violation number="2" location="src/riscv.c:1305">
P1: Missing `pthread_cond_init(&rv->wait_queue_cond, NULL)` before starting the T2C thread. The condition variable is used in `t2c_runloop` with `pthread_cond_wait`, which requires initialization.</violation>
</file>
<file name=".ci/virtio-blk.sh">
<violation number="1" location=".ci/virtio-blk.sh:243">
P2: RET is updated before host verification can set `ret=4`, so missing emu.txt never propagates to the final exit status. Move the RET update after all checks that might change `ret`.</violation>
</file>
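The `src/syscall.c` finding about unhandled reset types can be sketched as follows. The error-code and reset-type values come from the SBI specification; the `sbi_ret_t` struct, the `reset_action_t` enum, and the function name are hypothetical and only model the `a0`/`a1` return registers. An earlier review proposed `SBI_ERR_INVALID_PARAM` for the same branch; this sketch uses `SBI_ERR_NOT_SUPPORTED` as suggested here.

```c
#include <stdint.h>

/* Standard SBI error codes (values from the SBI specification). */
#define SBI_SUCCESS 0
#define SBI_ERR_NOT_SUPPORTED (-2)

/* Reset types defined by the SBI System Reset extension. */
enum { RST_SHUTDOWN = 0, RST_COLD_REBOOT = 1, RST_WARM_REBOOT = 2 };

/* Hypothetical model of the a0 (error) and a1 (value) return registers. */
typedef struct {
    int32_t a0;
    int32_t a1;
} sbi_ret_t;

/* Hypothetical action the emulator should take after the call. */
typedef enum { ACT_NONE, ACT_HALT, ACT_COLD_REBOOT, ACT_WARM_REBOOT } reset_action_t;

static sbi_ret_t sbi_system_reset(uint32_t type, reset_action_t *action)
{
    sbi_ret_t ret = { SBI_SUCCESS, 0 };

    if (type == RST_SHUTDOWN) {
        *action = ACT_HALT;
    } else if (type == RST_COLD_REBOOT) {
        *action = ACT_COLD_REBOOT;
    } else if (type == RST_WARM_REBOOT) {
        *action = ACT_WARM_REBOOT;
    } else {
        /* Final else: never leave the return registers stale. */
        *action = ACT_NONE;
        ret.a0 = SBI_ERR_NOT_SUPPORTED;
    }
    return ret;
}
```

Because every branch writes both the action and the return pair, a guest that passes an unknown type gets a defined error instead of whatever was previously in `a0`.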
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
#else
#if RV32_HAS(T2C)
    rv->quit = true;
    pthread_join(t2c_thread, NULL);
P1: Missing pthread_cond_signal to wake up the T2C thread before pthread_join. The thread may be blocked on pthread_cond_wait and will deadlock. Also missing pthread_cond_destroy for proper cleanup.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 901:
<comment>Missing `pthread_cond_signal` to wake up the T2C thread before `pthread_join`. The thread may be blocked on `pthread_cond_wait` and will deadlock. Also missing `pthread_cond_destroy` for proper cleanup.</comment>
<file context>
@@ -589,645 +589,743 @@ riscv_t *rv_create(riscv_user_t rv_attr)
+#else
+#if RV32_HAS(T2C)
+ rv->quit = true;
+ pthread_join(t2c_thread, NULL);
+ pthread_mutex_destroy(&rv->wait_queue_lock);
+ pthread_mutex_destroy(&rv->cache_lock);
</file context>
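The condition-variable lifecycle this comment asks for (initialize before the thread starts, signal before joining, destroy afterwards) can be sketched with a generic worker. All names here are hypothetical; unlike the quoted code, the sketch sets the quit flag while holding the mutex so the wake-up cannot be missed.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker state; names only mirror the roles in the review. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond; /* plays the role of wait_queue_cond */
    pthread_t thread;
    bool quit;
    int pending;   /* queued requests */
    int processed; /* requests handled so far */
} worker_t;

static void *worker_loop(void *arg)
{
    worker_t *w = arg;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        /* Sleep until there is work or a shutdown request. */
        while (!w->quit && w->pending == 0)
            pthread_cond_wait(&w->cond, &w->lock);
        if (w->pending == 0)
            break; /* quit was requested and the queue is drained */
        w->pending--;
        w->processed++;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static int worker_start(worker_t *w)
{
    w->quit = false;
    w->pending = 0;
    w->processed = 0;
    /* Both primitives must be initialized before the thread can use them. */
    if (pthread_mutex_init(&w->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&w->cond, NULL) != 0) {
        pthread_mutex_destroy(&w->lock);
        return -1;
    }
    if (pthread_create(&w->thread, NULL, worker_loop, w) != 0) {
        pthread_cond_destroy(&w->cond);
        pthread_mutex_destroy(&w->lock);
        return -1;
    }
    return 0;
}

static void worker_submit(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->pending++;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

static void worker_stop(worker_t *w)
{
    pthread_mutex_lock(&w->lock);
    w->quit = true;
    pthread_cond_signal(&w->cond); /* wake the thread so join cannot hang */
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->cond);
    pthread_mutex_destroy(&w->lock);
}
```

Without the signal in `worker_stop()`, a thread parked in `pthread_cond_wait()` would never observe `quit`, and `pthread_join()` would block indefinitely.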
/* prepare wait queue. */
pthread_mutex_init(&rv->wait_queue_lock, NULL);
pthread_mutex_init(&rv->cache_lock, NULL);
INIT_LIST_HEAD(&rv->wait_queue);
P1: Missing pthread_cond_init(&rv->wait_queue_cond, NULL) before starting the T2C thread. The condition variable is used in t2c_runloop with pthread_cond_wait, which requires initialization.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/riscv.c, line 1305:
<comment>Missing `pthread_cond_init(&rv->wait_queue_cond, NULL)` before starting the T2C thread. The condition variable is used in `t2c_runloop` with `pthread_cond_wait`, which requires initialization.</comment>
<file context>
@@ -589,645 +589,743 @@ riscv_t *rv_create(riscv_user_t rv_attr)
+ /* prepare wait queue. */
+ pthread_mutex_init(&rv->wait_queue_lock, NULL);
+ pthread_mutex_init(&rv->cache_lock, NULL);
+ INIT_LIST_HEAD(&rv->wait_queue);
+ /* activate the background compilation thread. */
+ pthread_create(&t2c_thread, NULL, t2c_runloop, rv);
</file context>
- INIT_LIST_HEAD(&rv->wait_queue);
+ pthread_cond_init(&rv->wait_queue_cond, NULL);
+ INIT_LIST_HEAD(&rv->wait_queue);
DONE

ret=$?
RET=$((${RET} + ${ret}))
P2: RET is updated before host verification can set ret=4, so missing emu.txt never propagates to the final exit status. Move the RET update after all checks that might change ret.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .ci/virtio-blk.sh, line 243:
<comment>RET is updated before host verification can set `ret=4`, so missing emu.txt never propagates to the final exit status. Move the RET update after all checks that might change `ret`.</comment>
<file context>
@@ -0,0 +1,263 @@
+ DONE
+
+ ret=$?
+ RET=$((${RET} + ${ret}))
+ cleanup
+
</file context>
2dc3457 to e9491b5
jserv left a comment
Rebase onto the latest master branch and resolve conflicts.
At 7d13e82, the SBI RST extension did not distinguish between reboot and shutdown types. When a userspace reboot command was issued, the hart was incorrectly halted as if it were a shutdown.
These changes fix the issue for both modes (interpreter and JIT) by properly detecting and handling two reboot types: cold and warm. Each type has its own handler: rv_cold_reboot() and rv_warm_reboot(). Since the initial system power-on is treated as a cold reboot, the initialization code has been refactored to share logic with the reboot path, adhering to the DRY principle.
Key changes:
1. Rename rv_reset() to rv_cold_reboot() - Full system reset including processor registers, block cache, TLB, memory, and all peripherals. The first power-on is treated as a cold reboot.
2. Introduce rv_warm_reboot() - Faster reboot that only resets processor registers, block cache, TLB, and memory, skipping peripheral reinitialization. Can be triggered via echo "warm" > /sys/kernel/reboot/mode in the guest OS.
3. Introduce rv_reset_hart() - Static helper function to reset only hart state (GPRs, CSRs, PC, block cache, TLB), shared by both reboot modes.
4. Refactor boot image loading - Extract load_boot_images() helper to load kernel, DTB, and initrd, reducing code duplication between both reboot paths.
5. Add reboot-safe resource management - Add "check for reboot" comments throughout initialization to reuse already-allocated resources (memory, fd_map, PLIC, UART, vblk, block_map, etc.) instead of re-allocating. JIT state resources may instead be freed and re-allocated (e.g., by calling jit_state_exit()) to reuse the existing APIs.
6. Use calloc for vblk/disk arrays - Changed from malloc to calloc so pointers are initialized to NULL, simplifying reboot checks.
7. Use setjmp/longjmp for clean reboot - Reboot rewrites all registers, causing call stack values to become stale. setjmp in rv_step() establishes a return point, and longjmp after reboot provides a clean call stack.
8. Add plic_reset() and u8250_reset() - New device reset functions to reinitialize state without free/realloc.
9. Add sbi_rst_type_str() and sbi_rst_reason_str() - Helper functions for human-readable SBI reset type/reason in dmesg.
Refactor the Linux boot test suite into separate files:
- .ci/rtc.sh
- .ci/reboot.sh
- .ci/virtio-blk.sh
Each file sources common variables from .ci/common.sh. .ci/boot-linux.sh becomes a driver that runs these separate tests and accumulates the exit code from each of them. This makes the Linux test suite easier to extend in the future: add a new file in the .ci/ directory and source it from the driver, .ci/boot-linux.sh.
wasm-ld: error: --shared-memory is disallowed by build/emulate.o because it was not compiled with 'atomics' or 'bulk-memory' features.
e9491b5 to 3641924
Description
At 7d13e82, the SBI RST extension did not distinguish between reboot and shutdown type. When a userspace reboot command was issued, the hart was incorrectly halted as if it were a shutdown.
These changes fix the issue for both modes (interpreter and JIT) by properly detecting and handling two reboot types: cold and warm. Each type has its own handler: rv_cold_reboot() and rv_warm_reboot(). Since the initial system power-on is treated as a cold reboot, the initialization code has been refactored to share logic with the reboot path, adhering to the DRY principle. Additionally, device reset helper functions (plic_reset(), u8250_reset()) are introduced to support peripheral reinitialization during reboot.
Key changes:
1. Rename rv_reset() to rv_cold_reboot() - Full system reset including processor, memory, and all peripherals. The initial power-on is treated as a cold reboot.
2. Introduce rv_warm_reboot() - Faster reboot that only resets processor and memory, skipping peripheral reinitialization. Can be triggered via echo "warm" > /sys/kernel/reboot/mode in the guest OS.
3. Refactor boot image loading - Extract load_boot_images() helper to load kernel, DTB, and initrd, reducing code duplication between cold and warm reboot paths.
4. Introduce rv_reset_hart() - Static helper function to reset only hart state (GPRs, CSRs, PC), shared by both reboot modes.
5. Add reboot-safe resource management - Add "check for reboot" comments throughout initialization to reuse already-allocated resources (memory, fd_map, PLIC, UART, vblk, block_map, etc.) instead of re-allocating. JIT state resources may instead be freed and re-allocated (e.g., by calling jit_state_exit()) to reuse the existing APIs.
6. Use calloc for vblk/disk arrays - Changed from malloc to calloc so pointers are initialized to NULL, simplifying reboot checks.
7. Use setjmp/longjmp for clean reboot - Reboot rewrites all registers, causing call stack values to become stale. setjmp in rv_step() establishes a return point, and longjmp after reboot provides a clean call stack.
8. Add plic_reset() and u8250_reset() - New device reset functions to reinitialize state without free/realloc.
9. Add sbi_rst_type_str() and sbi_rst_reason_str() - Helper functions for human-readable SBI reset type/reason in dmesg.
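The setjmp/longjmp item above can be sketched with a toy run loop. Everything here is hypothetical (the `hart_t` struct, the function names, and the rule that the "guest" requests one reboot when it first reaches pc 3); it only shows how a reset requested deep in the call stack can return control to a known point with a fresh stack.

```c
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical toy hart; not the emulator's riscv_t. */
typedef struct {
    unsigned pc;
    int reboots;
    bool halted;
    jmp_buf reboot_point; /* return point established by the run loop */
} hart_t;

/* Reset the hart state, then unwind straight back to the run loop. */
static void handle_reset_request(hart_t *h)
{
    h->pc = 0;
    h->reboots++;
    longjmp(h->reboot_point, 1); /* does not return */
}

/* Execute one toy "instruction". */
static void execute_one(hart_t *h)
{
    if (h->pc == 3 && h->reboots == 0)
        handle_reset_request(h); /* guest asks for a reboot exactly once */
    if (h->pc == 5) {
        h->halted = true; /* guest powers off */
        return;
    }
    h->pc++;
}

static void hart_run(hart_t *h)
{
    /* All mutable state lives in *h, so no local variable is modified
     * between setjmp and longjmp. */
    if (setjmp(h->reboot_point) != 0) {
        /* Arrived here via longjmp: frames below this one are discarded. */
    }
    while (!h->halted)
        execute_one(h);
}
```

Running `hart_run()` on a zero-initialized `hart_t` performs one reboot and then halts at pc 5. Keeping the state behind a pointer rather than in locals sidesteps the rule that locals changed after `setjmp` have indeterminate values after `longjmp` unless declared `volatile`.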
Testing
# reboot
Expectation
The guest OS reboots and behaves as shown in the demo video.
Demo
reboot-demo.mp4
The guest OS userspace applications run smoothly even after reboot. This demo covers 4 phases: power-on, cold reboot, warm reboot, poweroff.
Summary by cubic
Fixes system emulation reboot by correctly handling SBI reset types and adding cold and warm reboot paths. Reboots no longer halt the hart and now return cleanly to the run loop, with CI tests covering both paths.
New Features
Refactors
Written for commit 3641924. Summary will update on new commits.