Implement S-mode traps by qbojj · Pull Request #884 · kuznia-rdzeni/coreblocks

qbojj · 2026-04-09T21:11:15Z

This builds on #878 and implements trap handling.

This PR implements:

interrupt and exception delegation
SRET and blank SFENCE.VMA
tests for S-mode traps
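
For readers unfamiliar with delegation, the rule from the RISC-V privileged specification can be modelled in a few lines of plain Python. This is only an illustrative sketch, not the code added by this PR: the `Priv` enum and the `trap_target_priv` helper are hypothetical names. It assumes the standard behaviour that a trap taken in S- or U-mode goes to S-mode when the matching bit of medeleg (exceptions) or mideleg (interrupts) is set, and to M-mode otherwise.

```python
from enum import IntEnum


class Priv(IntEnum):
    # Privilege level encodings as defined by the RISC-V privileged spec.
    U = 0
    S = 1
    M = 3


def trap_target_priv(cause: int, is_interrupt: bool, cur_priv: Priv,
                     medeleg: int, mideleg: int) -> Priv:
    """Hypothetical helper: return the privilege level that handles a trap."""
    deleg = mideleg if is_interrupt else medeleg
    delegated = bool((deleg >> cause) & 1)
    # Traps never move to a less privileged mode, so a trap taken while
    # in M-mode stays in M-mode even if its delegation bit is set.
    if delegated and cur_priv <= Priv.S:
        return Priv.S
    return Priv.M


# Illegal instruction (cause 2) raised in U-mode with medeleg bit 2 set.
print(trap_target_priv(2, False, Priv.U, medeleg=1 << 2, mideleg=0).name)
```

The same exception raised while already in M-mode would return `Priv.M`, since delegation only applies to traps taken from S- or U-mode.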

github-actions · 2026-04-17T11:45:22Z

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.411 (0.000)	0.554 (0.000)	0.345 (0.000)	0.635 (0.000)	0.357 (0.000)	0.289 (0.000)	0.318 (0.000)	0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▲ 17439 (+297)	▲ 4708 (+36)	1380 (0)	1324 (0)	▲ 51 (+1)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 38305 (-4308)	▲ 8538 (+47)	▲ 2948 (+32)	2316 (0)	▲ 37 (+1)

github-actions · 2026-04-17T21:48:59Z

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.411 (0.000)	0.554 (0.000)	0.345 (0.000)	0.635 (0.000)	0.357 (0.000)	0.289 (0.000)	0.318 (0.000)	0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 17099 (-43)	▲ 4707 (+35)	1380 (0)	1324 (0)	▲ 51 (+0)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 39725 (-2888)	▲ 8537 (+46)	▲ 2948 (+32)	2316 (0)	▼ 36 (-1)

github-actions · 2026-04-17T22:25:26Z

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.411 (0.000)	0.554 (0.000)	0.345 (0.000)	0.635 (0.000)	0.357 (0.000)	0.289 (0.000)	0.318 (0.000)	0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▲ 17145 (+3)	▲ 4707 (+35)	1380 (0)	1324 (0)	▼ 46 (-4)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 39693 (-2920)	▲ 8537 (+46)	▲ 2948 (+32)	2316 (0)	▼ 36 (-1)

github-actions · 2026-04-17T23:52:29Z

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.411 (0.000)	0.554 (0.000)	0.345 (0.000)	0.635 (0.000)	0.357 (0.000)	0.289 (0.000)	0.318 (0.000)	0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▲ 17493 (+351)	▲ 4707 (+35)	▼ 1348 (-32)	1324 (0)	▲ 52 (+2)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 41987 (-626)	▲ 8537 (+46)	▲ 2948 (+32)	2316 (0)	▼ 36 (-1)

awariac

Very nice work!

This is about 3/4 of the review, with some comments and discussion points:

awariac · 2026-04-22T23:25:07Z

+            if self.medelegh:
+                m.d.av_comb += edeleg.eq(Cat(self.medeleg.read(m).data, self.medelegh.read(m).data))
+            else:
+                m.d.av_comb += edeleg.eq(self.medeleg.read(m).data)


I think we need some wrapper for RV32 to group *h registers together and provide a single read.
Out of scope here.
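
One possible shape for such a wrapper, written as a plain Python behavioural model rather than Amaranth hardware. The `Csr` and `PairedCsr` classes are hypothetical stand-ins, not coreblocks classes; the sketch only shows the idea of hiding the optional *h half behind a single `read`.

```python
from dataclasses import dataclass
from typing import Optional

XLEN = 32  # assumption: RV32, where 64-bit CSRs are split into two halves


@dataclass
class Csr:
    """Stand-in for a single XLEN-wide CSR (not the coreblocks class)."""
    value: int = 0

    def read(self) -> int:
        return self.value & ((1 << XLEN) - 1)


@dataclass
class PairedCsr:
    """Hypothetical wrapper that groups a CSR with its optional *h half."""
    low: Csr
    high: Optional[Csr] = None

    def read(self) -> int:
        # The high half, when present, supplies bits [2*XLEN-1:XLEN].
        result = self.low.read()
        if self.high is not None:
            result |= self.high.read() << XLEN
        return result


medeleg = PairedCsr(Csr(0x0000_B109), Csr(0x0000_0001))
print(hex(medeleg.read()))
```

With a wrapper along these lines, the `if self.medelegh:` branch in the diff above would move into one place instead of being repeated at every use.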

awariac · 2026-04-22T23:32:48Z

+        @def_method(m, self.get_trap_target_priv)
+        def _(cause):


I don't like that it's feeding back the already computed cause (but only for interrupts), but I don't have any ideas on how to improve it. I see this is necessary for exceptions, and that it has to be done at the last possible point.
Some things I was thinking about: maybe trap_entry should return the target priv instead of requiring it? Or maybe this would fit better into ExceptionInformationRegister, which we need to call too?
The split between the exception and interrupt paths is problematic here.

If we don't come up with any better ideas, this could be left as-is.

Also in case of interrupts this is duplicating computation of top interrupt. Could this be unified?

I have redone the interface: entry now returns the privilege. There is still duplication of the interrupt target logic. I was wondering if we could just have it saved like interrupt_cause, but I'm not sure whether that would introduce some sort of TOCTOU problem.

I'm also a bit worried about TOCTOU issues here.

The only part I was referring to was that we already do selected_pending computation based on m_interrupt_insert and that's equivalent to the code here.
get_interrupt_cause happens at the same time in the retirement as trap_entry and represents the state in the same cycle.

However, I have no idea how to connect that cleanly, and the hardware cost is not that large either (one xlen-wide mux).
The retirement implementation could change in the future as well.

It's fine to leave it as it is now; it's much better.
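
To make the duplication discussed in this thread concrete, here is a plain Python model in which the interrupt cause and its target privilege are derived together from one snapshot of mip, mie and mideleg, so the two values cannot disagree. It is a sketch under the rules of the RISC-V privileged specification, not the retirement logic in coreblocks; `select_interrupt` and its parameters are hypothetical names.

```python
from typing import Optional, Tuple

# Interrupt cause numbers from the RISC-V privileged spec, listed in
# decreasing priority: MEI, MSI, MTI, SEI, SSI, STI.
PRIORITY = (11, 3, 7, 9, 1, 5)
U, S, M = 0, 1, 3


def select_interrupt(mip: int, mie: int, mideleg: int, priv: int,
                     mstatus_mie: bool, mstatus_sie: bool
                     ) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (cause, target_priv) or None."""
    pending = mip & mie
    # Interrupts handled in M-mode are globally enabled below M-mode,
    # and in M-mode only when mstatus.MIE is set.
    m_enabled = priv < M or mstatus_mie
    # Delegated interrupts are enabled below S-mode, in S-mode only
    # when mstatus.SIE is set, and never while in M-mode.
    s_enabled = priv < S or (priv == S and mstatus_sie)
    # M-mode-destined interrupts are considered before delegated ones.
    for target, enabled, mask in ((M, m_enabled, pending & ~mideleg),
                                  (S, s_enabled, pending & mideleg)):
        if not enabled:
            continue
        for cause in PRIORITY:
            if (mask >> cause) & 1:
                return cause, target
    return None


# Delegated supervisor timer interrupt (cause 5) arriving in U-mode.
print(select_interrupt(1 << 5, 1 << 5, 1 << 5, U, False, False))
```

In hardware the equivalent would be a single selection whose result feeds both the cause and the target privilege, at the cost of the extra wiring mentioned above.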

awariac · 2026-04-23T00:01:32Z

                JumpComponent(),
                ExceptionUnitComponent(),
-                PrivilegedUnitComponent(),
+                PrivilegedUnitComponent(supervisor_enable=True),


It's not very convenient; this should be inferred from gen_params

I had a problem with get_decoder_manager not getting gen params

Huh. The mechanism with get_decoder_manager was designed with more local effects in mind (e.g. ALU: parameter added -> new instructions supported -> new extension automatically inferred). I see two ways of avoiding redundancy in config:

Treat this like extensions - declaring supervisor_enable on any PrivilegedUnit automatically makes the whole core support supervisor mode.

Pass gen_params to get_decoder_manager, and therefore allow the set of supported instructions to change after enabling supervisor mode in the configuration.

Don't have a strong opinion on this, maybe I like the second option slightly more.

The second one would be more consistent with user_mode.
Maybe it could accept core_config instead of gen_params?
The problem is that gen_params's ISA string is determined from FU configs, so that's a kind of circular dependency. It's not a problem in this case, but it could create problems for other units if we add this to the standard interface.
@tilk what do you think about that?

You are right about the circular dependency, passing the configuration seems like the best choice here.

The problem is that this object is part of the configuration...

OK, I think you can add an assert to PrivilegedUnitComponent that supervisor_mode == gen_params.supervisor_mode, and maybe we will come up with something better later
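
A minimal sketch of what such a consistency check could look like, using stand-in dataclasses instead of the real coreblocks classes. `GenParamsSketch`, `PrivilegedUnitComponentSketch` and the `build` method are hypothetical names chosen for illustration; only the `supervisor_enable` and `supervisor_mode` names come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenParamsSketch:
    """Stand-in for the generation parameters; only the field used here."""
    supervisor_mode: bool


@dataclass(frozen=True)
class PrivilegedUnitComponentSketch:
    """Stand-in for the functional-unit component from the diff above."""
    supervisor_enable: bool = False

    def build(self, gen_params: GenParamsSketch) -> str:
        # Fail early when the two copies of the setting disagree.
        assert self.supervisor_enable == gen_params.supervisor_mode, (
            "supervisor_enable on the component must match "
            "gen_params.supervisor_mode"
        )
        return "PrivilegedUnit"  # placeholder for the real module


component = PrivilegedUnitComponentSketch(supervisor_enable=True)
print(component.build(GenParamsSketch(supervisor_mode=True)))
```

The check does not remove the redundancy in the configuration; it only turns a silent mismatch into an immediate error at elaboration time.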

awariac · 2026-04-23T16:18:35Z

There are also architectural tests for supervisor mode that need to be enabled. You can do it here or in another PR, but it's worth checking for correctness.

awariac · 2026-04-23T23:40:08Z

There are some macros in test/external/riscof/coreblocks/env/model_test.h that look like they may need to be implemented (they can be copied from the riscof setups of other cores) if the issue doesn't resolve with some trivial fix.
You can move riscof to a separate PR to merge the CSR changes sooner.

qbojj · 2026-04-24T18:35:35Z

I think I will leave riscof for another PR.

github-actions · 2026-04-24T19:43:08Z

Benchmarks summary

Performance benchmarks

aha-mont64	crc32	minver	nettle-sha256	nsichneu	slre	statemate	ud
0.411 (0.000)	0.554 (0.000)	0.344 (0.000)	0.635 (0.000)	0.358 (0.000)	0.289 (0.000)	0.318 (0.000)	0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▲ 19065 (+1935)	▲ 4787 (+35)	1382 (0)	1372 (0)	▲ 48 (+2)

Synthesis benchmarks (full)

Device utilisation: (ECP5)	LUTs used as DFF: (ECP5)	LUTs used as carry: (ECP5)	LUTs used as ram: (ECP5)	Max clock frequency (Fmax)
▼ 40725 (-494)	▲ 8533 (+46)	▲ 3320 (+32)	2364 (0)	▲ 38 (+1)

awariac · 2026-04-27T20:40:34Z

                JumpComponent(),
                ExceptionUnitComponent(),
-                PrivilegedUnitComponent(),
+                PrivilegedUnitComponent(supervisor_enable=True),


OK, I think you can add an assert to PrivilegedUnitComponent that supervisor_mode == gen_params.supervisor_mode, and maybe we will come up with something better later

awariac · 2026-04-27T20:47:26Z

+        @def_method(m, self.get_trap_target_priv)
+        def _(cause):


I'm also a bit worried about TOCTOU issues here.

The only part I was referring to was that we already do selected_pending computation based on m_interrupt_insert and that's equivalent to the code here.
get_interrupt_cause happens at the same time in the retirement as trap_entry and represents the state in the same cycle.

However, I have no idea how to connect that cleanly, and the hardware cost is not that large either (one xlen-wide mux).
The retirement implementation could change in the future as well.

It's fine to leave it as it is now; it's much better.

qbojj · 2026-04-29T14:48:26Z

It looks like the S-mode changes themselves pass the riscof tests (see #921). The only problem seems to be Zifencei.

qbojj linked an issue Apr 9, 2026 that may be closed by this pull request

Implement supervisor interrupts and traps #864

Closed

tilk added the enhancement (New feature or request) and nlnet (The work is part of the NLnet grant) labels Apr 10, 2026

qbojj force-pushed the feat-s-mode-traps branch 3 times, most recently from f6a1859 to 22dd467 Compare April 13, 2026 19:20

qbojj added 3 commits April 17, 2026 13:14

initial s-mode trap impl

fb5859f

add S-mode trap asm tests

045657b

add test case for wfi

e5d8da4

qbojj force-pushed the feat-s-mode-traps branch from 22dd467 to e5d8da4 Compare April 17, 2026 11:14

fixup after rebase

4b5835b

qbojj added the benchmark (Benchmarks should be run for this change) label Apr 17, 2026

cleanup

a9a4b65

cleanup decoder hacks

97f3091

qbojj marked this pull request as ready for review April 17, 2026 22:02

cleanup interrupt controller

4bd36fe

remove unneeded assignment

2c54fc3

qbojj removed the benchmark (Benchmarks should be run for this change) label Apr 17, 2026