Implement S-mode traps #884

Merged
tilk merged 28 commits into kuznia-rdzeni:master from qbojj:feat-s-mode-traps
Apr 29, 2026
Contributor

@qbojj qbojj commented Apr 9, 2026

This builds on #878 and implements trap handling.

This PR implements:

  • interrupt and exception delegation
  • SRET and blank SFENCE.VMA
  • tests for S-mode traps
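
As a rough illustration of what delegation means here, the following is a minimal software model, not the code from this PR. It assumes the standard RISC-V rule that a set bit in medeleg (for exceptions) or mideleg (for interrupts) routes a trap raised in S-mode or U-mode to S-mode. The names `Priv` and `trap_target_priv` are hypothetical.

```python
from enum import IntEnum


class Priv(IntEnum):
    """RISC-V privilege level encodings."""
    U = 0
    S = 1
    M = 3


def trap_target_priv(cause: int, is_interrupt: bool, current: Priv,
                     medeleg: int, mideleg: int) -> Priv:
    """Return the mode that handles a trap (hypothetical helper)."""
    # Traps never move to a less privileged mode, so a trap taken
    # while in M-mode stays in M-mode regardless of delegation.
    if current == Priv.M:
        return Priv.M
    deleg = mideleg if is_interrupt else medeleg
    # A set bit at position `cause` delegates that trap to S-mode.
    if (deleg >> cause) & 1:
        return Priv.S
    return Priv.M
```

For example, with bit 8 of medeleg set, an environment call from U-mode (exception code 8) would be handled in S-mode; with the bit clear it goes to M-mode.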

@qbojj qbojj linked an issue Apr 9, 2026 that may be closed by this pull request
@tilk tilk added the enhancement (New feature or request) and nlnet (The work is part of the NLnet grant) labels Apr 10, 2026
@qbojj qbojj force-pushed the feat-s-mode-traps branch 3 times, most recently from f6a1859 to 22dd467, April 13, 2026 19:20
@qbojj qbojj force-pushed the feat-s-mode-traps branch from 22dd467 to e5d8da4, April 17, 2026 11:14
@qbojj qbojj added the benchmark (Benchmarks should be run for this change) label Apr 17, 2026
@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.411 (0.000) | 0.554 (0.000) | 0.345 (0.000) | 0.635 (0.000) | 0.357 (0.000) | 0.289 (0.000) | 0.318 (0.000) | 0.424 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 17439 (+297) | ▲ 4708 (+36) | 1380 (0) | 1324 (0) | ▲ 51 (+1) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 38305 (-4308) | ▲ 8538 (+47) | ▲ 2948 (+32) | 2316 (0) | ▲ 37 (+1) |

@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.411 (0.000) | 0.554 (0.000) | 0.345 (0.000) | 0.635 (0.000) | 0.357 (0.000) | 0.289 (0.000) | 0.318 (0.000) | 0.424 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 17099 (-43) | ▲ 4707 (+35) | 1380 (0) | 1324 (0) | ▲ 51 (+0) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 39725 (-2888) | ▲ 8537 (+46) | ▲ 2948 (+32) | 2316 (0) | ▼ 36 (-1) |

@qbojj qbojj marked this pull request as ready for review April 17, 2026 22:02
@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.411 (0.000) | 0.554 (0.000) | 0.345 (0.000) | 0.635 (0.000) | 0.357 (0.000) | 0.289 (0.000) | 0.318 (0.000) | 0.424 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 17145 (+3) | ▲ 4707 (+35) | 1380 (0) | 1324 (0) | ▼ 46 (-4) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 39693 (-2920) | ▲ 8537 (+46) | ▲ 2948 (+32) | 2316 (0) | ▼ 36 (-1) |

@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.411 (0.000) | 0.554 (0.000) | 0.345 (0.000) | 0.635 (0.000) | 0.357 (0.000) | 0.289 (0.000) | 0.318 (0.000) | 0.424 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 17493 (+351) | ▲ 4707 (+35) | ▼ 1348 (-32) | 1324 (0) | ▲ 52 (+2) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 41987 (-626) | ▲ 8537 (+46) | ▲ 2948 (+32) | 2316 (0) | ▼ 36 (-1) |

@qbojj qbojj removed the benchmark (Benchmarks should be run for this change) label Apr 17, 2026
@awariac awariac self-requested a review April 18, 2026 01:24
Member

@awariac awariac left a comment

Very nice work!

This covers about 3/4 of the review; some comments and discussion points:

Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
Comment thread coreblocks/priv/traps/interrupt_controller.py
Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
Comment on lines +161 to +164
if self.medelegh:
    m.d.av_comb += edeleg.eq(Cat(self.medeleg.read(m).data, self.medelegh.read(m).data))
else:
    m.d.av_comb += edeleg.eq(self.medeleg.read(m).data)
Member

I think we need some wrapper for RV32 that groups the *h registers together and provides a single read.
Out of scope here.
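
One way to picture such a wrapper is the plain-Python model below. It is only a sketch under assumptions: `SplitCsr` is a hypothetical name, it models register values as integers rather than hardware signals, and it assumes the low word sits in the least significant bits, matching the argument order of Cat in the snippet above.

```python
from dataclasses import dataclass

WORD_MASK = 0xFFFFFFFF


@dataclass
class SplitCsr:
    """Hypothetical model of a 64-bit CSR stored as two 32-bit halves on RV32."""
    low: int = 0   # e.g. medeleg
    high: int = 0  # e.g. medelegh

    def read(self) -> int:
        # Concatenate the halves: low word in bits 31..0, high word in bits 63..32.
        return ((self.high & WORD_MASK) << 32) | (self.low & WORD_MASK)

    def write(self, value: int) -> None:
        # Split a 64-bit value back into the two 32-bit registers.
        self.low = value & WORD_MASK
        self.high = (value >> 32) & WORD_MASK
```

With a wrapper of this shape, callers would perform one read and would not need to branch on whether the *h register exists.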

Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
Comment on lines +148 to +149
@def_method(m, self.get_trap_target_priv)
def _(cause):
Member

I don't like that it feeds back the already computed cause (but only for interrupts), though I don't have ideas for how to improve it. I see that this is necessary for exceptions, and that it has to be done at the last possible point.
Some things I was thinking about: maybe trap_entry should return the target priv instead of requiring it? Or maybe this would fit better into ExceptionInformationRegister, which we need to call too?
The split between the exception and interrupt paths is problematic here.

If we don't come up with any better ideas, this could be left as-is.

Also, in the case of interrupts, this duplicates the computation of the top interrupt. Could this be unified?

Contributor Author

I have redone the interface: entry now returns the privilege. There is still duplication of the interrupt target logic. I was wondering if we could just save it like interrupt_cause, but I'm not sure whether that would introduce some sort of TOCTOU problem.

Member

I'm also a bit worried about TOCTOU issues here.

The only part I was referring to was that we already do the selected_pending computation based on m_interrupt_insert, and that's equivalent to the code here.
get_interrupt_cause happens at the same time in retirement as trap_entry and represents the state in the same cycle.

However, I have no idea how to connect that cleanly, and the hardware cost is not that large either (one xlen-wide mux).
The retirement implementation could change in the future as well.

It's fine to leave it as it is now; it's much better.

JumpComponent(),
ExceptionUnitComponent(),
PrivilegedUnitComponent(),
PrivilegedUnitComponent(supervisor_enable=True),
Member

It's not very convenient; this should be inferred from gen_params.

Contributor Author

I had a problem with get_decoder_manager not receiving gen_params.

Member

Huh. The mechanism with get_decoder_manager was designed with more local effects in mind (e.g. ALU: parameter added -> new instructions supported -> new extension automatically inferred). I see two ways of avoiding redundancy in config:

  • Treat this like extensions - declaring supervisor_enable on any PrivilegedUnit automatically makes the whole core support supervisor mode.
  • Pass gen_params to get_decoder_manager, and therefore allow the set of supported instructions to change after enabling supervisor mode in the configuration.

Don't have a strong opinion on this, maybe I like the second option slightly more.

Member

The second one would be more consistent with user_mode.
Maybe it could accept core_config instead of gen_params?
The problem is that gen_params's ISA string is determined from the FU configs, so that's a kind of circular dependency. It's not a problem in this case, but it could create problems for other units if we add this to the standard interface.
@tilk what do you think about that?

Member

You are right about the circular dependency; passing the configuration seems like the best choice here.

Contributor Author

The problem is that this object is part of the configuration...

Member

OK, I think you can add an assert to PrivilegedUnitComponent that supervisor_mode == gen_params.supervisor_mode, and maybe we will come up with something better later.
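
The suggested consistency check could look roughly like the sketch below. Both classes are stand-ins written for illustration, not the real coreblocks classes; the `check` method and the attribute layout are assumptions, with only the `supervisor_enable` argument and `gen_params.supervisor_mode` taken from the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenParamsStub:
    """Stand-in for the core-wide parameters; not the real GenParams class."""
    supervisor_mode: bool


@dataclass(frozen=True)
class PrivilegedUnitConfigStub:
    """Stand-in for PrivilegedUnitComponent; only the flag is modelled."""
    supervisor_enable: bool = False

    def check(self, gen_params: GenParamsStub) -> None:
        # Fail early if the unit and the core-wide setting disagree.
        assert self.supervisor_enable == gen_params.supervisor_mode, (
            "supervisor_enable on the privileged unit must match "
            "gen_params.supervisor_mode"
        )
```

A check like this keeps the redundancy in the configuration but turns a silent mismatch into an immediate error when the core is elaborated.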

Comment thread coreblocks/interface/layouts.py
@awariac
Member

awariac commented Apr 23, 2026

There are also architectural tests for supervisor mode that need to be enabled. You can do it here or in another PR, but it's worth checking for correctness.

@qbojj qbojj added the benchmark (Benchmarks should be run for this change) label Apr 23, 2026
@awariac
Member

awariac commented Apr 23, 2026

There are some macros in test/external/riscof/coreblocks/env/model_test.h that look like they may need to be implemented (copy them from riscof setups for other cores) if the issue doesn't resolve with some trivial fix.
You can move riscof to a separate PR to merge the CSR changes sooner.

@qbojj
Contributor Author

qbojj commented Apr 24, 2026

I think I will leave riscof to another PR

@qbojj qbojj removed the benchmark (Benchmarks should be run for this change) label Apr 24, 2026
@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.411 (0.000) | 0.554 (0.000) | 0.344 (0.000) | 0.635 (0.000) | 0.358 (0.000) | 0.289 (0.000) | 0.318 (0.000) | 0.424 (0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 19065 (+1935) | ▲ 4787 (+35) | 1382 (0) | 1372 (0) | ▲ 48 (+2) |

Synthesis benchmarks (full)

| Device utilisation: (ECP5) | LUTs used as DFF: (ECP5) | LUTs used as carry: (ECP5) | LUTs used as ram: (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 40725 (-494) | ▲ 8533 (+46) | ▲ 3320 (+32) | 2364 (0) | ▲ 38 (+1) |

@qbojj qbojj mentioned this pull request Apr 26, 2026
3 tasks

@qbojj
Contributor Author

qbojj commented Apr 29, 2026

It looks like the S-mode changes themselves pass the riscof tests (#921). The only problem seems to be Zifencei.

@tilk tilk merged commit 26c3a2f into kuznia-rdzeni:master Apr 29, 2026
13 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 29, 2026

Labels

enhancement: New feature or request
nlnet: The work is part of the NLnet grant

Development

Successfully merging this pull request may close these issues.

Implement supervisor interrupts and traps

3 participants