Skip to content

implement S mode CSRs#878

Merged
tilk merged 24 commits intokuznia-rdzeni:masterfrom
qbojj:feat-s-mode-csrs
Apr 17, 2026
Merged

implement S mode CSRs#878
tilk merged 24 commits intokuznia-rdzeni:masterfrom
qbojj:feat-s-mode-csrs

Conversation

@qbojj
Copy link
Copy Markdown
Contributor

@qbojj qbojj commented Mar 30, 2026

closes #863

@qbojj qbojj added the enhancement New feature or request label Mar 30, 2026
@qbojj qbojj marked this pull request as ready for review April 1, 2026 23:40
@qbojj qbojj force-pushed the feat-s-mode-csrs branch from 55627f6 to eb07c74 Compare April 2, 2026 12:56
@qbojj qbojj marked this pull request as draft April 2, 2026 13:40
Comment thread coreblocks/priv/csr/shadow.py Outdated
@qbojj qbojj added benchmark Benchmarks should be run for this change and removed benchmark Benchmarks should be run for this change labels Apr 3, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 4, 2026

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
0.420 (0.000) 0.554 (0.000) 0.349 (0.000) 0.635 (0.000) 0.358 (0.000) 0.288 (0.000) 0.321 (0.000) 0.421 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 16088 (+1072) ▲ 4340 (+124) ▼ 1346 (-64) 1688 (0) ▼ 49 (-2)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 38074 (+3576) ▲ 7928 (+125) ▼ 2938 (-64) 2680 (0) ▲ 40 (+1)

@qbojj qbojj removed the benchmark Benchmarks should be run for this change label Apr 4, 2026
@qbojj qbojj force-pushed the feat-s-mode-csrs branch from b2b2c8e to 8afc6d0 Compare April 5, 2026 09:34
@qbojj qbojj force-pushed the feat-s-mode-csrs branch from 8afc6d0 to 06fae44 Compare April 5, 2026 09:35
@qbojj qbojj marked this pull request as ready for review April 5, 2026 09:36
@tilk tilk requested a review from awariac April 5, 2026 11:10
Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
@awariac awariac requested review from awariac and removed request for awariac April 7, 2026 15:21
Comment thread coreblocks/priv/traps/interrupt_controller.py Outdated
Comment thread coreblocks/priv/csr/csr_instances.py Outdated
Comment thread coreblocks/priv/csr/csr_instances.py
Comment thread coreblocks/params/vmem_params.py Outdated
Comment thread coreblocks/func_blocks/csr/csr.py Outdated
Comment thread coreblocks/priv/csr/shadow.py Outdated
Comment thread coreblocks/priv/traps/interrupt_controller.py
Comment thread coreblocks/priv/csr/shadow.py Outdated
Comment thread coreblocks/priv/csr/csr_instances.py Outdated
Comment thread coreblocks/priv/csr/csr_instances.py Outdated
@qbojj qbojj added benchmark Benchmarks should be run for this change and removed benchmark Benchmarks should be run for this change labels Apr 13, 2026
@github-actions
Copy link
Copy Markdown

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
0.411 (0.000) 0.554 (0.000) 0.345 (0.000) 0.635 (0.000) 0.357 (0.000) 0.289 (0.000) 0.318 (0.000) 0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 17157 (+2645) ▲ 4672 (+109) ▼ 1380 (-32) 1324 (0) ▼ 48 (-2)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 40660 (+4023) ▲ 8491 (+110) ▼ 2948 (-64) 2316 (0) ▼ 38 (-3)

@qbojj qbojj removed the benchmark Benchmarks should be run for this change label Apr 13, 2026
Copy link
Copy Markdown
Member

@tilk tilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor style-related comments.

Comment thread coreblocks/arch/isa_consts.py Outdated
Comment thread coreblocks/arch/isa_consts.py Outdated
Comment thread coreblocks/arch/csr_address.py Outdated
Comment thread coreblocks/func_blocks/csr/csr.py
Comment thread coreblocks/params/configurations.py Outdated
Comment thread coreblocks/params/vmem_params.py Outdated
Comment thread coreblocks/params/vmem_params.py Outdated
Comment thread coreblocks/params/configurations.py Outdated
@tilk tilk added the benchmark Benchmarks should be run for this change label Apr 15, 2026
@github-actions
Copy link
Copy Markdown

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
0.411 (0.000) 0.554 (0.000) 0.345 (0.000) 0.635 (0.000) 0.357 (0.000) 0.289 (0.000) 0.318 (0.000) 0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 16803 (+2357) ▲ 4672 (+109) ▼ 1348 (-64) 1324 (0) ▲ 50 (+2)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 41036 (+3496) ▲ 8491 (+110) ▼ 2948 (-64) 2316 (0) ▼ 37 (-5)

@tilk
Copy link
Copy Markdown
Member

tilk commented Apr 15, 2026

Fmax seems worrying, but the critical path seems unrelated.

@awariac awariac added benchmark Benchmarks should be run for this change and removed benchmark Benchmarks should be run for this change labels Apr 15, 2026
@github-actions
Copy link
Copy Markdown

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
0.411 (0.000) 0.554 (0.000) 0.345 (0.000) 0.635 (0.000) 0.357 (0.000) 0.289 (0.000) 0.318 (0.000) 0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 16996 (+2550) ▲ 4672 (+109) ▼ 1348 (-64) 1324 (0) ▲ 51 (+2)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 38191 (+651) ▲ 8491 (+110) ▼ 2948 (-64) 2316 (0) ▼ 35 (-7)

@awariac
Copy link
Copy Markdown
Member

awariac commented Apr 15, 2026

it can be verified with other toolchains or/and we should add option for yosys to print more top worst paths (it might hide behind the optimization target).

@tilk
Copy link
Copy Markdown
Member

tilk commented Apr 16, 2026

it can be verified with other toolchains or/and we should add option for yosys to print more top worst paths

This seems like the best course of action.

Still, I don't want this to be a blocker for further S-mode work, and I'm willing to merge now and worry later. What do you think?

@github-actions
Copy link
Copy Markdown

Benchmarks summary

Performance benchmarks

aha-mont64 crc32 minver nettle-sha256 nsichneu slre statemate ud
0.411 (0.000) 0.554 (0.000) 0.345 (0.000) 0.635 (0.000) 0.357 (0.000) 0.289 (0.000) 0.318 (0.000) 0.424 (0.000)

You can view all the metrics here.

Synthesis benchmarks (basic)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 17413 (+2959) ▲ 4672 (+109) ▼ 1348 (-96) 1324 (0) ▼ 46 (-4)

Synthesis benchmarks (full)

Device utilisation: (ECP5) LUTs used as DFF: (ECP5) LUTs used as carry: (ECP5) LUTs used as ram: (ECP5) Max clock frequency (Fmax)
▲ 39410 (+1677) ▲ 8491 (+110) ▼ 2948 (-64) 2316 (0) ▼ 37 (-5)

@qbojj
Copy link
Copy Markdown
Contributor Author

qbojj commented Apr 16, 2026

It seems like my comment during the meeting was not correct:

  • current critical path (~27 ns) is entirely inside of the frontend, not even touching CSRUnit
  • CSRUnit is in the -> sync domain (~25 ns) is: bus_master_data_adapter -> atomic wrapper -> announcement -> CSRUnit_update0_* -> reservation station

@awariac
Copy link
Copy Markdown
Member

awariac commented Apr 16, 2026

CSRUnit_update0_* - looks like its also a RS thing (internal in CSRUnit, no connection with CSR processing)

@qbojj
Copy link
Copy Markdown
Contributor Author

qbojj commented Apr 16, 2026

I have made a version that would not have the condition workaround (qbojj#1), the results are not far from what is seen here.

The critical path there also doesn't touch CSRs - it goes around RS and ALU, so I have no idea what caused the loss of FMax.

Maybe the resource usage started to impact the FMax due to contention?

@awariac
Copy link
Copy Markdown
Member

awariac commented Apr 16, 2026

mayhaps

but the increase is not that large, and in runs on this PR it differs a lot compared to that - ~3k change, but Fmax doesn't seem to correlate with that.

Utilzization now:

 Info: 	          TRELLIS_FF:    8491/  83640    10%
Info: 	        TRELLIS_COMB:   46336/  83640    55%

Due to previous comments, I would merge
and we can try later with larger FPGA, comparing this commit to previous one.
But this need to be analyzed more in-depth soon.

@tilk
Copy link
Copy Markdown
Member

tilk commented Apr 17, 2026

Merge now and worry later, then.

@tilk tilk merged commit 78c1585 into kuznia-rdzeni:master Apr 17, 2026
14 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 17, 2026
@awariac
Copy link
Copy Markdown
Member

awariac commented Apr 18, 2026

wait, did it disappear on master? https://kuznia-rdzeni.github.io/coreblocks/dev/benchmark/

maybe we forgot how bad it is on full config and this was not an issue. Device utilization may be the case.
we need to observe next commits to see if it still fits in noise floor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark Benchmarks should be run for this change enhancement New feature or request nlnet The work is part of the NLnet grant

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement supervisor CSRs Add CSR aliasing support

3 participants