
Checkpointing part 2: final integration#846

Draft
awariac wants to merge 18 commits into kuznia-rdzeni:master from awariac:piotro/checkpointing-integration

Conversation

@awariac (Member) commented Oct 20, 2025

Checkpointing is finally here!!! 🎉 (it took a bit of work and debugging)

Integrates the rollback-on-branch-misprediction flow and the use of instruction tags into the core.
Async interrupts and other exceptions are still handled the old way, by flushing.

TODO:
- fix the remaining unit tests
- verify Linux boot
- verify benchmarks
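For readers unfamiliar with the technique: the rollback flow restores a saved rename-table snapshot when a tagged branch resolves as mispredicted, instead of flushing the whole pipeline. The sketch below is a toy Python behavioral model; all class, register, and tag names are illustrative and do not come from the actual core.

```python
# Toy behavioral model of checkpoint-based rollback on branch
# misprediction. Names are illustrative, not from the core's sources.
import copy

class CheckpointManager:
    def __init__(self):
        # Speculative map from architectural to physical registers.
        self.rat = {f"x{i}": f"p{i}" for i in range(4)}
        self.checkpoints = {}  # branch tag -> saved RAT snapshot
        self.next_tag = 0

    def take_checkpoint(self):
        """On dispatching a branch, snapshot the speculative RAT."""
        tag = self.next_tag
        self.next_tag += 1
        self.checkpoints[tag] = copy.deepcopy(self.rat)
        return tag

    def rename(self, arch_reg, phys_reg):
        """Speculatively rename an instruction past the branch."""
        self.rat[arch_reg] = phys_reg

    def rollback(self, tag):
        """On misprediction, restore the RAT instead of a full flush."""
        self.rat = self.checkpoints.pop(tag)

    def free(self, tag):
        """On correct prediction, discard the checkpoint."""
        self.checkpoints.pop(tag)

cm = CheckpointManager()
tag = cm.take_checkpoint()   # branch dispatched
cm.rename("x1", "p9")        # wrong-path rename
cm.rollback(tag)             # branch resolved as mispredicted
assert cm.rat["x1"] == "p1"  # mapping restored without flushing
```

The point of the tag is that only state younger than the mispredicted branch is discarded; older in-flight instructions keep executing.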

@awariac awariac added enhancement New feature or request performance Improves performance benchmark Benchmarks should be run for this change microarch Involves the processor's microarchitecture labels Oct 20, 2025
@awariac awariac marked this pull request as draft October 20, 2025 20:07
@awariac awariac added benchmark Benchmarks should be run for this change and removed benchmark Benchmarks should be run for this change labels Oct 20, 2025
@github-actions (Bot)

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.495 | 0.605 | 0.47 | 0.659 | 0.4 | 0.388 | 0.4 | 0.501 |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 17252 | 5004 | 1574 | 1796 | 47 |

Synthesis benchmarks (full)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 31793 | 8199 | 2058 | 2324 | 42 |

@awariac (Member, Author) commented Oct 24, 2025

yoooo

@tilk (Member) commented Oct 27, 2025

Nice improvement!

@awariac (Member, Author) commented Nov 5, 2025

> Nice improvement!

@tilk yeah, it is, but I expected it to be bigger (hope that's not wishful thinking). Did you have any expectations for it?

@tilk (Member) commented Nov 6, 2025

> yeah, it is, but I expected it to be bigger (hope that's not wishful thinking). Did you have any expectations for it?

Also hoped for a bigger difference, but it looks like there are other factors at play. A few of the most obvious to me are:

- Mispredictions still have a cost (refilling the pipeline, keeping the functional units busy with unneeded operations).
- The core cannot execute dependent instructions cycle after cycle, as there is no forwarding from instruction results to the RS (and full forwarding would probably be costly to implement). Superscalarity would help discover independent instructions faster.
- According to the metrics, quite a lot of instructions seem to linger in the RS for some time. There could be multiple reasons for that: maybe results are not coming in fast enough, or maybe the other end of the pipeline is the limiting factor. I would consider adding more metrics to the RS to help understand this behavior.
- Lack of superscalarity means that if multiple instructions in different RSes have their operands ready in the same cycle, they still have to complete in sequence.
- The LSU blocks reordering of instructions.

It's hard to judge how much each of these factors weighs on final performance. There is probably no single biggest reason; I'm starting to believe that OoO performance is complex and comes from a combination of many factors.
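The second point (no result-to-RS forwarding) can be illustrated with a toy cycle count. This is purely illustrative: the latencies are assumed, not measured on the core, and `wakeup_delay` is a made-up parameter standing for the extra writeback-before-issue gap.

```python
# Toy cycle count for a chain of dependent single-cycle ops,
# contrasting result forwarding into the reservation stations with a
# design where a result must be written back before dependents issue.
# Latencies are illustrative assumptions, not measurements.

def chain_cycles(n_ops: int, wakeup_delay: int) -> int:
    """Each op takes 1 cycle to execute; its dependent can issue only
    `wakeup_delay` cycles after the result is produced."""
    cycle = 0
    for _ in range(n_ops):
        cycle += 1 + wakeup_delay
    return cycle

with_forwarding = chain_cycles(8, wakeup_delay=0)     # back-to-back: 8 cycles
without_forwarding = chain_cycles(8, wakeup_delay=1)  # one dead cycle each: 16
```

Even a single extra wakeup cycle halves throughput on a fully serial dependence chain, which is consistent with the "dependent instructions cannot execute cycle after cycle" observation above.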

@awariac (Member, Author) commented Nov 7, 2025

> verify linux boot

and it doesn't boot, so more obscure bugs await :/

@awariac (Member, Author) commented Nov 8, 2025

#451 would be useful for finding some manageable test cases

@tilk (Member) commented Nov 8, 2025

Given that the benchmarks and other full-core tests work fine, the issue is most probably related to M-mode and interrupt handling. Indeed, RISCV-DV claims to cover this.

@github-actions (Bot) commented Mar 2, 2026

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.517 (+0.085) | ▲ 0.596 (+0.042) | ▲ 0.477 (+0.115) | ▲ 0.672 (+0.018) | ▲ 0.406 (+0.044) | ▲ 0.397 (+0.095) | ▲ 0.404 (+0.072) | ▲ 0.505 (+0.067) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 16659 (+1333) | ▲ 4989 (+761) | ▲ 1570 (+120) | ▲ 1796 (+244) | ▼ 48 (-2) |

Synthesis benchmarks (full)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 33134 (+1387) | ▲ 8459 (+770) | ▲ 2086 (+152) | ▲ 2420 (+244) | ▼ 40 (-1) |

@tilk tilk added the nlnet The work is part of the NLnet grant label Apr 10, 2026
A comment by awariac was marked as outdated.

@awariac (Member, Author) commented May 4, 2026

Finally merged in superscalarity, the new frontend, and other changes.
The core and checkpointing tests stopped passing; needs further investigation.

@awariac awariac force-pushed the piotro/checkpointing-integration branch from b1c7a58 to be81f6d Compare May 4, 2026 15:19
@github-actions (Bot) commented May 4, 2026

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.503 (+0.072) | ▲ 0.605 (+0.051) | ▲ 0.492 (+0.110) | ▲ 0.670 (+0.017) | ▲ 0.402 (+0.041) | ▲ 0.385 (+0.080) | ▲ 0.393 (+0.062) | ▲ 0.504 (+0.054) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 18368 (+392) | ▲ 5565 (+765) | ▲ 1490 (+140) | ▲ 1612 (+240) | ▼ 41 (-6) |

Synthesis benchmarks (full)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 50392 (+1855) | ▲ 10948 (+764) | ▲ 3668 (+112) | ▲ 2860 (+224) | ▼ 24 (-11) |

@github-actions (Bot) commented May 4, 2026

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.503 (+0.072) | ▲ 0.605 (+0.051) | ▲ 0.492 (+0.110) | ▲ 0.670 (+0.017) | ▲ 0.402 (+0.041) | ▲ 0.385 (+0.080) | ▲ 0.393 (+0.062) | ▲ 0.504 (+0.054) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 18559 (+583) | ▲ 5565 (+765) | ▲ 1490 (+140) | ▲ 1612 (+240) | ▼ 44 (-3) |

Synthesis benchmarks (full)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 49882 (+1345) | ▲ 10948 (+764) | ▲ 3668 (+112) | ▲ 2860 (+224) | ▼ 27 (-9) |

@github-actions (Bot) commented May 4, 2026

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.503 (+0.072) | ▲ 0.605 (+0.051) | ▲ 0.492 (+0.110) | ▲ 0.670 (+0.017) | ▲ 0.402 (+0.041) | ▲ 0.385 (+0.080) | ▲ 0.393 (+0.062) | ▲ 0.504 (+0.054) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 18676 (+700) | ▲ 5565 (+765) | ▲ 1458 (+108) | ▲ 1612 (+240) | ▼ 43 (-4) |

Synthesis benchmarks (full)

| Device utilisation (ECP5): LUTs | LUTs used as DFF | LUTs used as carry | LUTs used as RAM | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 48875 (+338) | ▲ 10948 (+764) | ▲ 3664 (+108) | ▲ 2860 (+224) | ▼ 25 (-10) |
