Fix performance problems after #532 by awariac · Pull Request #545 · kuznia-rdzeni/coreblocks

awariac · 2023-12-18T12:00:07Z

#532 introduced loss in Fmax and IPC.
Both problems are fixed and explained in comments.
For other details see discussion at end of comments in #532

tilk · 2023-12-18T12:10:42Z

+        self.layouts = gp.get(ExceptionRegisterLayouts)
+
+        # Break long combinational paths from single-cycle FUs
+        # Insertion of FIFO is fine, because self.report is always ready, and will delay report by


This is a single FIFO for all FUs. What will happen when two FUs report an exception in the same clock cycle? Did that work previously?

It is the same behaviour as last time. Both FUs cannot report exception at the same time, and only one will proceed with its Transaction.
This is a performance problem, because calls to report are under Ifs so all FUs reporting instruction are mutually exclusive at some stage.
As we discussed on the meeting, it may be solved via call under condition (btw, does it fix that -- will it block parent transction if we want to call report but it is not ready??).
Optimizing to allow reporting multiple exceptions at one cycle is not worth it (we have large flush/resume penalty, so one additional blocked cycle wouldn't change too much).
Anyway it is topic for another PR.

I'm still wondering about the cycle delays.

The delay from e.g. CSR to ROB change is in best case only one cycle. CSR's get_result goes through FuncBlocksUnifier, which uses a Collector, which uses a Forwarder, which finally connects to the transaction in ResultAnnouncement. The same transaction calls rob_mark_done, which updates the ROB.

On the other hand, the update of the ExceptionCauseRegister is two cycles from CSR's get_result: one cycle for the FIFO, and another cycle for the register update.

Therefore there is a possibility for Retirement to see the ROB exception bit set, but not have the right cause.

Am I wrong?

Ideally, I would like to have this latency-insensitive; in other words, Retirement should wait if the rob_id in ExceptionCauseRegister is not the rob_id of currently processed entry. One way to do this cleanly would be to add a rob_id argument to the ExceptionCauseRegister's get method, and use validate_arguments to only accept the call when the rob_id matches the rob_id in the register.

Oh, you are right, thanks! I missed the sync delay in ExceptionCauseRegister.

Retirement solution seems nice.
Now I think about it, we don't even need to do that! We just need to check a valid bit of ExceptionCauseRegister. We only care about the first trap. And if it is processed, ExceptionCauseRegister is in the clear() state!
I will look into validate_arguments, I didn't used it before. Hope it is possible to do that (we need exception bit from rob_peek and exception cause register result to calculate it)

At the same time I'm preparing Retirement refactor (and will make PR for it in a minute), I think it should be merged first (but lets see what the reception will be)

Now I think about it, we don't even need to do that! We just need to check a valid bit of ExceptionCauseRegister. We only care about the first trap.

The first trap in wall clock time might come from an instruction which was not the first in instruction order. So the valid bit will be 1, but the rob_id will be wrong, and we get the wrong cause. Am I missing something?

You are right, my bad

There is still one problem - ExceptionCauseRegister allows reporting multiple exceptions for one instruction, and decides if the cause should be updated. In this case rob_id would match, but returned cause would be outdated.

Looking at current usage of ExceptionCauseRegister, this functionality may be not needed at all. All exceptions in FUs are reported in one place only (when setting exception bit), so priority is selected locally. If any exception happens before FU it is also reported in ExceptionFU - priority also needs to be manually determined in pipeline.

I think we can safely add limitation to support reporting only one exception per rob_id.
ECR will be simplified and I don't know if the addition of separation FIFO for Fmax will be necessary. Should it stay? It probably be too much noticeable, because JB mispedictions report exceptions one cycle earlier that your example.

awariac · 2023-12-23T14:15:52Z

                    with condition(m, priority=False) as cond:
                        with cond(commit):
                            retire_instr(rob_entry)
-                        with cond(~commit):  # Not using default, because we want to block if condition is not ready
+                        with cond(~commit):
                            flush_instr(rob_entry)


This condition in retirement can be eliminated with merge of #551, by using retire_instr and flush_instr instead of setting commit flag. Non-blocking is guaranteed by condition that is needed on higher level for rob.exception (and makes more sense)

If I remember right, condition was put here not for aesthetics, but to avoid locking FRAT's rename method.

It could be removed because it would be separated by existing condition of (~)rob_entry.exception.

No longer valid, because that exception condition caused problems with 2* LUT usage.
I fixed it by using separate Transaction instead of validate_args. get method is not blocking so excpetion condition can be removed and commit condition stays the same.

awariac · 2023-12-23T14:39:17Z

For some reason Fmax is now 40 MHz with critical path not related to changes here :/

EDIT: LUTS increased *2. Am I doing something wrong, or there is a bug in those functionalities?

awariac · 2024-01-02T15:28:17Z

Fixed, Fmax is 54 MHz, LUT usage back to normal.
It was caused by rob_entry.exception condition. Maybe it needs investigation, I don't see a reason why it doubled the LUT usage. Still happened at this commit after removal of validate_args.
For more change details refer to review comments

Stale review

lekcyjna123 · 2024-01-02T17:08:09Z

It is interesting. I checkouted 96400a6 generated verilog by ./scripts/gen_verilog.py -c basic and tried to synthesise it with Quartus, but I got syntax error. After some manual tweaks code get accepted. On the newest commit there is no more syntax error reported by Quartus.

EDIT: Btw. it looks like there is some regression in detecting amaranth memory by Quartus. On this PR there is 0 bits of memory found by Quratus. On my master thesis it was more than 100k bits.

Info (276007): RAM logic "top.main_module:main_module|top.main_module.c:c|top.main_module.c.icache:icache|top.main_module.c.icache.mem:mem|data_mem_0_rp" is uninferred due to asynchronous read logic

awariac added 4 commits December 16, 2023 21:28

Try remove conflict (tmp)

95c0dde

break exception repot comb path

8de8778

Apply fetch conflict

2eb857b

Cleaner exception FIFO

733f7a9

awariac added the performance Improves performance label Dec 18, 2023

tilk reviewed Dec 18, 2023

View reviewed changes

lekcyjna123 previously approved these changes Dec 18, 2023

View reviewed changes

awariac changed the title ~~Fix preformace problems after #532~~ Fix performance problems after #532 Dec 18, 2023

lekcyjna123 self-requested a review December 20, 2023 19:53

awariac marked this pull request as draft December 21, 2023 10:55

tilk mentioned this pull request Dec 21, 2023

Retirement refactor #548

Merged

awariac added 2 commits December 23, 2023 11:21

Merge branch 'master' into piotro888/interrupt-fmax-fix

375cbf0

Use verify args

cc12797

awariac force-pushed the piotro888/interrupt-fmax-fix branch from a73ca05 to 3387b08 Compare December 23, 2023 14:12

awariac commented Dec 23, 2023

View reviewed changes

awariac marked this pull request as ready for review December 23, 2023 14:16

awariac marked this pull request as draft December 23, 2023 14:39

Remove priority computation from ECR

70a8619

awariac force-pushed the piotro888/interrupt-fmax-fix branch from 3387b08 to 70a8619 Compare December 23, 2023 14:44

awariac added 2 commits January 2, 2024 13:50

Merge branch 'master' into piotro888/interrupt-fmax-fix

1d73f1d

Manual ECR validation

96400a6

awariac force-pushed the piotro888/interrupt-fmax-fix branch from 4736793 to 96400a6 Compare January 2, 2024 14:31

Remove not needed exception condition

0f6f795

awariac marked this pull request as ready for review January 2, 2024 15:24

tilk approved these changes Jan 4, 2024

View reviewed changes

lekcyjna123 approved these changes Jan 4, 2024

View reviewed changes

tilk merged commit 9b5a156 into kuznia-rdzeni:master Jan 4, 2024

github-actions Bot pushed a commit that referenced this pull request Jan 4, 2024

Fix performance problems after #532 (#545)

c3152ee

Conversation

awariac commented Dec 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

awariac Dec 20, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

awariac commented Dec 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

awariac commented Jan 2, 2024

Uh oh!

lekcyjna123 commented Jan 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

awariac commented Dec 18, 2023 •

edited

Loading

awariac Dec 20, 2023 •

edited

Loading

awariac commented Dec 23, 2023 •

edited

Loading

lekcyjna123 commented Jan 2, 2024 •

edited

Loading