Implement initial Fetch Target Queue (FTQ) #869

Draft
xThaid wants to merge 1 commit into kuznia-rdzeni:master from xThaid:20260308-ftq-stub

Conversation

@xThaid (Contributor) commented Mar 13, 2026

The Fetch Target Queue (FTQ) is a buffer that decouples branch prediction from instruction fetch. The branch prediction unit (BPU) writes predicted fetch addresses, along with some prediction metadata, into the FTQ ahead of time, while the fetch unit consumes entries from the queue to retrieve the corresponding instruction blocks. The FTQ also retains the prediction information so that it can be sent back to the BPU for predictor training after instruction commit. In this way, the FTQ tracks each entry through its complete lifecycle, from prediction to commit.
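
To make the lifecycle described above concrete, here is a minimal behavioural sketch in plain Python. It is not the code from this PR and it is not hardware description code: the names `FTQEntry`, `FetchTargetQueue`, `write`, `fetch` and `commit`, as well as the three-pointer ring buffer layout, are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FTQEntry:
    """One prediction (hypothetical layout, for illustration only)."""
    fetch_pc: int                              # predicted fetch address
    meta: dict = field(default_factory=dict)   # opaque prediction metadata


class FetchTargetQueue:
    """Software model of an FTQ as a ring buffer with three pointers."""

    def __init__(self, depth: int):
        self.depth = depth
        self.entries: list = [None] * depth
        self.write_ptr = 0    # next slot the predictor fills
        self.fetch_ptr = 0    # next entry handed to the fetch unit
        self.commit_ptr = 0   # oldest entry that has not committed yet

    def write(self, entry: FTQEntry) -> bool:
        """Predictor side: enqueue a prediction, or return False if full."""
        if self.write_ptr - self.commit_ptr >= self.depth:
            return False
        self.entries[self.write_ptr % self.depth] = entry
        self.write_ptr += 1
        return True

    def fetch(self) -> Optional[FTQEntry]:
        """Fetch side: read the next prediction without freeing its slot."""
        if self.fetch_ptr == self.write_ptr:
            return None
        entry = self.entries[self.fetch_ptr % self.depth]
        self.fetch_ptr += 1
        return entry

    def commit(self) -> Optional[FTQEntry]:
        """Commit side: free the oldest fetched entry and return it,
        so its metadata can be passed back for predictor training."""
        if self.commit_ptr == self.fetch_ptr:
            return None
        index = self.commit_ptr % self.depth
        entry = self.entries[index]
        self.entries[index] = None
        self.commit_ptr += 1
        return entry
```

The point of the sketch is that `fetch` advances its own pointer but leaves the slot occupied; only `commit` frees it. That is what lets an entry outlive the fetch stage and still be available for training, at the cost of the queue filling up when commit lags behind prediction.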

In this PR I'm adding an initial implementation of the FTQ. It doesn't do anything meaningful at this point: it simply stores basic predictions from the branch prediction unit and then passes them on to the fetch unit.

TODO in this PR:

  • some FTQ unit tests
  • documentation

@xThaid added the labels enhancement (New feature or request), microarch (Involves the processor's microarchitecture), and benchmark (Benchmarks should be run for this change) Mar 13, 2026
@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.434 (0.000) | ▼ 0.554 (-0.000) | ▲ 0.365 (+0.000) | ▼ 0.653 (-0.000) | ▼ 0.361 (-0.001) | ▼ 0.302 (-0.000) | ▲ 0.332 (+0.000) | ▼ 0.438 (-0.000) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▼ 15955 (-1156) | ▲ 4420 (+64) | ▲ 1466 (+56) | ▲ 1620 (+68) | ▼ 46 (-6) |

Synthesis benchmarks (full)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
| --- | --- | --- | --- | --- |
| ▲ 31655 (+722) | ▲ 7879 (+62) | ▲ 1950 (+24) | ▲ 2300 (+124) | ▲ 42 (+2) |

@awariac added the nlnet label (The work is part of the NLnet grant) Apr 11, 2026