Make RS feed FUs with garbage if flushing #740

Open
Arusekk wants to merge 2 commits into kuznia-rdzeni:master from Arusekk:rsflush-598

Arusekk (Contributor) commented Oct 29, 2024

See #598. This change does not skip FUs yet, but it demonstrates the concept.

Arusekk added the "performance" label (Improves performance) on Oct 29, 2024
tilk added the "benchmark" label (Benchmarks should be run for this change) on Oct 29, 2024
@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| 0.421 | 0.513 | 0.339 | 0.655 | 0.364 | 0.29 | 0.328 | 0.43 |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 15885 | 6043 | 834 | 1068 | 43 |

Synthesis benchmarks (full)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| 28877 | 9298 | 1790 | 1248 | 40 |

Comment thread coreblocks/func_blocks/fu/common/rs.py Outdated
@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.421 (+0.004) | 0.513 (0.000) | ▲ 0.339 (+0.002) | ▲ 0.655 (+0.000) | ▲ 0.364 (+0.003) | 0.290 (0.000) | ▲ 0.328 (+0.002) | ▼ 0.430 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 14911 (+396) | 6043 (0) | 834 (0) | 1068 (0) | ▼ 41 (-14) |

Synthesis benchmarks (full)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 24301 (-546) | 9298 (0) | ▲ 1790 (+32) | 1248 (0) | ▼ 35 (-10) |

Arusekk force-pushed the rsflush-598 branch 2 times, most recently from e96d13d to c61a1ee, on November 22, 2024 10:47
@github-actions

This comment was marked as outdated.

@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.421 (+0.004) | 0.513 (0.000) | ▲ 0.339 (+0.002) | ▲ 0.655 (+0.000) | ▲ 0.364 (+0.003) | 0.290 (0.000) | ▲ 0.328 (+0.002) | ▼ 0.430 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 13982 (-282) | 6043 (0) | 834 (0) | 1068 (0) | ▼ 39 (-18) |

Synthesis benchmarks (full)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 23200 (-1676) | 9298 (0) | 1790 (0) | 1248 (0) | ▼ 33 (-9) |

awariac (Member) commented Nov 26, 2024

[Additional comments following the discussion at the meeting]

I checked that a synchronous flushing signal would work in RSInsertion: because the FreeRF/RF valid bits are also updated in the sync domain, the effect would be visible in the next cycle (and the old RF entry would be inserted into the RS). The change in RSInsertion would also not cause any performance loss, and it is definitely safe in the RS too.

The proposal to reset RF valid in Register Allocation would be problematic with checkpointing, which pushes a new instruction immediately.

The last place is the LSU:
LSU operations have a very high cost, so I don't see why we should de-optimize the LSU if this part of it is not on the critical path (unless it is).
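
The one-cycle delay mentioned above for a synchronous flushing signal can be illustrated with a toy model in plain Python. This is only a sketch of generic registered-signal behaviour, not Coreblocks or Amaranth code: `SyncFlag` and `simulate` are hypothetical names introduced here, and the model assumes a single clock domain in which a value written during one cycle becomes readable in the next.

```python
from dataclasses import dataclass


@dataclass
class SyncFlag:
    """Hypothetical one-bit register: a write becomes visible after the next clock edge."""

    value: bool = False
    _next: bool = False

    def set_next(self, v: bool) -> None:
        # Schedule the value that the register will hold after the clock edge.
        self._next = v

    def tick(self) -> None:
        # Clock edge: the scheduled value becomes the readable value.
        self.value = self._next


def simulate(flush_requests):
    """Return, for each cycle, the flushing value a reader observes in that cycle."""
    flag = SyncFlag()
    observed = []
    for request in flush_requests:
        observed.append(flag.value)  # read happens during the cycle
        flag.set_next(request)       # write is registered, not combinational
        flag.tick()                  # advance to the next cycle
    return observed


# A flush requested in cycles 1 and 2 is observed in cycles 2 and 3.
observed = simulate([False, True, True, False])
```

In this model a reader in the cycle where the flush is first requested still sees the old value, which matches the remark that the effect is visible one cycle later.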

@github-actions

Benchmarks summary

Performance benchmarks

| aha-mont64 | crc32 | minver | nettle-sha256 | nsichneu | slre | statemate | ud |
|---|---|---|---|---|---|---|---|
| ▲ 0.421 (+0.004) | 0.513 (0.000) | ▲ 0.339 (+0.002) | ▲ 0.655 (+0.000) | ▲ 0.364 (+0.003) | 0.290 (0.000) | ▲ 0.328 (+0.002) | ▼ 0.430 (-0.001) |

You can view all the metrics here.

Synthesis benchmarks (basic)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▲ 14964 (+721) | 4398 (0) | 1456 (0) | 1164 (0) | ▼ 37 (-16) |

Synthesis benchmarks (full)

| Device utilisation (ECP5) | LUTs used as DFF (ECP5) | LUTs used as carry (ECP5) | LUTs used as ram (ECP5) | Max clock frequency (Fmax) |
|---|---|---|---|---|
| ▼ 20268 (-3207) | 7013 (0) | 1818 (0) | 1216 (0) | ▼ 33 (-9) |

Labels

benchmark (Benchmarks should be run for this change), performance (Improves performance)

3 participants