do_nop is not negligible #177

Closed
jserv opened this issue Jul 27, 2023 · 0 comments · Fixed by #184
jserv commented Jul 27, 2023

According to perf report, do_nop consumes a measurable share of CPU cycles (about 3% in the Dhrystone profile below). These nop instructions are generated by macro-operation fusion, and we should eliminate their overhead as early as possible.
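
To illustrate one way the overhead could be avoided, here is a minimal sketch that drops nop slots from a decoded block before it is executed, so they are never dispatched. It is not rv32emu's code: the `opcode_t` tags, the `insn_t` record, and `compact_nops` are all hypothetical names, and the sketch assumes the fusion pass leaves an `OP_NOP` marker in each slot it has absorbed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode tags for illustration; not rv32emu's actual enum. */
typedef enum { OP_NOP, OP_ADDI, OP_LUI, OP_SW, OP_FUSED } opcode_t;

/* Hypothetical decoded-instruction record, reduced to the opcode only. */
typedef struct {
    opcode_t op;
} insn_t;

/* Remove every OP_NOP from a decoded block in place, keeping the remaining
 * instructions in their original order. Returns the new instruction count. */
size_t compact_nops(insn_t *block, size_t n)
{
    assert(block || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (block[i].op != OP_NOP)
            block[out++] = block[i];
    }
    return out;
}
```

A block of `{FUSED, NOP, ADDI, NOP, SW}` shrinks to `{FUSED, ADDI, SW}`, so the dispatch loop runs three handlers instead of five. In a real emulator the fused handler would also have to advance the program counter past the instructions it absorbed; that bookkeeping is left out here.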

Reproduce:

  1. Apply the following patch:
--- a/Makefile
+++ b/Makefile
@@ -5,6 +5,7 @@ OUT ?= build
 BIN := $(OUT)/rv32emu
 
 CFLAGS = -std=gnu99 -O2 -Wall -Wextra
+CFLAGS += -g -fno-omit-frame-pointer
 CFLAGS += -Wno-unused-label
 CFLAGS += -include src/common.h
 
@@ -80,7 +81,7 @@ endif
 
 # For tail-call elimination, we need a specific set of build flags applied.
 # FIXME: On macOS + Apple Silicon, -fno-stack-protector might have a negative impact.
-$(OUT)/emulate.o: CFLAGS += -fomit-frame-pointer -fno-stack-check -fno-stack-protector
+$(OUT)/emulate.o: CFLAGS += -fno-stack-check -fno-stack-protector
 
 # Clear the .DEFAULT_GOAL special variable, so that the following turns
 # to the first target after .DEFAULT_GOAL is not set.
  2. Rebuild and run the Dhrystone benchmark:
$ make clean all
$ perf record -g build/rv32emu build/dhrystone.elf
  3. Check the report via perf report -g and note the percentage attributed to do_nop:
-   99.87%     0.16%  rv32emu  rv32emu             [.] main
   - 99.71% main
      + 68.62% rv_step
        8.11% do_addi
        3.05% do_nop
        3.05% do_auipc.part.0
        2.26% block_find
        1.96% do_bne
        1.65% do_sw
        1.60% do_lw
        1.24% do_bgeu
        1.18% do_beq
        1.05% do_or
        0.95% do_fuse3
        0.83% do_andi
        0.82% do_lbu
        0.72% do_jal
@jserv jserv changed the title do_nop is not negligible do_nop is not negligible Jul 27, 2023
qwe661234 pushed a commit to qwe661234/rv32emu that referenced this issue Aug 4, 2023
1. Refine the original fused instruction by skipping the nop instruction and
   correctly updating the register value.
2. Add a new fused instruction: lui + addi.

The Dhrystone benchmark gains about 3% in performance based on this
modification.

Close: sysprog21#177
@jserv jserv closed this as completed in #184 Aug 4, 2023
jserv pushed a commit that referenced this issue Aug 4, 2023
This commit refines the macro fused instruction by skipping the "nop"
instruction and ensuring proper value updates to the register.
Additionally, it introduces a new fused instruction, lui + addi.
As a result of these modifications, the Dhrystone benchmark experiences
approximately a 3% performance improvement.

Close #177

Co-authored-by: Yen-Fu Chen <[email protected]>
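
For readers unfamiliar with the lui + addi pair mentioned in the commit message, the following minimal sketch shows why it can be folded into a single operation: the two instructions together load one 32-bit constant. This is an illustration, not the code from the commit. The `insn_t` layout and both function names are hypothetical, and it assumes the decoder stores the lui immediate already shifted left by 12 and the addi immediate already sign-extended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded-instruction record; field names are illustrative. */
typedef struct {
    uint8_t rd, rs1;
    int32_t imm; /* lui: upper 20 bits, pre-shifted; addi: sign-extended 12 bits */
} insn_t;

/* "lui rd, hi" followed by "addi rd, rd, lo" can be fused when the addi both
 * reads and writes the register that lui just set (and that register is not x0). */
bool can_fuse_lui_addi(const insn_t *lui, const insn_t *addi)
{
    return lui->rd != 0 && addi->rs1 == lui->rd && addi->rd == lui->rd;
}

/* Constant the pair leaves in rd. Unsigned arithmetic wraps modulo 2^32,
 * matching RV32 register semantics. */
uint32_t fused_lui_addi_value(const insn_t *lui, const insn_t *addi)
{
    assert(can_fuse_lui_addi(lui, addi));
    return (uint32_t) lui->imm + (uint32_t) addi->imm;
}
```

For example, `lui x5, 0x12345` followed by `addi x5, x5, 0x678` leaves 0x12345678 in x5, so a fused handler can write that value once. A negative addi immediate borrows from the upper part: `lui x5, 0x12346` with `addi x5, x5, -1` yields 0x12345FFF.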
vestata pushed a commit to vestata/rv32emu that referenced this issue Jan 24, 2025