Closed
Description
do_nop appears to consume some CPU cycles according to perf report. It is generated by macro operation fusion, and we should aim to eliminate its overhead as early as possible.
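One way to picture the elimination is a compaction pass over a decoded block. The sketch below assumes that fusion leaves a no-op placeholder in each slot it absorbed and that these placeholders can be dropped before the block is executed. Every name in it (enum opcode, struct insn, OP_FUSED, compact_block) is hypothetical and only for illustration; none of it is taken from rv32emu's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; not rv32emu's real enum. */
enum opcode { OP_NOP, OP_ADDI, OP_LUI, OP_FUSED };

/* Hypothetical decoded-instruction record. */
struct insn {
    enum opcode op;
    int rd;
    int imm;
};

/*
 * Drop every OP_NOP entry from ir[0..n) in place, keeping the
 * remaining instructions in their original order.
 * Returns the new number of instructions in the block.
 */
static size_t compact_block(struct insn *ir, size_t n)
{
    assert(ir != NULL || n == 0);

    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (ir[i].op == OP_NOP)
            continue; /* placeholder assumed to be left behind by fusion */
        ir[out++] = ir[i];
    }
    return out;
}
```

With such a pass the dispatcher never visits a placeholder, so no handler call is spent on it. A real implementation would also have to make the fused instruction account for the PC advance and the instruction count of the slots it replaced, which this sketch does not model.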
Reproduce:
- Apply the following patch:
--- a/Makefile
+++ b/Makefile
@@ -5,6 +5,7 @@ OUT ?= build
BIN := $(OUT)/rv32emu
CFLAGS = -std=gnu99 -O2 -Wall -Wextra
+CFLAGS += -g -fno-omit-frame-pointer
CFLAGS += -Wno-unused-label
CFLAGS += -include src/common.h
@@ -80,7 +81,7 @@ endif
# For tail-call elimination, we need a specific set of build flags applied.
# FIXME: On macOS + Apple Silicon, -fno-stack-protector might have a negative impact.
-$(OUT)/emulate.o: CFLAGS += -fomit-frame-pointer -fno-stack-check -fno-stack-protector
+$(OUT)/emulate.o: CFLAGS += -fno-stack-check -fno-stack-protector
# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
- Rebuild and run the Dhrystone benchmark:
$ make clean all
$ perf record -g build/rv32emu build/dhrystone.elf
- Check the report with perf report -g and note the percentage attributed to do_nop:
- 99.87% 0.16% rv32emu rv32emu [.] main
- 99.71% main
+ 68.62% rv_step
8.11% do_addi
3.05% do_nop
3.05% do_auipc.part.0
2.26% block_find
1.96% do_bne
1.65% do_sw
1.60% do_lw
1.24% do_bgeu
1.18% do_beq
1.05% do_or
0.95% do_fuse3
0.83% do_andi
0.82% do_lbu
0.72% do_jal