benchstat is a very useful tool, but if you're not familiar with what it does, it can be very confusing to use for the first time.
One such example is "how many times should my benchmarks run". If one has used benchcmp before, running each benchmark before and after exactly once, trying to use benchstat will result in something confusing like:
name      old time/op  new time/op  delta
Decode-4   2.20s ± 0%   1.54s ± 0%  ~  (p=1.000 n=1+1)
The answer here is that the user should be running the benchmark more times - at least 3 or 4 to get p-values low enough for a result.
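As a rough check on that number, here is a minimal sketch of the smallest p-value a comparison can possibly report for a given number of runs. It assumes benchstat's defaults of an exact Mann-Whitney U-test and a 0.05 significance threshold, and that there are no tied measurements; the helper names binomial and minTwoSidedP are made up for illustration and are not part of benchstat.

```go
package main

import "fmt"

// binomial returns C(n, k) for small non-negative n and k.
func binomial(n, k int) int {
	if k < 0 || k > n {
		return 0
	}
	if k > n-k {
		k = n - k
	}
	c := 1
	for i := 1; i <= k; i++ {
		// c holds C(n-k+i-1, i-1) here, so the division below is exact.
		c = c * (n - k + i) / i
	}
	return c
}

// minTwoSidedP returns the smallest two-sided p-value an exact
// Mann-Whitney U-test can report for samples of size n and m with no ties.
// Only the two fully separated orderings, out of C(n+m, n) equally likely
// ones, are the most extreme. (Hypothetical helper, not a benchstat API.)
func minTwoSidedP(n, m int) float64 {
	p := 2 / float64(binomial(n+m, n))
	if p > 1 {
		p = 1
	}
	return p
}

func main() {
	const alpha = 0.05 // assumed default significance threshold
	for n := 1; n <= 6; n++ {
		p := minTwoSidedP(n, n)
		fmt.Printf("n=%d+%d  min p=%.3f  can reach p<%.2f: %v\n", n, n, p, alpha, p < alpha)
	}
}
```

Under these assumptions, n=1+1 can never do better than p=1.000, which matches the output above. Three runs per side bottom out at p=0.100, and four runs per side (about 0.029) are the first that can fall below 0.05, so 4 looks like the practical minimum and more runs leave room for noise.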
However, neither benchstat -h nor the godoc page is very clear on this, and neither has a "quickstart" guide. The godoc page does show an example input with many runs, and does talk about "a number of runs" and p-values, but it's not very clear if you're not familiar with statistics and benchmarking.
I believe that a quick guide would greatly improve the usability of the tool - for example:
$ go test -bench=. -count 5 >old.txt
$ <apply source changes>
$ go test -bench=. -count 5 >new.txt
$ benchstat old.txt new.txt
I think it should also introduce other best practices, such as:
- Using higher -count values if the benchmark numbers aren't stable
- Using -benchmem to also get stats on allocated objects and space
- Running the benchmarks on an idle machine not running on battery (and with power management off?)
- Adding -run='$^' or -run=- to each go test command to avoid running the tests too
I realise that some of these tips are more about benchmarking than benchstat itself. But I think it's fine to have it all there, as in general you're going to be using that tool anyway.
/cc @rsc @ALTree @aclements @AlekSi