diff --git a/cmd/benchstat/README.md b/cmd/benchstat/README.md
deleted file mode 100644
index 0a299af..0000000
--- a/cmd/benchstat/README.md
+++ /dev/null
@@ -1,85 +0,0 @@
-# Benchstat
-
-Benchstat computes and compares statistics about benchmarks.
-
-Usage:
-
-    benchstat [options] old.txt [new.txt] [more.txt ...]
-
-Run `benchstat -h` for the list of supported options.
-
-Each input file should contain the concatenated output of a number of runs
-of `go test -bench`. For each different benchmark listed in an input file,
-benchstat computes the mean, minimum, and maximum run time, after removing
-outliers using the interquartile range rule.
-
-If invoked on a single input file, benchstat prints the per-benchmark
-statistics for that file.
-
-If invoked on a pair of input files, benchstat adds to the output a column
-showing the statistics from the second file and a column showing the percent
-change in mean from the first to the second file. Next to the percent
-change, benchstat shows the p-value and sample sizes from a test of the two
-distributions of benchmark times. Small p-values indicate that the two
-distributions are significantly different. If the test indicates that there
-was no significant change between the two benchmarks (defined as p > 0.05),
-benchstat displays a single ~ instead of the percent change.
-
-The -delta-test option controls which significance test is applied: utest
-(Mann-Whitney U-test), ttest (two-sample Welch t-test), or none. The default
-is the U-test, sometimes also referred to as the Wilcoxon rank sum test.
-
-If invoked on more than two input files, benchstat prints the per-benchmark
-statistics for all the files, showing one column of statistics for each
-file, with no column for percent change or statistical significance.
-
-The -html option causes benchstat to print the results as an HTML table.
-
-## Example
-
-Suppose we collect benchmark results from running `go test -bench=Encode`
-five times before and after a particular change.
-
-The file old.txt contains:
-
-    BenchmarkGobEncode     100    13552735 ns/op    56.63 MB/s
-    BenchmarkJSONEncode     50    32395067 ns/op    59.90 MB/s
-    BenchmarkGobEncode     100    13553943 ns/op    56.63 MB/s
-    BenchmarkJSONEncode     50    32334214 ns/op    60.01 MB/s
-    BenchmarkGobEncode     100    13606356 ns/op    56.41 MB/s
-    BenchmarkJSONEncode     50    31992891 ns/op    60.65 MB/s
-    BenchmarkGobEncode     100    13683198 ns/op    56.09 MB/s
-    BenchmarkJSONEncode     50    31735022 ns/op    61.15 MB/s
-
-The file new.txt contains:
-
-    BenchmarkGobEncode     100    11773189 ns/op    65.19 MB/s
-    BenchmarkJSONEncode     50    32036529 ns/op    60.57 MB/s
-    BenchmarkGobEncode     100    11942588 ns/op    64.27 MB/s
-    BenchmarkJSONEncode     50    32156552 ns/op    60.34 MB/s
-    BenchmarkGobEncode     100    11786159 ns/op    65.12 MB/s
-    BenchmarkJSONEncode     50    31288355 ns/op    62.02 MB/s
-    BenchmarkGobEncode     100    11628583 ns/op    66.00 MB/s
-    BenchmarkJSONEncode     50    31559706 ns/op    61.49 MB/s
-    BenchmarkGobEncode     100    11815924 ns/op    64.96 MB/s
-    BenchmarkJSONEncode     50    31765634 ns/op    61.09 MB/s
-
-The order of the lines in the file does not matter, except that the output
-lists benchmarks in order of appearance.
-
-If run with just one input file, benchstat summarizes that file:
-
-    $ benchstat old.txt
-    name        time/op
-    GobEncode   13.6ms ± 1%
-    JSONEncode  32.1ms ± 1%
-
-If run with two input files, benchstat summarizes and compares:
-
-    $ benchstat old.txt new.txt
-    name        old time/op  new time/op  delta
-    GobEncode   13.6ms ± 1%  11.8ms ± 1%  -13.31% (p=0.016 n=4+5)
-    JSONEncode  32.1ms ± 1%  31.8ms ± 1%     ~    (p=0.286 n=4+5)
-
-Note that the JSONEncode result is reported as statistically insignificant
-instead of a -0.93% delta.
diff --git a/cmd/benchstat/main.go b/cmd/benchstat/main.go
index b882c06..7c6e76d 100644
--- a/cmd/benchstat/main.go
+++ b/cmd/benchstat/main.go
@@ -79,7 +79,6 @@
 //	name        time/op
 //	GobEncode   13.6ms ± 1%
 //	JSONEncode  32.1ms ± 1%
-//	$
 //
 // If run with two input files, benchstat summarizes and compares:
 //
@@ -87,11 +86,77 @@
 //	name        old time/op  new time/op  delta
 //	GobEncode   13.6ms ± 1%  11.8ms ± 1%  -13.31% (p=0.016 n=4+5)
 //	JSONEncode  32.1ms ± 1%  31.8ms ± 1%     ~    (p=0.286 n=4+5)
-//	$
 //
 // Note that the JSONEncode result is reported as
 // statistically insignificant instead of a -0.93% delta.
 //
+// An example benchmarking workflow in Unix shell language:
+//
+//	oldBin=/tmp/benchmarkBinaryOld
+//	newBin=/tmp/benchmarkBinaryNew
+//	old=/tmp/benchmarkReportOld
+//	new=/tmp/benchmarkReportNew
+//	result=/tmp/benchstatReport
+//
+//	# Create the first test executable.
+//	go test -c -o "$oldBin" -bench .
+//
+//	# Apply the code patch.
+//	git checkout fixes
+//
+//	# Create the other test executable.
+//	go test -c -o "$newBin" -bench .
+//
+//	# Test and benchmark.
+//	for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13; do
+//		printf 'Tests %s starting.\n' "$i"
+//		"$oldBin" -test.bench . >> "$old"
+//		"$newBin" -test.bench . >> "$new"
+//	done
+//
+//	# Create the final report with benchstat.
+//	benchstat "$old" "$new" > "$result"
+//
+// Possible variations include disabling tests (done with the command
+// line arguments "-run -"), running three instead of two benchmark
+// executables in the loop, lowering the nice value or, even better,
+// running the binaries under a real-time scheduling policy (see
+// sched_setscheduler and SCHED_FIFO). If you are on Linux and have
+// the chrt program, you can run the test binary under a real-time
+// scheduling policy like so:
+//
+//	chrt -f 50 testBinary -test.bench regexp >> out
+//
+// Be aware, though, that since a real-time scheduling policy gives a
+// process or thread as much time as it "wants" to take, a thread of
+// the running testBinary process or one of its children can take up
+// all the time of a CPU core, and thus testBinary and its children
+// could, if malicious or simply buggy, effectively mount a
+// denial-of-service attack on your computer.
+//
+// Other general, Linux-specific benchmarking tips for reducing noise
+// include disabling address space layout randomization and disabling
+// Intel turbo mode:
+//
+//	printf 0 > /proc/sys/kernel/randomize_va_space
+//	printf 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+//
+// If your computer has sufficient cooling, set the Linux "performance"
+// frequency scaling governor for all cores:
+//
+//	for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
+//		printf performance > "$f"
+//	done
+//
+// If your computer has insufficient cooling, lower the maximum
+// frequency of all CPU cores:
+//
+//	# Get the minimum frequency; use it in place of minimumFrequency.
+//	cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
+//
+//	for f in /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq; do
+//		printf minimumFrequency > "$f"
+//	done
 package main
 
 import (