testing: autodetect appropriate benchtime #10930
I have another idea. We can add two new interfaces to the testing package: one to enumerate the benchmarks in a test binary, the other to run a given benchmark a specified number of times. Then we can have an external tool driving the benchmarks, and I really want benchcmp or some other tool (e.g. Russ' benchstat) to do the analysis. The benefit is that we decouple the statistics engine from the testing package. If we want to do better, we can even make a benchmark server.
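To make the shape of that proposal concrete, here is a rough sketch of what the two hooks might look like. The package name, interface names, and signatures are all invented for illustration; nothing like this exists in the testing package today.

```go
// Package benchdriver is a hypothetical name used only for this sketch.
package benchdriver

import "time"

// Lister enumerates the benchmarks compiled into a test binary.
type Lister interface {
	Benchmarks() []string
}

// Runner runs a named benchmark for a fixed number of iterations and
// reports the elapsed wall-clock time, leaving all statistics to an
// external tool such as benchcmp or benchstat.
type Runner interface {
	RunN(name string, iterations int) (time.Duration, error)
}
```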
The prototype implementation I referred to was done by hacking in a stupid line-oriented benchmark server and writing a simple external driver program. So we're thinking along similar lines. The advantage of a server is that the overhead of executing the binary each time might prove significant when we're talking about 5ns-per-op benchmarks. I'm agnostic about whether the statistics engine should be internal or not. I just want it to be good, and I know that (1) that requires testing support and (2) that I personally lack the domain expertise to make it awesome. :)
Oh, and I have a simple driver script that invokes benchmarks in a loop and uses benchstat to print rolling results. I'd love to invest in making it sophisticated and generally useful (right now it is very tuned to my personal setup and habits), but again I lack the statistics expertise to design it correctly. I agree that accepting two test binaries is the right API.
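For reference, a minimal sketch of such a rolling driver, assuming a pre-built test binary ./pkg.test, a benchmark named BenchmarkFoo, and benchstat on the PATH; all three names are placeholders, and this is not the script described above.

```go
// A toy driver: re-run one benchmark repeatedly, append the output to a
// file, and print rolling benchstat results after every run.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	out, err := os.OpenFile("bench.txt", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	defer out.Close()

	for i := 0; i < 10; i++ {
		// Run only the benchmark, no tests, and append its output.
		run := exec.Command("./pkg.test", "-test.run=NONE", "-test.bench=BenchmarkFoo")
		run.Stdout = out
		run.Stderr = os.Stderr
		if err := run.Run(); err != nil {
			panic(err)
		}

		// Print rolling statistics over everything collected so far.
		stat := exec.Command("benchstat", "bench.txt")
		stat.Stdout = os.Stdout
		stat.Stderr = os.Stderr
		if err := stat.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "benchstat:", err)
		}
	}
}
```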
I have a feeling that we can write a standalone package to achieve the benchmark server effect without any testing package support. All you have to do is blank import my package in one of the test files and set a -test.server flag. Of course, some unsafe hackery will be required, but not much. (It needs access to the main.benchmarks and main.tests slices and some private methods of testing.(*B).)
Are you interested?
I'm interested in doing this, but not using unsafe.
The unsafe hacks will be pretty limited. We need to access main.benchmarks, main.tests, and testing.(*B).runN (with //go:linkname or asm stubs). All the other parts of the testing.B type can be accessed with reflect.
Some simple codegen could provide the flag and the list of benchmarks. The basic implementation of benchmarking is pretty simple and could be copy/pasted; CL 10020 makes it simpler yet. I'd still rather design proper support for core building blocks into the stdlib, but I guess this would be better than nothing. Any interest in working on this with me?
@minux well, a combination of unsafe, reflection, and testing.M seemed like the best approach in the end. Please see package benchserve (godoc) for an initial implementation. Feedback most welcome. If it looks good to you, then we can turn our attention to making a good test driver, probably as a first pass a combination of bench and rod.
The only issue with using TestMain is that we need to handle the case where the test is already using TestMain to do test setup.
My approach is very hacky: it injects a new test into the list of tests during package init() and then sets -test.run to run that test (which is actually the jsonrpc bench server).
I realized that I probably used too much unsafe hackery in the code, though...
I've updated benchserve to support the case in which there is already a TestMain. I also added an (aspirational, unwritten) client API to make writing drivers easier.
What about adding a -minbenchiterations flag that takes precedence over benchtime, so faster benchmarks can complete quickly and longer ones can still run enough times for a meaningful number?
@josharian Say I have two benchmarks A and B: A takes 10s to get to 1k iterations and B takes 1s to get to 1k iterations. I'd like it to run just like that.
I think when we have different sizes of benchmarks, it would be much easier to have basic control from inside the benchmark itself. As for me, it is much more convenient to adjust specific variables than to apply command-line options to all benchmarks together. For example:
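A hypothetical illustration of that kind of per-benchmark control; the SetBenchtime and SetMinIterations methods do not exist in the testing package and are invented here purely to show the shape of the idea:

```go
package foo_test

import (
	"testing"
	"time"
)

// doSmallThing and doLargeThing stand in for the code under test.
func doSmallThing() { time.Sleep(10 * time.Nanosecond) }
func doLargeThing() { time.Sleep(10 * time.Millisecond) }

func BenchmarkSmall(b *testing.B) {
	// b.SetBenchtime(10 * time.Millisecond) // hypothetical: stable almost immediately
	for i := 0; i < b.N; i++ {
		doSmallThing()
	}
}

func BenchmarkLarge(b *testing.B) {
	// b.SetMinIterations(1000) // hypothetical: always run at least 1k iterations
	for i := 0; i < b.N; i++ {
		doLargeThing()
	}
}
```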
For discussion:
There is tension in how long to run benchmarks for. You want to run long, in order to make any overhead irrelevant and to reduce run-to-run variance. You want to run short, so that it takes less time; if you have a fixed amount of computing time, it'd be better to run multiple short tests, so you can do better analysis than taking the mean, perhaps by using benchstat.
Right now we use a fixed duration, which is ok, but we could do better. For example, many of the microbenchmarks in strconv appear stable at 10ms, which is 100x faster than the default of 1s.
Rough idea, input welcomed:
The time to run a benchmark is V+C*b.N, where b.N is the number of iterations and V and C are random variables (V for overhead, C for per-iteration execution time). We can take measurements using different b.N (starting small and growing) and estimate V and C. Based on that, we can calculate what b.N value is required to reduce the contribution of V to the total to some fixed limit, say 1%.
This should allow stable, fast benchmarks to execute very quickly. Slower benchmarks would get slower (you have to execute with b.N=1 and 2 at a bare minimum), but that's better than accidentally misleading the user into thinking that they have a meaningful performance number, which is what can currently happen.
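A minimal sketch of that estimation step, assuming we already have (b.N, total duration) measurement pairs; the helper names and the sample numbers below are made up for illustration and are not from the quick-and-dirty version mentioned below:

```go
package main

import "fmt"

// estimateVC fits t = V + C*n by ordinary least squares over
// (iterations, duration) measurement pairs.
func estimateVC(ns, ts []float64) (v, c float64) {
	var sumN, sumT, sumNN, sumNT float64
	k := float64(len(ns))
	for i := range ns {
		sumN += ns[i]
		sumT += ts[i]
		sumNN += ns[i] * ns[i]
		sumNT += ns[i] * ts[i]
	}
	c = (k*sumNT - sumN*sumT) / (k*sumNN - sumN*sumN)
	v = (sumT - c*sumN) / k
	return v, c
}

// requiredN returns the smallest b.N for which the fixed overhead V is at
// most frac (e.g. 0.01) of the total time V + C*N, i.e. N >= V*(1-frac)/(frac*C).
func requiredN(v, c, frac float64) float64 {
	return v * (1 - frac) / (frac * c)
}

func main() {
	// Hypothetical measurements: total durations in seconds at growing b.N.
	ns := []float64{1, 2, 5, 10, 20, 50}
	ts := []float64{0.0011, 0.0012, 0.0015, 0.0020, 0.0030, 0.0060}
	v, c := estimateVC(ns, ts)
	fmt.Printf("V = %.4gs, C = %.4gs/op, need b.N >= %.0f for V <= 1%% of total\n",
		v, c, requiredN(v, c, 0.01))
}
```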
We would probably want to change benchtime to be a cap on running time and increase the default value substantially. If stable numbers are not achievable within the provided benchtime, we would warn the user, who could increase the benchtime or change the benchmark.
I put together a quick-and-dirty version of this using linear regression to estimate V and C. It almost immediately caught a badly behaved benchmark (fixed in CL 10053), when it estimated that the benchmark would take hours to run in order to be reliable. I haven't run it outside the encoding/json package; I imagine that there are other benchmarks that need fixing. Again, input welcomed. I'm not a statistician; I don't even play one on TV.