proposal: cmd/go: run benchmarks for one iteration when testing #41824
Comments
That's a good way of saying the default behavior this proposes. Are you trying to make an argument one way or the other with that statement? If so, what argument? |
I generally agree with this; I've been bitten by benchmarks going stale as well. I think benchmarks should be run as part of go test. The only trouble I can imagine with this change is that it would cause pain if there are any low-quality benchmarks which would now suddenly run as part of go test: benchmarks whose single iteration is very slow, or benchmarks that do a lot of expensive setup work.
You could say "you should never do either of those things", but sometimes what you need to benchmark is just very costly, especially when benchmarking a feature end-to-end. And I've seen a number of such heavy benchmarks in the past, where even a single iteration takes a noticeable amount of time. I still think it's good to incentivize developers to write cheaper benchmarks, so I'm still in favor of this change. But for those cases where heavy benchmarks are useful, we should have a recommendation for keeping them around without slowing down go test. The concern above reminds me of #36586 too, because runnable examples and benchmarks are similar in how they can't run in parallel with each other, or with tests. |
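A minimal sketch of one possible approach (an illustration, not a recommendation made in this thread): guard the heavy benchmark behind testing.Short(), assuming quick test runs pass -short. The setup and workload below are hypothetical placeholders.

```go
package heavy

import (
	"testing"
	"time"
)

// expensiveSetup is a hypothetical stand-in for costly initialization,
// such as building a large synthetic table.
func expensiveSetup() { time.Sleep(100 * time.Millisecond) }

func BenchmarkHeavyEndToEnd(b *testing.B) {
	// Skip the heavy benchmark when -short is set, so quick local or CI runs
	// stay fast; full runs still exercise (and therefore test) the benchmark.
	if testing.Short() {
		b.Skip("skipping heavy benchmark in -short mode")
	}
	expensiveSetup()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		time.Sleep(time.Millisecond) // stand-in for the measured work
	}
}
```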
I would further suggest that |
Yes.
That seems worse than skipping with
That seems excessive, given the option of using |
I endorse a default of |
There are a bunch of benchmarks that do a lot of setup work anticipating that they will be run for the full 2s or more. |
CI systems today can use -benchtime=1x. Do any CI systems do that? |
One way to make this opt-in would be a method on testing.T. Maybe something like:

// RunBenchmark runs bench once, with b.N == 1.
// Any TB method called on b, such as Errorf or Skip, is treated as though it were called on t.
func (t *T) RunBenchmark(bench func(b *B))

Basically, sub-tests but crossing the test/benchmark line. This would also provide an opportunity for the calling testing.T to do other work, like calling Parallel. (You could also imagine doing this for examples, although then the examples would be executed twice during a normal run.) |
@mvdan I disagree with your implication that any benchmarks which do expensive setup work are "low-quality". One system I work on, for instance, is a database which includes benchmarks that initialize themselves with a large synthetic table; the benchmark loops run queries against the table. It takes a lot of CPU and RAM to do the initial setup and I wouldn't want to run these every time I do the usual go test. From my perspective, things work well as-is and I would prefer we not change the existing behavior. |
Of course tests should be tests, but this proposal is to make it so that benchmarks don't go stale and become broken over time because no one runs them. Phrased differently, it's about testing the benchmark, not having the benchmark test the code.
Picking a sample of systems I'm aware of, plus whatever was at the top of the GitHub marketplace for CI, I would be surprised if running the benchmarks as part of go test caused problems for most of their default configurations. As far as users overriding the default configuration to run benchmarks, I could not find a single example.
In my personal experience, 3 out of 3 different workplaces did not run benchmarks as part of the tests. Indeed, as @bcmills said, the Go project itself had a broken benchmark for a period of over a year. I believe this is strong evidence that running benchmarks as part of testing is rare. |
I think this should be opt in, as it currently is with the -bench flag. |
Indeed. The fact that CI systems can today opt in to running 1 iteration of benchmarks (using -benchtime=1x) but do not seems to me strong evidence that the Go command should not start doing it either. Benchmarks and tests are separate by design, for good reasons (benchmarks can be slow, even at 1x, and that hurts go test). It seems to me that we should keep them separate. As has been pointed out, CI systems and users have the ability to opt in even today, using -benchtime=1x. You could even set GOFLAGS=-benchtime=1x to force it into all your go test commands. |
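For concreteness, that opt-in could look like the following; note that -benchtime only takes effect for benchmarks selected with -bench, so this invocation (an illustrative assumption, not something prescribed above) passes both:

```
GOFLAGS=-benchtime=1x go test -bench=. ./...
```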
Many CI systems predate the existence of "1x" as a benchtime option. For a long time, benchmarks could only be run in a loop until they took at least some amount of time, defaulting to a second, which in practice often required quite a few iterations per benchmark. Now that it exists, it's much more likely that people would start adopting it, but I think the lack of people doing it may reflect in part that it wasn't available when they specced out their setup. |
The CI systems are not the primary reason not to do this. The primary reason is the overall slowdown. |
None of the CI systems run |
Examples and tests are also separate for good reasons, yet examples are run during go test. |
That's essentially open-ended; there is one data point above. Given the wide range of possible impacts on test times, the question then becomes what the work impact is on those projects where the time impact is too high. For us, if there were no additional knobs in the form of command flags or env vars, the work required to prevent a blow-out of testing time (by preventing benchmarks from running) would be onerous, and given the Go project's dislike of knobs, that additional work would be guaranteed. |
@zeebo you mentioned that benchmark code can rot.
What does this rotting look like? Is it mis-compilation, or the benchmarks returning incorrect results? |
The searches in #41824 (comment) aren't complete, as they are searching through ".travis.yaml", not ".travis.yml", which is the massively more common filename. There are many examples of CI configurations that do pass -bench. E.g., this is an old repo that only tests up to 1.6, but they were using a very short benchtime (100ms) to get something close to what -benchtime=1x provides today. I'd be wary of running benchmarks by default, even at 1x; on my main project my benchmarks often spin up a temporary database. I run with 1x in my CI, but I've gone to great lengths to ensure that this can happen quickly (using a special image that caches the initialization, plus pooling and DB templating). I'm not certain everyone has gone through that effort. |
@davecheney It can't be mis-compilation because, irrespective of the flags, the benchmarks are still built. In looking into the time impact on Gonum's test suite, I did find one set of cases where a change in the API had left a rotted benchmark that panics (so I am torn by this: I don't want to see rotted benchmarks, but a greater than one order of magnitude increase in test suite running time is too much). |
@kortschak that's what I figured, thanks for confirming. |
It seems like there's consensus here (1) that there's a problem with benchmarks getting stale and (2) that running benchmarks by default, even with b.N=1, would be too expensive for at least some test suites. I offered one suggestion above for making it easy to explicitly opt in on a benchmark by benchmark basis, and a related approach in #41878. Anyone else care to throw out other possible solutions? This seems like as good a location as any to brainstorm. |
Thank you @zikaeroh for spotting the errors in the analysis I made.
Anything that is opt-in may as well be during the invocation of the go tool, I think. In other words, I don't see the benefit of encoding the opt-in in the source code versus the arguments passed to go test. |
What about just using names? Currently, the Test and Benchmark name prefixes determine how a function is run; a new prefix could opt a benchmark in to also being run, once, as a test. The only downside is that anything currently using whatever name we pick would change behavior. Happy to entertain better names. |
You can specify a subset of benchmarks to include/exclude. You can enable b.Parallel. You can set it once in one place and not have to manage external configuration (CI) or training/documentation (teammates). |
@randall77 naming: Another option along those lines would be to keep the existing Benchmark* names but allow functions to accept a *testing.BT instead of a *testing.B. The param type would be the opt-in. Then you could hang a Parallel and a TestMode/IsBenchmark off that new type. A *testing.BT would also need an N field. |
And if we wanted, with *testing.BT, instead of exposing b.N directly, we could expose b.Next(). But that might be a bridge too far, since it'd make opt-in more than a one character change. :) |
There's already the interface testing.TB. |
If you want to run a benchmark during a test:
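Judging from the reply that follows, the suggestion was presumably to use the existing testing.Benchmark function, roughly along these lines (a reconstruction under that assumption, not the original snippet; foo is a hypothetical placeholder):

```go
package foo

import "testing"

func foo() {} // hypothetical code under benchmark

func BenchmarkFoo(b *testing.B) {
	for i := 0; i < b.N; i++ {
		foo()
	}
}

// TestFooBenchmark exercises the benchmark from a test. testing.Benchmark
// runs the function with increasing b.N until the timing stabilizes, just as
// "go test -bench" would.
func TestFooBenchmark(t *testing.T) {
	res := testing.Benchmark(BenchmarkFoo)
	t.Log(res)
}
```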
I don't see why new API is needed here. |
@rsc that will do the full benchmark run, increasing b.N incrementally, which is much slower than only b.N=1. |
@josharian, indeed, I was clearly tired. :-) The point remains, though: if you want to opt-in to testing certain benchmark code during tests, factor it out and call it from a Test function. |
:) Here's a worked example of that approach. Here's the simplest possible benchmark:

func BenchmarkFoo(b *testing.B) {
    for i := 0; i < b.N; i++ {
        foo()
    }
}

Here's what it looks like refactored to be a test and a benchmark:

func benchmarkFoo(tb testing.TB) {
    bN := 1
    if b, ok := tb.(*testing.B); ok {
        bN = b.N
    }
    for i := 0; i < bN; i++ {
        foo()
    }
}

func BenchmarkFoo(b *testing.B) {
    benchmarkFoo(b)
}

func TestFoo(t *testing.T) {
    benchmarkFoo(t)
}

Maybe I'm missing a simpler way, but that's a lot of boilerplate, particularly repeated over many benchmarks. |
This, plus my rejected proposal for testing middleware, are sort of running into similar things -- there's no good way to programmatically express "for each test" or "for each benchmark", or request that specific things be run, or influence how they're run. There's a lot of things like this where it's certainly possible to do the thing by hand-refactoring, but in a large test suite, it's hundreds of instances of boilerplate, and that means hundreds of things that could get out of sync. Maybe this is or should be a code generation problem. That said, adding RunTests, RunBenchmarks, and a way to interact with flags to testing.M would also solve a lot of this.
(Of course, I'd prefer wrapper funcs that take |
We could use testing.TB, like in my refactored code above. Code could then type switch to *testing.B when it needs b.N. |
If you do have benchmarks, and you need to run them, you'll run them. If you don't need to run them, you don't need to run them. |
@josharian, isn't it:
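Based on the reply below, the snippet here was presumably a simpler factoring of the example above, passing the iteration count directly, something like this (a reconstruction; the exact code is assumed):

```go
func benchmarkFoo(n int) {
	for i := 0; i < n; i++ {
		foo()
	}
}

func BenchmarkFoo(b *testing.B) { benchmarkFoo(b.N) }

func TestFoo(t *testing.T) { benchmarkFoo(1) }
```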
? |
@rsc in my trivial example, yes. And that's a common case, and everything is obviously fine now in those cases. But there's often a fair amount more work (or at least LOC) in the benchmark. My example was trivial to attempt to highlight the structure of the refactoring under the assumption that completely copy/pasting the benchmark is not the right solution. Note that if you copy/paste the benchmark into a test, and the test fails, you also need to remember to make the corresponding benchmark fix. You could also take similar approaches, like factoring out setup and loop code. But that refactor will be awkward because of the testing.T/testing.B mismatch, and refactoring the loop into a function call may alter performance characteristics. |
I guess there are two important questions here.
1. Should go test run benchmarks (for one iteration) by default?
2. Should we add new testing API to make it easier to run benchmark code as part of tests?
As I've noted above, I don't see the argument for (1) being "yes", and I'm even more skeptical that (2) is "yes". We also don't run examples with no output, for good reason. I don't think we should start running those either. |
I believe documenting the existing solution as an example go test invocation would be useful. Maybe repeat that in the testing package documentation and in the blog article that introduces benchmarks. |
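For example, the documented invocation might look like the following (each benchmark runs exactly once, in addition to the tests; the exact form is a suggestion, not something settled in this thread):

```
go test -bench=. -benchtime=1x ./...
```

Passing -run='^$' as well would skip the regular tests and only smoke-test the benchmarks.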
Given how potentially disruptive this change would be, and that people who do want to run benchmarks as tests can already use -benchtime=1x, it seems like we should probably leave things as is otherwise. Thoughts? |
Based on the discussion, this seems like a likely decline. |
I'm not sure I understand this argument, because it applied to running examples during go test as well. I assume the only difference here is potentially making go test slower. |
(emphasis mine) The conversation keeps going in the direction of "running benchmarks as tests", which is not at all what is being proposed. It's testing the benchmark, not using the benchmark to test. In other words, this proposal would still make sense to me even if the benchmark timings were discarded entirely. That said, it does seem like it would have a high probability of being too disruptive. I do think it's worthwhile to document that the -benchtime=1x flag can be used for this. |
No change in consensus, so declined. |
Currently, benchmarks are only run when passing the -bench flag to go test. Usually, CI checks don't run benchmarks because of the extra time required and noise reducing the usefulness of the results. This leads to situations where benchmark code can rot, causing it to become a liability instead of an asset. Instead, go test without the -bench flag could also run all of the benchmarks for a single iteration (b.N = 1) and just report PASS or FAIL like a normal test. Then benchmark code would be exercised at least as often as test code, preventing rot.

Some details to work out:

- How does this interact with -run? I would suggest it additionally filters the benchmarks to run when -bench is not passed.