Skip to content

test: concurrency a bit too aggressive #20141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
josharian opened this issue Apr 26, 2017 · 10 comments
Closed

test: concurrency a bit too aggressive #20141

josharian opened this issue Apr 26, 2017 · 10 comments

Comments

@josharian
Copy link
Contributor

When I run all.bash on my laptop, and it gets to the tests in test dir, my laptop becomes unusable. Maxing out all CPUs might be good on a server, but it is a bummer when there's something else you want to do at the same time.

A bit of experimenting suggests that scaling back the concurrency does not cost much real time, but does reduce user CPU and help with responsiveness. On my 8 core laptop:

$ time go run run.go -n 8
real	0m51.173s
user	3m2.545s
sys	2m3.047s
$ time go run run.go -n 6
real	0m51.102s
user	2m42.923s
sys	1m45.525s
$ time go run run.go -n 4
real	1m9.446s
user	2m32.984s
sys	1m39.487s

Ideally we'd reduce parallelism a bit when running on a laptop but not on a server. Or something like that. I'm not sure how to detect this automatically, though, or how much to reduce.

One easy fix would be an environment variable that I could set and forget. What do you think, @bradfitz?

@bradfitz
Copy link
Contributor

Does your OS handle "nice" nicely?

@josharian
Copy link
Contributor Author

Presumably? It's a bog standard Apple laptop. Do you have in mind nice-ing down for the test dir?

@bradfitz
Copy link
Contributor

No, I meant you can fix this problem with "nice", presumably. It's like its raison d'être.

@josharian
Copy link
Contributor Author

Right. Will experiment.

@bradfitz
Copy link
Contributor

bradfitz commented Apr 26, 2017

Btw, I don't intend to be snarky about this. Absent any magical detection of...

  • user lazily building while browsing the web
  • user impatiently waiting for all.bash
  • builder

... I just don't see the value in adding new flags or environment variables to control fast-vs-responsive when there's already a reasonable mechanism to tell your operating system your desires.

And arguably your operating system should be responsive all the time anyway.

Now, if we were hitting memory limits and getting killed or something without an easy workaround, that might be more compelling.

I'm curious to hear your results with nice, though. If it makes make.bash too slow and/or doesn't help responsiveness then we can discuss options. Instead of adding anything new, though, I'd just be slightly nicer by default for everybody, but check $GO_BUILDER_NAME (already set by builders) and be more aggressive if non-empty.

@josharian
Copy link
Contributor Author

I've been running all.bash nice'd for a day and still struggle with a sluggish laptop while the test dir is running. My definition of sluggish is: Slow enough that I am unable to work productively.

@josharian
Copy link
Contributor Author

Possibly related: #17969.

@ALTree
Copy link
Member

ALTree commented May 2, 2017

Excessive concurrency for the tests in the test folder may also be behind flakes like #18589 (see example #18589 (comment)).

@josharian
Copy link
Contributor Author

Took another look at this, and found something useful at last.

(1) My experiments with changing GOMAXPROCS were being foiled by the line in cmd/dist/test.go that explicitly overrides the ambient GOMAXPROCS:

		cmd.Env = append(os.Environ(), "GOOS="+t.gohostos, "GOARCH="+t.gohostarch, "GOMAXPROCS=")

(2) We are running 5 test runners in parallel! See CL 18199. This explains why just running go run run.go never really reproduced the full issue. Each test runner spins up NumCPU tests in parallel--so on my laptop, 40 Go processes each at full GOMAXPROCS. That sounds a lot like #17969.

I think we should revert CL 18199 and let the test dir runner handle its own processes. And if we don't revert CL 18199, I want to add an environment variable to control the number of test dir runner shards and also eliminate the unsetting of GOMAXPROCS (which was also introduced in CL 18199).

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/42431 mentions this issue.

josharian added a commit to josharian/go that referenced this issue May 2, 2017
5 shards, each of which spins up NumCPU processes,
each of which is running at GOMAXPROCS=NumCPU,
is too much for one machine. It makes my laptop unusable.

It might also be in part responsible for test flakes
that require a moderately responsive system,
like golang#18589 (backedge scheduling) and golang#19276 (locklinear).

It's possible that Go should be a better neighbor in general;
that's golang#17969. In the meantime, fix this corner of the world.

Builders snapshot the world and run shards on different
machines, so keeping sharding high for them is good.

This is a partial reversion of CL 18199.

Fixes golang#20141.

Change-Id: I123cf9436f4f4da3550372896265c38117b78071
@golang golang locked and limited conversation to collaborators May 2, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants