-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: auto-generated code pathologically slow to compile #16407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Quick code link for those curious: |
Haven't dug much, but I see at least two issues here. (1) The sparse phi locator is slow on this. Determined by sending SIGQUIT a few times after letting the compilation run for a while. Running with (2) Memory blowup in regalloc during liveness computation. There are lots of values live across lots of blocks, leading to O(n^2) behavior. That's what causes the machine instability. There might be some cheap additional heuristics we can use for selecting whether to use the sparse phi locator. That'd probably be safe for 1.7. The quadratic memory usage in liveness is a general, important, known problem, and way out of scope for 1.7. (David, might an SCC-based representation of liveness help here?) |
This will also affect programs using https://github.com/opennota/re2dfa or www.complang.org/ragel/ to build custom DFAs for matching regexps. |
@rogpeppe I assume you are perfectly capable of finding workarounds as needed in the meantime (including using 1.6 or turning off ssa). But I will say that I would consider converting some of the innermost switch statements to lookup tables. This should compile faster, run faster, and probably produce a smaller binary, particularly if you shift all array indices so that the smallest interesting value is 0. This might require a non-trivial transform, though, to indicate e.g. (lines 495-594) a new value for s and a new value for m and whether to check i and pb before modifying m. |
@josharian It's not my code - I have nothing to do with it or the sequencer command that imports it. I just encountered the issue when trying out that command. |
@josharian But to add to your thoughts, I wonder if just using goto instead of a |
Regalloc is dying because there are ~30000 integer constants, and they are all live during all blocks in the program. We put them all in the first block, and they never get moved down by the tighten pass because their only use is in a phi. |
CL https://golang.org/cl/25046 mentions this issue. |
I'd reckon this is a duplicate of or related to #14934. |
It's kinda pathological for the sparse phi locator, too. One of the "this isn't usually large" assumptions isn't true. I'm working on that. |
Looks like the phi locator takes ~20 minutes to complete on this example. |
Changes genfsm.go to divide the state space into chunks of size 256, then call a helper for function for each chunk (as opposed to having a single giant function that does everything). This reduces the compile time for go1.7 down to something reasonable (15-30 seconds depending on machine speed). See related issues zhenjl#1 golang/go#16407
I created a pull request [as "learn to program in go" exercise] for the fsm generator that divides the state space into chunks, then puts the chunks in separate functions. Each of the new functions is a couple thousand lines long. Bring the total compilation time down to 10-15 seconds. |
This is not showing up here, but this CL contains two fixes (one that just missed the first 1.7 deadline, the other created earlier this week) that cut the phi location time from(on my laptop) 12+ minutes down to 2, and also cuts the memory footprint of that phase down to about 500MB. |
With the sparse phi locator off, though, the SSA construction completes in a few seconds, not 2 minutes. It seems like a better fix here might be updated heuristics about when to use the sparse phi locator, if such heuristics are available. CL 23136 should probably also go in, but maybe for 1.8 instead? I'm not sure. |
If I knew of such heuristics, I would use them. I haven't yet figured out what makes this particular flow-graph so special. It's big. |
entry: x = MOVQconst [7] ... b1: goto b2 b2: v = Phi(x, y, z) Transform that program to: entry: ... b1: x = MOVQconst [7] goto b2 b2: v = Phi(x, y, z) This CL moves constant-generating instructions used by a phi to the appropriate immediate predecessor of the phi's block. We used to put all constants in the entry block. Unfortunately, in large functions we have lots of constants at the start of the function, all of which are used by lots of phis throughout the function. This leads to the constants being live through most of the function (especially if there is an outer loop). That's an O(n^2) problem. Note that most of the non-phi uses of constants have already been folded into instructions (ADDQconst, MOVQstoreconst, etc.). This CL may be generally useful for other instances of compiler slowness, I'll have to check. It may cause some programs to run slower, but probably not by much, as rematerializeable values like these constants are allocated late (not at their originally scheduled location) anyway. This CL is definitely a minimal change that can be considered for 1.7. We probably want to do a better job in the tighten pass generally, not just for phi args. Leaving that for 1.8. Update #16407 Change-Id: If112a8883b4ef172b2f37dea13e44bda9346c342 Reviewed-on: https://go-review.googlesource.com/25046 Run-TryBot: Keith Randall <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Josh Bleecher Snyder <[email protected]>
This is: (1) a simple trick that cuts the number of phi-nodes (temporarily) inserted into the ssa representation by a factor of 10, and can cut the user time to compile tricky inputs like gogo/protobuf tests from 13 user minutes to 9.5, and memory allocation from 3.4GB to 2.4GB. (2) a fix to sparse lookup, that does not rely on an assumption proven false by at least one pathological input "etldlen". These two changes fix unrelated compiler performance bugs, both necessary to obtain good performance compiling etldlen. Without them it takes 20 minutes or longer, with them it completes in 2 minutes, without a gigantic memory footprint. Updates #16407 Change-Id: Iaa8aaa8c706858b3d49de1c4865a7fd79e6f4ff7 Reviewed-on: https://go-review.googlesource.com/23136 Reviewed-by: Keith Randall <[email protected]> Run-TryBot: David Chase <[email protected]> TryBot-Result: Gobot Gobot <[email protected]>
Looks like 2m is as fast as this is going to get for 1.7. Moving to 1.8. |
Initially reported under #17127 the blevesearch/segment package appears to also illustrate the problem, perhaps even more severely. I've just tested with tip:
And the following compilation runs for over 20 minutes (still going):
|
@mschoch : your example compiles for me in ~8 minutes. Most of the time (~7 min) is lowered CSE. |
@randall77 are there new commits on tip related to this? I let it go for 45 minutes and it never finished. I also with |
@mschoch : I've been experimenting on CL 30163 which changes the way phis are inserted. It takes 8 minutes there. I wasn't expecting big differences with tip as both tip and that CL take about the same amount of time for phi building on your example. However, it looks like tip is going to take much longer (still running). I don't understand why, as the phi building takes about the same amount of time. For some reason, the differences in phi building make later phases take longer. |
@rogpeppe 's example seems to be fixed with https://go-review.googlesource.com/c/30163/ . Compile time down to ~30sec (most of that still in phi building). |
I'm going to reopen #17127. That slow compile appears to be mostly CSE time, whereas this issue is mostly phi building time. |
CL https://golang.org/cl/30163 mentions this issue. |
Please answer these questions before submitting your issue. Thanks!
go version
)?go version go1.7rc1 linux/amd64
go env
)?amd64, linux
go get github.com/zhenjl/xparse/etld
A response in reasonable time.
No response - the code takes a very long time to compile.
Under Go 1.4, the package takes ~2s to install. Under Go 1.6, it took ~4s.
Using Go1.7rc1 I killed the compiler after 9 minutes because my machine was
overheating and becoming unusable.
It's a large state machine, but this doesn't seem like reasonable compiler behaviour.
FWIW this package is used as part of the
sequencer
tool (see http://zhen.org/blog/sequence-high-performance-sequential-semantic-log--parser/)The text was updated successfully, but these errors were encountered: