cmd/compile: excessively slow compile for 60K lines function #15112
Comments
Seems to work at tip, albeit incredibly slowly:
It's a huge file, but for reference, gotype type-checks this in 0.9s. There's some algorithmic issue here.
@Khaaan can you confirm that this code (or whatever original you derived it from) compiles, if extremely slowly, with the latest compiler at tip?
@griesemer It compiles for me at tip (f229e46):
$ time go tool compile buildme.go
real 1m55.632s
Thanks. There's probably some non-linearity in the backend somewhere. Adjusting title.
Looks like it's spending most of the time in CSE. (Surprise!) cc @tzneal
Started.
FYI, I may be mucking around with Types, specifically canonicalizing them down to a 32-bit integer, as part of improvements to register allocation. Not sure whether this will matter for your work here, but just a heads-up.
CL https://golang.org/cl/21937 mentions this issue.
runtime.newobject never returns the same thing twice, so the resulting value will never be a common subexpression. This helps when compiling large static data structures that include pointers, such as maps and slices. No clear performance impact on other code. (See below.)

For the code in issue #15112:

Before: real 1m14.238s  user 1m18.985s  sys 0m0.787s
After:  real 0m47.172s  user 0m52.248s  sys 0m0.767s

For the code in issue #15235, size 10k:

Before: real 0m44.916s  user 0m46.577s  sys 0m0.304s
After:  real 0m7.703s   user 0m9.041s   sys 0m0.316s

Still more work to be done, particularly for #15112.

Updates #15112
Updates #15235

name       old time/op     new time/op     delta
Template   330ms ±11%      333ms ±13%      ~ (p=0.749 n=20+19)
Unicode    148ms ± 6%      152ms ± 8%      ~ (p=0.072 n=18+20)
GoTypes    1.01s ± 7%      1.01s ± 3%      ~ (p=0.583 n=20+20)
Compiler   5.04s ± 2%      5.06s ± 2%      ~ (p=0.314 n=20+20)

name       old user-ns/op      new user-ns/op      delta
Template   444user-ms ±11%     445user-ms ±10%     ~ (p=0.738 n=20+20)
Unicode    215user-ms ± 5%     218user-ms ± 5%     ~ (p=0.239 n=18+18)
GoTypes    1.45user-s ± 3%     1.45user-s ± 4%     ~ (p=0.620 n=20+20)
Compiler   7.23user-s ± 2%     7.22user-s ± 2%     ~ (p=0.901 n=20+19)

name       old alloc/op    new alloc/op    delta
Template   55.0MB ± 0%     55.0MB ± 0%     ~ (p=0.547 n=20+20)
Unicode    37.6MB ± 0%     37.6MB ± 0%     ~ (p=0.301 n=20+20)
GoTypes    177MB ± 0%      177MB ± 0%      ~ (p=0.065 n=20+19)
Compiler   798MB ± 0%      797MB ± 0%      -0.05% (p=0.000 n=19+20)

name       old allocs/op   new allocs/op   delta
Template   492k ± 0%       493k ± 0%       +0.03% (p=0.030 n=20+20)
Unicode    377k ± 0%       377k ± 0%       ~ (p=0.423 n=20+19)
GoTypes    1.40M ± 0%      1.40M ± 0%      ~ (p=0.102 n=20+20)
Compiler   5.53M ± 0%      5.53M ± 0%      ~ (p=0.094 n=17+18)

name       old text-bytes  new text-bytes  delta
HelloSize  561k ± 0%       561k ± 0%       ~ (all samples are equal)
CmdGoSize  6.13M ± 0%      6.13M ± 0%      ~ (all samples are equal)

name       old data-bytes  new data-bytes  delta
HelloSize  128k ± 0%       128k ± 0%       ~ (all samples are equal)
CmdGoSize  306k ± 0%       306k ± 0%       ~ (all samples are equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize  905k ± 0%       905k ± 0%       ~ (all samples are equal)
CmdGoSize  9.64M ± 0%      9.64M ± 0%      ~ (all samples are equal)

Change-Id: Id774e2901d7701a3ec7a1c1d1cf1d9327a4107fc
Reviewed-on: https://go-review.googlesource.com/21937
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Todd Neal <[email protected]>
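To make the idea in that CL concrete, here is a minimal sketch of how a CSE-style partitioner can exempt allocation-like ops. It is an illustrative toy with invented names, not the compiler's actual ssa code:

```go
package main

import "fmt"

// A tiny, illustrative model of CSE partitioning (not the compiler's
// real ssa.Value/cse code): values are grouped by an equality key, but
// ops that never return the same result twice -- such as a call to
// runtime.newobject -- get a fresh singleton class so they can never be
// treated as common subexpressions.
type value struct {
	id     int
	op     string
	args   [2]int // ids of argument values (0 = unused)
	unique bool   // true for allocation-like ops
}

func partition(vals []value) map[string][]int {
	classes := make(map[string][]int)
	for _, v := range vals {
		key := fmt.Sprintf("%s %v", v.op, v.args)
		if v.unique {
			// Never merge: key on the value's own id.
			key = fmt.Sprintf("unique %d", v.id)
		}
		classes[key] = append(classes[key], v.id)
	}
	return classes
}

func main() {
	vals := []value{
		{id: 1, op: "const 8"},
		{id: 2, op: "const 8"}, // CSE candidate with id 1
		{id: 3, op: "newobject", args: [2]int{1}, unique: true},
		{id: 4, op: "newobject", args: [2]int{2}, unique: true}, // must not merge with id 3
	}
	fmt.Println(partition(vals))
}
```

The payoff described in the commit message follows from this exemption: large static data structures generate many allocation calls, and excluding them up front shrinks the equivalence classes the rest of the pass has to process.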
CL 21961 will help a bit more (if it goes in). The bottleneck now is the O(n^2) invocation of isAncestorEq. I don't see any clear algorithmic improvements there, but perhaps @dr2chase or @tzneal does. The only further thing I can think of to try is to make isAncestorEq itself cheaper. That's not going to be easy. Perhaps we could cache the entry and exit numbers per value in the eqclass to avoid the indirect lookups in the sparse tree? But that would cost allocations, and it's not fixing the root cause. I don't plan to work on this further. It is considerably faster now; I hope it's improved enough that we can downgrade from Go1.7 to Unplanned.
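For readers unfamiliar with the sparse-tree trick referenced here: an isAncestorEq query can be answered by comparing DFS entry/exit numbers on the dominator tree, so each individual query is cheap even when the number of queries is quadratic. A minimal sketch of that numbering, with illustrative types rather than the actual ssa sparse-tree API:

```go
package main

import "fmt"

// Number each node with the times a DFS enters and leaves it; then
// "a dominates b (or a == b)" in the dominator tree becomes two integer
// comparisons, because DFS intervals either nest or are disjoint.
type node struct {
	children    []*node
	entry, exit int
}

func number(n *node, clock *int) {
	*clock++
	n.entry = *clock
	for _, c := range n.children {
		number(c, clock)
	}
	*clock++
	n.exit = *clock
}

// isAncestorEq reports whether a is an ancestor of b, or a == b.
func isAncestorEq(a, b *node) bool {
	return a.entry <= b.entry && b.exit <= a.exit
}

func main() {
	leaf1, leaf2 := &node{}, &node{}
	mid := &node{children: []*node{leaf1}}
	root := &node{children: []*node{mid, leaf2}}
	clock := 0
	number(root, &clock)

	fmt.Println(isAncestorEq(root, leaf1)) // true
	fmt.Println(isAncestorEq(mid, leaf2))  // false: siblings' subtrees don't nest
}
```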
CL https://golang.org/cl/21961 mentions this issue.
I'm working on another CL that gives a 50% speedup for this issue. It just sorts each partition by isAncestorEq to remove the need to find the maximal dominator on each pass through. The other thing I was trying was to reduce the number of values created in the first place. Every reference to a PEXTERN ONAME ("moves" in this case) generates another OpAddr value, all of which then get removed by cse. The other large sets of duplicate values for this case are stack addresses for calling runtime.newobject.
Did you see CL 21961? I also tried an initial sort. The primary problem is that isAncestor(Eq) provides only a partial ordering, so you fundamentally need to look at all pairs to get a complete sort. And then you need to maintain the sort order as you remove elements, which is itself expensive. That's how I ended up at CL 21961. But I'd be delighted if you see something I didn't. :) Reducing the number of values would be a big win. If you can reduce the number of Nodes, that'd help everyone, since compile time in general is at this point roughly proportional to the number of Nodes for non-pathological files. I wonder whether stack addresses (or maybe a subset of them) should be put in the constant cache.
I saw it, but thought that the sort by dom might be cleaner. I was going to flail around a bit with it anyway :) I saw the partial-ordering issue after I started coding and then began using the entry value from sdom to provide an absolute ordering. I just finished up maintaining the sort order to see if it's worth it. I've got a DNR CL 21981 if you want to take a look. I'm on a slowish MacBook, so my benchmark times are a little worse in absolute terms, but it's still a ~30% improvement.
CL https://golang.org/cl/21981 mentions this issue.
We do two O(n) scans of all values in an eqclass when computing substitutions for CSE. In unfortunate cases, like those found in #15112, we can have a large eqclass composed of values found in blocks none of which dominates the others. This leads to O(n^2) behavior. The elements are removed one at a time, with O(n) scans each time. This CL removes the linear scan by sorting the eqclass so that dominant values are sorted first. As long as we also ensure we don't disturb the sort order, we no longer need to scan for the maximally dominant value.

For the code in issue #15112:

Before: real 1m26.094s  user 1m30.776s  sys 0m1.125s
After:  real 0m52.099s  user 0m56.829s  sys 0m1.092s

Updates #15112

Change-Id: Ic4f8680ed172e716232436d31963209c146ef850
Reviewed-on: https://go-review.googlesource.com/21981
Reviewed-by: Josh Bleecher Snyder <[email protected]>
Run-TryBot: Todd Neal <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
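A rough sketch of the sort described in that CL, with hypothetical types (the real code in the compiler's ssa package differs in detail): give each value the DFS entry number of its block in the dominator tree and sort the class on it, so any value that dominates others in the class appears before them:

```go
package main

import (
	"fmt"
	"sort"
)

// Because a dominator block is always entered before the blocks it
// dominates, sorting by entry number puts a maximal dominator at the
// front of the class -- no per-round O(n) scan is needed to find it.
type val struct {
	id    int
	entry int // dominator-tree DFS entry number of the value's block
}

func sortByDom(class []val) {
	sort.Slice(class, func(i, j int) bool { return class[i].entry < class[j].entry })
}

func main() {
	class := []val{{id: 7, entry: 9}, {id: 3, entry: 2}, {id: 5, entry: 4}}
	sortByDom(class)
	fmt.Println(class[0].id, "is a candidate dominant value") // 3
}
```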
Process a slice of equivalent values by setting replaced values to nil instead of removing them from the slice, to eliminate copying. Also take advantage of the entry number sort to break early once we reach a value in a block that is not dominated.

For the code in issue #15112:

Before: real 0m52.603s  user 0m56.957s  sys 0m1.213s
After:  real 0m22.048s  user 0m26.445s  sys 0m0.939s

Updates #15112

Change-Id: I06d9e1e1f1ad85d7fa196c5d51f0dc163907376d
Reviewed-on: https://go-review.googlesource.com/22068
Reviewed-by: David Chase <[email protected]>
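Putting the two CLs together, the substitution loop conceptually looks something like the sketch below. This is a simplified model with invented types, not the compiler's actual cse.go: values are pre-sorted by entry number, replaced entries are nilled in place, and the inner loop stops at the first value the current leader does not dominate.

```go
package main

import "fmt"

// The class is already sorted by dominator-tree entry number. The first
// non-nil value v dominates a contiguous prefix of what follows; replaced
// values are set to nil in place (no slice copying), and the inner loop
// breaks as soon as it meets a value v does not dominate, since under
// this order nothing later can be dominated by v either.
type val struct {
	id          int
	entry, exit int // sparse-tree numbering of the value's block
}

func dominatesEq(a, b *val) bool { return a.entry <= b.entry && b.exit <= a.exit }

// substitute records, in rewrite, which value replaces each eliminated one.
func substitute(class []*val, rewrite map[int]int) {
	for i := 0; i < len(class); i++ {
		v := class[i]
		if v == nil {
			continue
		}
		for j := i + 1; j < len(class); j++ {
			w := class[j]
			if w == nil {
				continue
			}
			if !dominatesEq(v, w) {
				break // sorted by entry: nothing further is dominated by v
			}
			rewrite[w.id] = v.id
			class[j] = nil // mark replaced instead of deleting from the slice
		}
	}
}

func main() {
	// Entry/exit numbers describe a chain: value 1's block dominates 2's, which dominates 3's.
	class := []*val{{1, 1, 8}, {2, 2, 7}, {3, 3, 6}}
	rewrite := map[int]int{}
	substitute(class, rewrite)
	fmt.Println(rewrite) // map[2:1 3:1]
}
```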
@josharian - Caching stack addresses is harder than I thought, because the value's type is in most cases a unique instance of a Type. I started looking at caching types in Ptrto() to solve this, but the Node's type is unique there as well. The easy cases are args to runtime functions, which are all given type Types[TUINTPTR]. Past that, I don't see a great way to reduce the OpOffPtr values.
I've got a type interner working (based on Compare) that might help here. So far I am only using it for LocalSlot, because those are sometimes too fat. Will see about making it more pervasive.
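For illustration only, here is a hedged sketch of what a type interner along these lines might look like; it is not the compiler's actual implementation, and the string key stands in for a real structural comparison such as Compare:

```go
package main

import "fmt"

// Types are canonicalized through a map keyed by a stable textual
// description, so structurally equal types share one *typ instance and
// later comparisons reduce to pointer equality.
type typ struct {
	desc string // e.g. "*[]int"; stands in for a real structural key
}

type interner struct {
	seen map[string]*typ
}

func newInterner() *interner { return &interner{seen: make(map[string]*typ)} }

func (in *interner) intern(t *typ) *typ {
	if canon, ok := in.seen[t.desc]; ok {
		return canon
	}
	in.seen[t.desc] = t
	return t
}

func main() {
	in := newInterner()
	a := in.intern(&typ{desc: "*[]int"})
	b := in.intern(&typ{desc: "*[]int"})
	fmt.Println(a == b) // true: one canonical instance
}
```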
A quick profile suggests that the only unusual performance bit here is, in fact, type comparison, and even that's only ~13% of compile time. So: vast strides.
I'm going to move this to 1.8; I don't think we have anything else to help here for 1.7.
@tzneal #15520 is another case in which we do a bunch of work to collapse OpSP offsets. In that case, they all have the same type, though they don't compare the same using ==. A cache of these keyed by (off, type.String) would help here, at least. (Similar to the const cache, callers would need to make sure that their type is actually CMPeq, not just string-equal.) I think we should look again at this idea for Go 1.8.
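A hedged sketch of the cache suggested in that comment, with illustrative names rather than the compiler's API: offset-pointer values are keyed by (offset, type string), and because distinct types can share a string form, a hit is only reused after a stricter equality check:

```go
package main

import "fmt"

// OpOffPtr-like values cached by (offset, type string). The equal method
// below is a stand-in for a real CMPeq-style structural comparison.
type typ struct {
	str string
	// ... structural fields omitted
}

func (t *typ) String() string    { return t.str }
func (t *typ) equal(u *typ) bool { return t == u }

type offPtr struct {
	off int64
	typ *typ
}

type key struct {
	off     int64
	typeStr string
}

type cache map[key]*offPtr

func (c cache) lookup(off int64, t *typ) *offPtr {
	k := key{off, t.String()}
	if v, ok := c[k]; ok && v.typ.equal(t) {
		return v // safe to reuse: not merely string-equal
	}
	v := &offPtr{off: off, typ: t}
	c[k] = v
	return v
}

func main() {
	c := cache{}
	tp := &typ{str: "*uint8"}
	a := c.lookup(8, tp)
	b := c.lookup(8, tp)
	fmt.Println(a == b) // true: second lookup reuses the cached value
}
```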
CL https://golang.org/cl/30354 mentions this issue.
The Go 1.6 compiler panics on this 2.25 MB source file containing a slice with 13240 elements: https://drive.google.com/file/d/0B2AF520HhpuzdzZhSVpPZGZkTUU/view
go1.6 linux/amd64
GOARCH="amd64"
GOOS="linux"
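The linked file is not reproduced in the thread. As a rough, invented stand-in for the shape of such generated code (names and fields are hypothetical), the stress case is a single package-level composite literal or function with tens of thousands of elements, each containing pointers (slices, maps), which is exactly what exercises the CSE behavior discussed above:

```go
package main

// A tiny illustrative stand-in, not the actual linked file.
type move struct {
	from, to int
	tags     []string
}

var moves = []move{
	{from: 0, to: 1, tags: []string{"a"}},
	{from: 1, to: 2, tags: []string{"b"}},
	// ... repeated roughly 13,000 more times in the real file ...
}

func main() {}
```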