cmd/gc: why did json get slower with convT2I optimization? #3787
As a data point, http://golang.org/cl/6356053 adds an extra argument to convT2I but does not cache itab lookups. That extra argument is simply ignored. The JSONDecode benchmark shows no dramatic change, compared to the status quo before CL 6337058 "cmd/gc: cache itab lookup in convT2I" was submitted.

benchmark                 old ns/op    new ns/op    delta
BenchmarkBinaryTree17    5902507000   5918781000   +0.28%
BenchmarkFannkuch11      3926283000   3936780000   +0.27%
BenchmarkGobDecode         22264160     22356350   +0.41%
BenchmarkGobEncode         12815560     12704780   -0.86%
BenchmarkGzip             568959600    568881800   -0.01%
BenchmarkGunzip           178905400    179608400   +0.39%
BenchmarkJSONEncode        89589000     89617050   +0.03%
BenchmarkJSONDecode       310999600    316518000   +1.77%
BenchmarkMandelbrot200      7016514      7018210   +0.02%
BenchmarkParse              7865130      7985445   +1.53%
BenchmarkRevcomp         1284725000   1254541000   -2.35%
BenchmarkTemplate         567248200    571401600   +0.73%
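For context, here is a rough, hypothetical Go sketch of what the per-call-site itab cache buys: the itab lookup runs once and its result is reused on later conversions. The real convT2I lives in the runtime (and was written in C at the time), so every name below is illustrative rather than the actual implementation.

```go
// Illustrative sketch only, not the runtime's convT2I. It shows the idea
// behind caching the itab lookup: the compiler would pass a per-call-site
// cache slot, filled on first use and reused afterwards.
package main

import "fmt"

// itab stands in for the runtime's interface table for a (type, interface) pair.
type itab struct {
	typeName, ifaceName string
}

// lookupItab stands in for the hash-table search the runtime performs,
// which involves hashing (and, on the uncached path, a modulus; see below).
func lookupItab(typeName, ifaceName string) *itab {
	return &itab{typeName: typeName, ifaceName: ifaceName}
}

// convT2I mimics the cached variant: cache is the per-call-site slot.
func convT2I(cache **itab, typeName, ifaceName string) *itab {
	if *cache == nil {
		*cache = lookupItab(typeName, ifaceName) // slow path, taken once
	}
	return *cache // fast path on every later conversion at this call site
}

func main() {
	var slot *itab // the cache word the compiler would emit for one call site
	for i := 0; i < 3; i++ {
		tab := convT2I(&slot, "mytype", "io.Reader")
		fmt.Println(tab.typeName, "->", tab.ifaceName)
	}
}
```

The per-call-site cache word is presumably the "global symbol" referred to in the next comment.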
It might be interesting to see if introducing the global symbol has any effect on the benchmarks. Dave Cheney ran some benchmarks for linux/arm, which doesn't show the performance regression. That makes me think there is some sort of alignment or size issue. Maybe there is value in running these benchmarks on 386, since it has the same word size as ARM.
JSONDecode program times in seconds, 5 iterations, with and sans itab cache, on my desktop:

amd64 with/sans: 2.520 / 1.980; ratio = 1.27
386   with/sans: 3.550 / 3.770; ratio = 0.94

The "with itab cache" path avoids the modulus calculation in hashing. http://golang.org/cl/6256062/ suggests that the soft mod is relatively expensive on ARM.
Another data point: there may be confounding factors, but with itab cache is faster than sans itab cache when decoding a deep JSON tree. With is slower than sans when decoding a broad JSON tree.

With:
BenchmarkJSONDecodeBroad    500    4188660 ns/op    5.74 MB/s
BenchmarkJSONDecodeDeep     500    6037996 ns/op    3.49 MB/s

Sans:
BenchmarkJSONDecodeBroad    500    3925550 ns/op    6.13 MB/s
BenchmarkJSONDecodeDeep     500    7444352 ns/op    2.83 MB/s

GOARCH=amd64. example_test.go attached.
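The attached example_test.go is not reproduced here. To make the broad/deep distinction concrete, here is a hypothetical sketch of how such inputs could be generated; it is not the attached code, and the exact shapes are assumptions.

```go
// Hypothetical generators for "broad" (many siblings, shallow) and "deep"
// (heavily nested) JSON inputs; not the attached example_test.go.
package main

import (
	"fmt"
	"strings"
)

// broadJSON returns an object with n sibling keys at a single level.
func broadJSON(n int) string {
	parts := make([]string, n)
	for i := range parts {
		parts[i] = fmt.Sprintf(`"k%d":%d`, i, i)
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// deepJSON returns an object nested n levels deep, one key per level.
func deepJSON(n int) string {
	return strings.Repeat(`{"k":`, n) + "0" + strings.Repeat("}", n)
}

func main() {
	fmt.Println(broadJSON(3)) // {"k0":0,"k1":1,"k2":2}
	fmt.Println(deepJSON(3))  // {"k":{"k":{"k":0}}}
}
```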
> The "with itab cache" path avoids the modulus calculation in hashing. http://golang.org/cl/6256062/ suggests that the soft mod is relatively expensive on ARM. It sure is. Simple changes like making the itab cache 1024 elements wide, thus avoiding soft mod, give substantial speedups. |
Another data point (linux/amd64 desktop). Starting from revision c533f48701cb, which is the one before "cmd/gc: cache itab lookup in convT2I" (http://code.google.com/p/go/source/detail?r=ef432c94ce9c6ed6a0c343ec0a8124237f81a522), this innocuous-looking diff (which introduces no new global symbols):

----------------
$ hg identify
c533f48701cb+
$ hg diff -U 5
diff -r c533f48701cb test/bench/go1/json_test.go
--- a/test/bench/go1/json_test.go	Mon Jul 02 15:30:00 2012 -0700
+++ b/test/bench/go1/json_test.go	Mon Jul 09 14:50:03 2012 +1000
@@ -82,6 +82,9 @@
 func BenchmarkJSONDecode(b *testing.B) {
 	b.SetBytes(int64(len(jsonbytes)))
 	for i := 0; i < b.N; i++ {
 		jsondec()
 	}
+	if false {
+		print([]int(nil))
+	}
 }
----------------

has this effect on the benchmark:

----------------
benchmark              old ns/op    new ns/op    delta
BenchmarkJSONDecode    312624800    403244800    +28.99%
----------------

The generated code for func BenchmarkJSONDecode is identical (the if-false is removed by dead code elimination) except for the stack requirement. Note the $8 versus $24 in the "6g -S" output before/after:

0230 (json_test.go:82) TEXT BenchmarkJSONDecode+0(SB),$8-8
0230 (json_test.go:82) TEXT BenchmarkJSONDecode+0(SB),$24-8
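The same mechanism can be sketched in isolation, assuming a comparable toolchain: the branch below is eliminated as dead code, yet its temporaries can still be accounted in the frame, visible as the $N in the TEXT line of the assembler listing ("6g -S" then, "go tool compile -S" today). This is an illustration, not the repository's benchmark code, and whether the frame actually grows depends on the compiler version.

```go
// Illustrative only: an if-false branch that dead code elimination removes,
// whose slice-header temporary for print may nonetheless be counted in the
// stack frame. Compare the frame size in the TEXT line with and without it.
package main

func work() int {
	sum := 0
	for i := 0; i < 100; i++ {
		sum += i
	}
	if false {
		print([]int(nil)) // never executes, but needs a temporary for the slice header
	}
	return sum
}

func main() {
	println(work())
}
```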
@nigeltao, very interesting diagnosis. Ultimately this sounds like bad fortune that caused parts of the inner loop of the json decoder to straddle a stack boundary. Would you suggest that there is a bug in the dead code elimination logic, in that it is not properly adjusting the stack accounting?
On Monday, July 9, 2012, dfc wrote:

Very probable. The stack frame issue is possibly issue #2734.
Should get fixed with contiguous stacks, maybe for Go 1.3. See the attached graph.

Labels changed: added go1.3, removed go1.2maybe.
Owner changed to @randall77.
Yes, it should be fixed as of https://golang.org/cl/69620043.

Status changed to Fixed.
This issue was closed.