cmd/compile: possible latent codegen issue on amd64 removing zero extensions #36897
I've never been super happy about our whole sign extension story. In machine-independent space I think we're good, but we play very fast and loose in the machine-dependent space. Maybe there's some invariant we can use where we guarantee to only use the bits of a value that exist in its type. So there would be two effective
Agreed. And at the same time, we also fail to remove a bunch of extraneous extensions. That approach sounds good to me, assuming the details work out ok. It could cause an explosion in the number of ops and rules (which is already a problem), and might be tricky with shifts. I don't think I want to sign up for tackling this one. :)
Same for me. I'm also not super happy about the extension removal.
I think we should only do this if the type is 32 bits wide or smaller. But how good are we at keeping the types accurate?
The LHS is 64 bits wide, whereas the RHS is 32 bits wide. These kinds of rules have always troubled me. Maybe we could have a way of doing this in a type-preserving way? Something like
Change https://golang.org/cl/220499 mentions this issue:
The SSA backend has rules to read the contents of readonly Lsyms. However, this rule was failing to trigger for many readonly Lsyms. This is because the readonly attribute that was set on the Node.Name was not propagated to its Lsym until the dump globals phase, after SSA runs.

To work around this phase ordering problem, introduce Node.SetReadonly, which sets Node.Name.Readonly and also configures the Lsym enough that SSA can use it.

This change also fixes a latent problem in the rewrite rule function, namely that reads past the end of lsym.P were treated as entirely zero, instead of merely requiring padding with trailing zeros.

This change also adds an amd64 rule needed to fully optimize the results of this change. It would be better not to need this, but the zero extension that should handle this for us gets optimized away too soon (see #36897 for a similar problem). I have not investigated whether other platforms also need new rules to take full advantage of the new optimizations.

Compiled code for `(interface{})(true)` on amd64 goes from:

```
LEAQ    type.bool(SB), AX
MOVBLZX ""..stmp_0(SB), BX
LEAQ    runtime.staticbytes(SB), CX
ADDQ    CX, BX
```

to

```
LEAQ    type.bool(SB), AX
LEAQ    runtime.staticbytes+1(SB), BX
```

Prior to this change, the readonly symbol rewrite rules fired a total of 884 times during make.bash. Afterwards they fire 1807 times.

```
file       before     after      Δ        %
cgo        4827832    4823736    -4096    -0.085%
compile    24907768   24895656   -12112   -0.049%
fix        3376952    3368760    -8192    -0.243%
pprof      14751700   14747604   -4096    -0.028%
total      120343528  120315032  -28496   -0.024%
```

Change-Id: I59ea52138276c37840f69e30fb109fd376d579ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/220499
Run-TryBot: Josh Bleecher Snyder <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
Change https://golang.org/cl/226957 mentions this issue:
Wrap the regtest test servers to capture jsonrpc2 logs, so that they can be printed on test failure.

There was a little bit of complication to implement this in the 'Shared' execution mode, and since we're not really using this mode I just deleted it.

Updates golang/go#36897
Updates golang/go#37318

Change-Id: Ic0107c3f317850ae3beb760fc94ae474e647cb78
Reviewed-on: https://go-review.googlesource.com/c/tools/+/226957
Run-TryBot: Robert Findley <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Ian Cottrell <[email protected]>
I guess we didn't do anything here for 1.15. Moving to 1.16 milestone.
Since there's no current plan to work on this, bumping to "Unplanned" |
I'm not sure whether it is possible to trigger this bug right now, but I think there may be an issue lurking.
Consider code like:
The outermost op is `ZeroExt32to64`, which gets lowered to `MOVLQZX`. The innermost op ends up being `(ANDLconst [0xFFFFFFFF] x)`.

Then this rule triggers: `(MOVLQZX x) && zeroUpper32Bits(x,3) -> x`. `ANDLconst` is among the ops listed as zeroing the upper 32 bits of a register, so we eliminate the zero extension.

Then this rule triggers: `(ANDLconst [c] x) && int32(c)==-1 -> x`, eliminating the `ANDLconst`.

This leaves us with just `x`, but without any zero extension, which could leave junk in the top half of the register.

As it stands, this doesn't happen in this function, because the `And32` is eliminated during generic optimization. But there are ways to sneak an `& -1` past generic optimization, e.g. with a constant shift, which is how I discovered this issue. We are also saved in this case by using a `MOVL` to load the value from an arg. But we could be using a computed value instead. So I haven't convinced myself that this couldn't actually cause bad code generation right now for just the right input.

There are two possible diagnoses/fixes:
I'm strongly inclined to the first diagnosis and fix. Thoughts?
cc @randall77 @cherrymui