cmd/compile: ssa rewrite x-(x&y) => x&^y #52665
This PR (HEAD: 36dddc1) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/403654 to see it.
This allows rewriting:
AND BX, AX
SUB AX, BX
Into (with GOAMD64=v3):
ANDN AX, BX, AX
Which is twice as fast, and smaller.
Into (without GOAMD64=v3):
NOT BX
AND AX, BX
When AX's dependency is not resolved yet but BX's is, this is in most cases twice as fast, because the NOT BX and AX's dependency can now execute in parallel.
In other cases, it has a negligible positive impact.
Other instruction sets will also benefit from the pipelining advantage, and from the smaller ANDN instruction if they have something similar.
Helps with ChaCha20 and Poly1305 in golang#52563.
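The rewrite relies on the identity `x - (x & y) == x &^ y`: every bit set in `x & y` is also set in `x`, so the subtraction never borrows and only clears those bits. A minimal sketch that checks this on random inputs follows; the helper names `subAnd` and `andNot` are hypothetical and are not part of the compiler change.

```go
package main

import (
	"fmt"
	"math/rand"
)

// subAnd computes x - (x & y), the form the rewrite rule matches.
// (Hypothetical helper, for illustration only.)
func subAnd(x, y uint64) uint64 {
	return x - (x & y)
}

// andNot computes x &^ y (AND NOT), the form the rule produces.
// (Hypothetical helper, for illustration only.)
func andNot(x, y uint64) uint64 {
	return x &^ y
}

func main() {
	// Fixed seed so the run is reproducible.
	rng := rand.New(rand.NewSource(1))
	for i := 0; i < 1_000_000; i++ {
		x, y := rng.Uint64(), rng.Uint64()
		if subAnd(x, y) != andNot(x, y) {
			fmt.Printf("mismatch: x=%#x y=%#x\n", x, y)
			return
		}
	}
	fmt.Println("all inputs agree")
}
```

This only demonstrates that the two expressions are equivalent at the source level. Which instructions the compiler emits for each form depends on the target and on the GOAMD64 setting.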
This PR (HEAD: a01fd33) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/403654 to see it.
Message from Keith Randall: Patch Set 2: (1 comment) Please don’t reply on this GitHub thread. Visit golang.org/cl/403654.
Are you sure the separate rule for amd64 is needed? It looks redundant given the generic rule. NOT(L|Q) isn't generated for much else than Com(8|16|32|64). Also, ChaPoly has constant y, so I expect that would be an AND(L|Q)const by the time that rule is checked.
Good point, this is a remnant from my first patch, which was:
To avoid generating extra instructions where we wouldn't need them.
I don't understand that, I do not emit or match any
That's why the generic rule exists. 🙃
Never mind, brain fart.
This is premature; it should be done once the BCE issue is fixed and we can see a real improvement in ChaChaPoly.