-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: get rid of SSA call block type #15631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This means one big block full of calls instead of many little CallBlocks? Another thing Quentin just noticed (looking at giant machine-generated protobuf files) is that memory barrier ops introduce branches which generally causes sadness around phi location and generation; if we solve the rewrite-to-multiple Blocks issue, this might allow us to do general phi placement first on much larger blocks, then expand these afterwards. One minor issue is the cost of splitting variables between Blocks and reassigning them; we'd want to do this in an order that avoids reassigning to new blocks variables as we split blocks again and again at write barriers. |
Right. |
I did a little research on #15537, using the improved-version of sparse (that generates the cleanest code, hence slightly less repulsive to humans, and a svelte 8.8MB dump after the build phase). Crudely, after building SSA, the largest function 222053 lines After early deadcode: After flagalloc (note blocks added to remove critical edges): Merging Call blocks with their successors should cut the number of blocks by about 1/3, and also cut the number of Copy nodes out of the phi assignment phase. This seems like a good thing. Given how costly register allocation is in space and time, would it make sense to run a Value compaction pass just before? |
Yes, if value number compaction would help we can do that. |
I think value number compaction will help a lot with our trouble case. Normal ID utilization at entry to regalloc is normally 30–60%; for the trouble case, it is 0.6%, and that translates into giant allocs. I started on a number compaction optimization a couple of days ago but ran into some bumps (free lists, constant caches) and decided to see whether I could avoid generating so many values to begin with. I found a few minor optimizations (worth a couple of percent of total allocs) but nothing worth mailing during the freeze. I have one idea left to try before moving on, will know about that soon. If no one has started on it, I'm happy to finish off ID compaction. Please let me know one way or the other. |
Value number compaction looks like it might be worth doing. tip: tip+sparse: tip+sparse+tweak_for_1.8: tip+value_compact: tip+sparse+value_compact: tip+sparse+tweak_for_1.8+value_compact: Obviously, I have a value-compacter written. It will mess with debugging in ssa.html, so we might only want to enable it when the number of values is very large, or perhaps if we can determine ahead of time that it will win big. Note that it still helps a little even with tip+sparse+tweak_for_1.8, but that tweak leaves the values alone. The tweaked version avoids creating a whole lot of phi functions that then gets eliminated; Call merging will avoid creating a whole bunch of Copy that then gets eliminated. Long run, value compaction might not be that helpful, but in the short run it looks safe and effective. |
Looks like we are entirely on the same page. Will you go ahead and mail your ID compactor so that I can take a look? We should be able to cheaply decide to only compact if f.NumValues > some threshold and % utilization < some threshold, but it might always be worth doing, at least until other improvements obviate it. And yes, we will definitely want some way to disable it entirely via a flag for debugging. Given the subtlety of sparse phi and other mucking around, it's possible we should just do ID compaction for 1.8; not sure it that gets us far enough. |
I'm going to make a CL of it as soon as it passes tests locally. I don't think the sparse phi stuff is that subtle; it's tree numbering games for the dominators and balanced-tree data structures, and it helps avoid renumbering completely. If there's places it needs more explanation, I can work on that. |
With regard to the original content of this issue (getting rid of SSA Call block type), SGTM. I have a handful of optimizations sitting in my local tree that will be impacted by it, so ideally this would happen early in the 1.8 cycle, so that I can rebase and remeasure. |
CL https://golang.org/cl/28950 mentions this issue. |
The SSA Call block is a vestige of when we had explicit panic edges in the call graph. Now that panic edges are implicit, we don't really need a call block - call Values take a memory and produce a memory, and that is enough to get all the memory ordering correct (even if the call panics).
The text was updated successfully, but these errors were encountered: