-
Notifications
You must be signed in to change notification settings - Fork 18k
cmd/compile: move instrumention into an ssa pass? #19054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yes, it makes sense. It should make the pass considerably simpler and hopefully eliminate all the corner cases that we still continue to hit episodically. It may also help to implement some of the goodness we have in llvm. IIRC that PointerMayBeCaptured eliminated ~1/3 of instrumentation. Also deduplication at least within the same basic block: void ThreadSanitizer::chooseInstructionsToInstrument(
SmallVectorImpl<Instruction *> &Local, SmallVectorImpl<Instruction *> &All,
const DataLayout &DL) {
SmallSet<Value*, 8> WriteTargets;
// Iterate from the end.
for (Instruction *I : reverse(Local)) {
if (StoreInst *Store = dyn_cast<StoreInst>(I)) {
Value *Addr = Store->getPointerOperand();
if (!shouldInstrumentReadWriteFromAddress(Addr))
continue;
WriteTargets.insert(Addr);
} else {
LoadInst *Load = cast<LoadInst>(I);
Value *Addr = Load->getPointerOperand();
if (!shouldInstrumentReadWriteFromAddress(Addr))
continue;
if (WriteTargets.count(Addr)) {
// We will write to this temp, so no reason to analyze the read.
NumOmittedReadsBeforeWrite++;
continue;
}
if (addrPointsToConstantData(Addr)) {
// Addr points to some constant data -- it can not race with any writes.
continue;
}
}
Value *Addr = isa<StoreInst>(*I)
? cast<StoreInst>(I)->getPointerOperand()
: cast<LoadInst>(I)->getPointerOperand();
if (isa<AllocaInst>(GetUnderlyingObject(Addr, DL)) &&
!PointerMayBeCaptured(Addr, true, true)) {
// The variable is addressable but not captured, so it cannot be
// referenced from a different thread and participate in a data race
// (see llvm/Analysis/CaptureTracking.h for details).
NumOmittedNonCaptured++;
continue;
}
All.push_back(I);
}
Local.clear();
} |
Incidentally, I was looking at this some last week. The big issue I had was whether to implement instrumentation as an SSA pass or during buildssa. Normally loads don't modify the memory variable, but when instrumented with a sanitizer function call they do. If we were to handle instrumentation as a pure SSA pass, it might need to rewrite memory variable references and insert phi nodes. I did have a semi-working CL that emitted instrumentation calls during buildssa (basically just emitted a call before each OpLoad/OpStore/OpMove/OpZero), but it produced only about 80-90% as many calls. I haven't yet investigated why. |
Implementing it like a normal pass sounds better long term. What's the problem with loads?
It may be good. First thing is to just run all race test and go test -race std. If that works, we just need to look at some cases where ssa pass does not insert instrumentation. |
If reasonable, it'd be nice to do it as an SSA pass rather than during buildssa, for encapsulation and understandability reasons--buildssa is already unwieldy, and it will only grow from here.
It might take some work to make the scheduler happy with this, but could we simply leave sanitizer function calls as non-memory-modifying? |
Basically the signatures of SSA Load and Store operations are:
SSA also requires that Memory objects be linearly threaded through the computation. (Similar to monads in Haskell.) So Stores are necessarily linearized, and Loads are freely rearranged between two Stores. Also, Calls behave a lot like Stores:
Instrumenting a Store is easy, because it amounts to just rewriting:
into:
Since Store already produced a new Memory value, introducing a new Memory temporary for the instrumentation call is a purely local rewrite. Instrumenting Load is trickier though, because if we have something like:
that is currently something like:
But as an SSA pass, we would have to rewrite it to:
|
I support moving race instrumentation to an SSA pass. I was thinking about introducing a non-memory call for side-effect free calls. This would be generically useful for a bunch of runtime support calls, like complex division. We could use a similar call for instrumenting loads. |
Naively (i.e., I still largely treat package ssa as a black box), being able to express side-effect free calls SGTM. Until we have that though, is there any opposition to moving racewalk into buildssa? My WIP CL was relatively non-intrusive after refactoring out the OpLoad/OpStore construction patterns, and I think could still be migrated to a proper SSA pass later. |
I can't decide whether this is exciting or horrifying. It might be helpful for race calls, so that they can check args/results without disrupting the original call.
Fine by me. |
I guess I'm not really for moving the code twice if we can avoid it. |
No. They just take an address and sometimes a size as input. No outputs. |
Yes, not outputs, and the calls don't touch anything that can possibly concern the compiler. The only case when the calls callback into Go world is symbolization callback during race reporting, but it runs non-instrumented runtime code. |
Too late. Moving to 1.11. |
Change https://golang.org/cl/102817 mentions this issue: |
Change https://golang.org/cl/102816 mentions this issue: |
Change https://golang.org/cl/102815 mentions this issue: |
@aclements I tried to dig up my old racewalk-during-buildssa experimental CLs, but couldn't find them. I recreated the general patch series though, and it seems to work even. :) I've randomly surveyed and manually inspected the resulting instrumented code, and it looks right. I'm a little curious why the number of race calls for cmd/go has gone down modestly (~5.6%), but the binary size has gone up somewhat (~2.4%). The change to reorder1 to spill all function arguments to temporary variables before assigning them to the parameter slots explains a little of the size growth, but much less than I expected: without that change (i.e., generating broken code) the binary size still goes up ~1.9%. |
There seems to be some nice code size reduction! Thanks! Do you know what was stripped in these ~5.6%? Can you please look at few functions and confirm that we don't remove anything useful? Is it duplicates? I think the main motivation for moving this to SSA (besides ending infinite stream of bugs) was smarter analysis of redundant and unnecessary instrumentation. One of low hanging fruits is removing dups within a basic block: |
Re binary size: I would first check that it's indeed due to .text section (e.g. readelf -a shows sections). And then use nm -S to find few functions with the largest size increase, and then just look at them. |
The binary size contribution seems to come from three sources:
1 and 2 could potentially be improved. 3 seems like it would be more involved. I looked through the removed instrumentation calls. They seemed to be mostly uses of stack variables that isartificial couldn't identify. For example, when creating non-escaping function closures on the stack, we can now omit the racewrite calls while creating it. The only place I noticed new instrumentation calls is when spilling parameters to the heap. For example:
is compiled as:
Instrumenting during SSA construction means we now see |
Re (3), won't hoisting all race instrumentation to the beginning of a basic block (or a preceding function call whichever comes first) help? Doing analysis and instrumentation on call-less parts of basic blocks will also help with #19054 (comment). Not sure if it can also help with (1). |
Why wasn't it instrumented before? |
Probably, but that sounds more involved than I have resources available at the moment. It's not trivial to reschedule function calls currently, or to insert them at arbitrary places. My more immediate concern is the instrumentation pass is blocking us from moving optimizations from order/walk into SSA construction, where we expect they can be both more powerful and easier to maintain. I think that's worth a modest code size increase for instrumented builds, as long as performance is still acceptable.
Because paramstoheap adds the assignment to fn.Func.Enter, which instrument skips over ("nothing interesting for race detector in fn->enter").
Ack, I thought about that, but we don't currently have that level of fine-grained information. Currently, we only track whether or not an object is ever published.
Yes, I think that would be good too. |
Pulling these operations out separately so it's easier to add instrumentation calls. Passes toolstash-check. Updates #19054. Change-Id: If6a537124a87bac2eceff1d66d9df5ebb3bf07be Reviewed-on: https://go-review.googlesource.com/102816 Reviewed-by: Cherry Zhang <[email protected]> Run-TryBot: Cherry Zhang <[email protected]> TryBot-Result: Gobot Gobot <[email protected]>
Rather than checking for each function whether the package supports instrumentation, check once up front. Relatedly, tweak the logic for preventing inlining calls to runtime functions from instrumented packages. Previously, we simply disallowed inlining runtime functions altogether when instrumenting. With this CL, it's only disallowed from packages that are actually being instrumented. That is, now intra-runtime calls can be inlined. Updates #19054. Change-Id: I88c97b48bf70193a8a3ee18d952dcb26b0369d55 Reviewed-on: https://go-review.googlesource.com/102815 Run-TryBot: Matthew Dempsky <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Cherry Zhang <[email protected]>
This is a naive question intended to open discussion.
race/msan instrumentation currently happens between walk and SSA generation. As a result, as we slowly migrate walk rewrites to SSA generation and/or SSA itself, racewalk will slowly fall out of sync and/or be a source of added work. And the simpler ontology of SSA might prove helpful in increasing instrumentation coverage.
Does it make sense to consider moving racewalk to a generic SSA pass? If so, opinions/considerations about implementation details?
I am not volunteering at the moment to do this work, although I did bump into this question while satisfying my curiosity about what it would be like to handle ORANGE directly during SSA generation. (Answer: It's a fair amount of work.)
cc @dvyukov @ianlancetaylor @randall77
The text was updated successfully, but these errors were encountered: