C++: IR back-edge detection based on TranslatedStmt #812
By using this new definition of back edges, the range analysis should work on code that uses unstructured `goto`s.
I'd like to see some direct tests for this. I'll also add a test to the range analysis that demonstrates an improvement; do you want that as a separate PR to merge first and then rebase onto, or as a new commit in this one?
cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll
phi.getBlock() = op.getPredecessorBlock().getBackEdgeSuccessor(_) |
I believe you also need to delete the `isReducibleCFG(i.getFunction())` conjunct in RangeAnalysis.qll line 563 in order for this to be fully applied.
See #633 (comment)
Done. That made the test produce a new (good) result, so it looks like it worked.
How bad is it for the range analysis if there's a loop left in the CFG after all the back edges have been removed? This can happen when there's a bug somewhere, like in the extractor or IR translation. See, for example, #811. I could wrap the back-edge detection in a "back stop" predicate that classifies every edge as a back edge when there's a loop in the graph with syntactic back edges removed.
This adds one new test result (`i >= 0` on line 130).
Sorry, I only just started looking at this, so I haven't figured out how the algorithm here works yet. But surely it should be easy to avoid the situation that @jbj describes? I think the algorithm that I used in #639 is immune to that problem. The idea is to assign every node in the graph an integer. An edge is a back edge if the number of the destination is <= the number of the source. Every cycle must contain at least one such edge, because the numbers cannot strictly increase all the way around a cycle. So removing all the back edges is guaranteed to remove all the cycles from the graph. Of course, if you choose the numbers badly, then an unnecessarily large number of edges will get classified as back edges. But it will still work.
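To make the numbering argument concrete, here is a minimal Python sketch. It is not the code from #639; the function names `classify_back_edges` and `is_acyclic`, the tiny example graph, and both numberings are assumptions made up for illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def classify_back_edges(edges: Iterable[Edge], number: Dict[Hashable, int]) -> Set[Edge]:
    """Return every edge whose destination number is <= its source number."""
    return {(src, dst) for src, dst in edges if number[dst] <= number[src]}


def is_acyclic(edges: Iterable[Edge]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be peeled off."""
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    peeled = 0
    while ready:
        node = ready.pop()
        peeled += 1
        for dst in succs[node]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return peeled == len(nodes)


# Hypothetical CFG: a -> b -> c -> d, with a loop edge c -> b.
edges = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}

# A numbering that follows program order finds exactly the loop edge.
good = {"a": 0, "b": 1, "c": 2, "d": 3}
# A badly chosen numbering marks more edges, but the rest is still acyclic.
bad = {"a": 3, "b": 2, "c": 1, "d": 0}

good_back = classify_back_edges(edges, good)
bad_back = classify_back_edges(edges, bad)
```

With the reversed numbering, three of the four edges are classified as back edges, which illustrates the "unnecessarily large number" caveat; the remaining forward edges are acyclic under either numbering.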
Then the range analysis might go into an infinite loop.
We had an existing `Location.isBefore` predicate that was just right for this use case. Performance is great thanks to magic.
@kevinbackhouse you're right that the algorithm in #639 should be immune to such problems, but I don't think the algorithm applies to the IR directly. The IR isn't a tree but a graph, so numbering the nodes is harder. Dave has made some heroic efforts to number nodes in the IR pretty-printer, but it's too slow to be used in production. In contrast, the code I'm adding here is blazing fast because it only visits nodes that are directly involved in loop statements.
This prevents loops of non-back-edges on ChakraCore (see github#811).
I implemented the detection of left-over loops, so now it's guaranteed that there are no loops among the non-back-edges. The implementation became a bit more complicated than I thought it would be because I had to do it at the basic-block level for the sake of performance.
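The back-stop idea can be sketched in Python as follows. This is only an illustration of the behaviour described in this thread (classify every edge as a back edge when a loop survives removal of the syntactic back edges); it is not the QL implementation, which works on basic blocks, and the names `forward_edges_have_cycle` and `back_edges_with_back_stop` are hypothetical.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def forward_edges_have_cycle(edges: Set[Edge], back_edges: Set[Edge]) -> bool:
    """Iterative depth-first search over the edges that are not back edges."""
    succs = {}
    for src, dst in edges:
        if (src, dst) not in back_edges:
            succs.setdefault(src, []).append(dst)
            succs.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in succs}
    for root in succs:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(succs[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if color[nxt] == GREY:
                    return True  # reached a node that is still on the stack
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(succs[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False


def back_edges_with_back_stop(edges: Iterable[Edge],
                              syntactic_back_edges: Iterable[Edge]) -> Set[Edge]:
    """Keep the syntactic back edges unless a loop of forward edges remains."""
    all_edges = set(edges)
    syntactic = set(syntactic_back_edges) & all_edges
    if forward_edges_have_cycle(all_edges, syntactic):
        return all_edges  # back stop: treat every edge as a back edge
    return syntactic


# Hypothetical function body with one loop, b -> c -> b.
edges = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
normal = back_edges_with_back_stop(edges, {("c", "b")})
# Simulate a bug where the loop's back edge was not detected syntactically.
stopped = back_edges_with_back_stop(edges, set())
```

In the second call the whole edge set comes back, which is coarse but keeps a consumer such as the range analysis from following a cycle of forward edges.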
@rdmarsh2 if you point me to a branch with a test where this PR makes a difference, then I'll cherry-pick your commit into this PR such that the
This test shows that the back-edge detection does not properly account for chi nodes in the translation to aliased SSA.
On advice from @aschackmull I compared the back edges found by dominance to the back edges found syntactically. This uncovered a bug where I'd failed to handle chi nodes in With that fixed, the following query shows the remaining differences:
On ChakraCore, the differences are now limited to goto statements and back edges inserted by the back stop. There are no "only dominance" edges.
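For readers unfamiliar with the comparison, a dominance back edge is an edge whose target dominates its source. The Python sketch below computes dominators with the textbook iterative set-intersection algorithm and diffs the result against a given set of syntactic back edges. It is not the QL query used here; the function names and both example graphs are assumptions for illustration, and every node is assumed reachable from the entry.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def dominators(entry: Hashable, edges: Iterable[Edge]) -> Dict[Hashable, Set[Hashable]]:
    """Map each node to the set of nodes that dominate it (itself included)."""
    edges = list(edges)
    nodes = {entry} | {n for edge in edges for n in edge}
    preds = {n: set() for n in nodes}
    for src, dst in edges:
        preds[dst].add(src)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for node in nodes - {entry}:
            incoming = [dom[p] for p in preds[node]]
            new = (set.intersection(*incoming) if incoming else set()) | {node}
            if new != dom[node]:
                dom[node] = new
                changed = True
    return dom


def dominance_back_edges(entry: Hashable, edges: Iterable[Edge]) -> Set[Edge]:
    """An edge src -> dst is a dominance back edge if dst dominates src."""
    edges = list(edges)
    dom = dominators(entry, edges)
    return {(src, dst) for src, dst in edges if dst in dom[src]}


# Structured loop: the two definitions agree.
loop = {("entry", "head"), ("head", "body"), ("body", "head"), ("head", "exit")}
loop_dominance = dominance_back_edges("entry", loop)

# Irreducible CFG: a goto enters the cycle a <-> b at two different points,
# so neither a nor b dominates the other.
irreducible = {("entry", "a"), ("entry", "b"), ("a", "b"), ("b", "a"), ("a", "exit")}
syntactic = {("b", "a")}  # assumed: the backward goto is the syntactic back edge
by_dominance = dominance_back_edges("entry", irreducible)
only_syntactic = syntactic - by_dominance
only_dominance = by_dominance - syntactic
```

In the irreducible example the dominance definition finds no back edge at all, so the syntactic edge shows up as "only syntactic", which is the kind of difference one would expect around `goto`.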
@jbj https://github.com/rdmarsh2/ql/tree/rdmarsh/ir-backedge adds a test with irreducible CFG and interesting bounds to the tip of this branch
The test LGTM. I've pushed it to this branch.
The test failure is due to the changes to
Why should
That sounds good.
I think everything has been addressed at this point; once the test expectations are fixed this should be good to merge.
This PR implements detection of back edges in the IR based on translated statements, following up on #633 (comment). Syntactic back edges are easy to propagate to `unaliased_ssa` and `aliased_ssa` since none of the introduced nodes cause any loops. Initially I tried to implement the detection with an overridable predicate on `TranslatedElement`, but it turns out the back edges aren't tied to particular element types in any useful way; for example, the back edge in `while (x) { x--; }` doesn't touch the `while`-loop at all but goes directly from `--` to `(x)`.
@rdmarsh2 or @aschackmull, are you able to write a test case that shows a difference in range analysis results with this PR? I suppose it'll involve unstructured `goto`.
I added sanity queries in `Instruction.qll` to test the two properties that I think back edges should have: the CFG should have no loops when they are removed, and removing them should not cause CFG nodes to become unreachable.
The sanity query `containsLoopOfForwardEdges` has results on seven functions in ChakraCore, but that's independent of this PR; see #811.
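The two sanity properties can be restated outside QL as a small Python sketch. Only the name `containsLoopOfForwardEdges` comes from this PR; the Python function `lost_reachability`, the helper `reachable`, and the example graph are hypothetical, and the sketch assumes Python 3.9+ for `graphlib`.

```python
from collections import deque
from graphlib import CycleError, TopologicalSorter
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def contains_loop_of_forward_edges(edges: Iterable[Edge], back_edges: Set[Edge]) -> bool:
    """Property 1: with back edges removed, the CFG must have no loops."""
    sorter = TopologicalSorter()
    for src, dst in edges:
        if (src, dst) not in back_edges:
            sorter.add(dst, src)  # dst comes after its predecessor src
    try:
        sorter.prepare()  # raises CycleError if the graph has a cycle
    except CycleError:
        return True
    return False


def reachable(entry: Hashable, edges: Iterable[Edge]) -> Set[Hashable]:
    """Breadth-first search from the entry node."""
    succs = {}
    for src, dst in edges:
        succs.setdefault(src, []).append(dst)
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for dst in succs.get(node, []):
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen


def lost_reachability(entry: Hashable, edges: Iterable[Edge],
                      back_edges: Set[Edge]) -> Set[Hashable]:
    """Property 2: nodes that become unreachable once back edges are removed."""
    edges = list(edges)
    forward = [edge for edge in edges if edge not in back_edges]
    return reachable(entry, edges) - reachable(entry, forward)


# Hypothetical CFG with one loop, b -> c -> b.
edges = {("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}

ok_loop = contains_loop_of_forward_edges(edges, {("c", "b")})
ok_lost = lost_reachability("a", edges, {("c", "b")})
# Picking the wrong edge of the loop breaks the second property.
wrong_lost = lost_reachability("a", edges, {("b", "c")})
```

Marking `b -> c` instead of `c -> b` still removes the loop, but it cuts `c` and `d` off from the entry, which is why the second property is checked separately.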