Skip to content

C++: IR back-edge detection based on TranslatedStmt #812

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 31, 2019

Conversation

jbj
Copy link
Contributor

@jbj jbj commented Jan 23, 2019

This PR implements detection of back edges in the IR based on translated statements, following up on #633 (comment). Syntactic back edges are easy to propagate to unaliased_ssa and aliased_ssa since none of the introduced nodes cause any loops. Initially I tried to implement the detection with an overridable predicate on TranslatedElement, but it turns out the back edges aren't tied to particular element types in any useful way; for example, the back edge in while (x) { x--; } doesn't touch the while-loop at all but goes directly from -- to (x).

@rdmarsh2 or @aschackmull, are you able to write a test case that shows a difference in range analysis results with this PR? I suppose it'll involve unstructured goto.

I added sanity queries in Instruction.qll to test the two properties that I think back edges should have: the CFG should have no loops when they are removed, and removing them should not cause CFG nodes to become unreachable.

The sanity query containsLoopOfForwardEdges has results on seven functions in ChakraCore, but that's independent of this PR; see #811.

jbj added 3 commits January 23, 2019 11:40
By using this new definition of back edges, the range analysis should
work on code that uses unstructured `goto`s.
@jbj jbj added the C++ label Jan 23, 2019
@jbj jbj requested a review from a team as a code owner January 23, 2019 11:05
Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see some direct tests for this. I'll also add a test to the range analysis that demonstrates an improvement; do you want that as a separate PR to merge first and then rebase onto, or as a new commit in this one?

phi.getBlock() = op.getPredecessorBlock().getBackEdgeSuccessor(_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you also need to delete the isReducibleCFG(i.getFunction() conjunct in RangeAnalysis.qll line 563 in order for this to be fully applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. That made the test produce a new (good) result, so it looks like it worked.

@jbj
Copy link
Contributor Author

jbj commented Jan 25, 2019

How bad is it for the range analysis if there's a loop left in the CFG after all the back edges have been removed? This can happen when there's a bug somewhere, like in the extractor or IR translation. See, for example, #811.

I could wrap the back-edge detection in a "back stop" predicate that classifies every edge as a back edge when there's a loop in the graph with syntactic back edges removed.

This adds one new test result (`i >= 0` on line 130).
@kevinbackhouse
Copy link
Contributor

Sorry, I only just started looking at this so I haven't figured out how the algorithm here works yet. But surely it should be easy to avoid the situation that @jbj describes? I think the algorithm that I used in #639 is immune from that problem. The idea is to assign every node in the graph an integer. An edge is a back-edge if the number of the destination is <= the number of the source. It is impossible to create a cycle without at least one edge that's like that. So removing all the back-edges is guaranteed to remove all the cycles from the graph. Of course if you choose the numbers badly then an unnecessarily large number of edges will get classified as back-edges. But it will still work.

@aschackmull
Copy link
Contributor

How bad is it for the range analysis if there's a loop left in the CFG after all the back edges have been removed?

Then the range analysis might go into an infinite loop.

We had an existing `Location.isBefore` predicate that was just right for
this use case. Performance is great thanks to magic.
@jbj
Copy link
Contributor Author

jbj commented Jan 25, 2019

@kevinbackhouse you're right that the algorithm in #639 should be immune to such problems, but I don't think the algorithm applies to the IR directly. The IR isn't a tree but a graph, so numbering the nodes is harder. Dave has made some heroic efforts to number nodes in the IR pretty-printer, but it's too slow to be used in production. In contrast, the code I'm adding here is blazing fast because it only visits nodes that are directly involved in loop statements.

jbj added 2 commits January 25, 2019 11:28
This prevents loops of non-back-edges on ChakraCore (see github#811).
@jbj
Copy link
Contributor Author

jbj commented Jan 25, 2019

I implemented the detection of left-over loops, so now it's guaranteed that there are no loops among the non-back-edges. The implementation became a bit more complicated than I thought it would be because I had to do it at the basic-block level for the sake of performance.

@jbj
Copy link
Contributor Author

jbj commented Jan 25, 2019

@rdmarsh2 if you point me to a branch with a test where this PR makes a difference, then I'll cherry-pick your commit into this PR such that the .expected change can be seen in the commit history.

jbj added 3 commits January 25, 2019 14:16
This test shows that the back-edge detection does not properly account
for chi nodes in the translation to aliased SSA.
@jbj
Copy link
Contributor Author

jbj commented Jan 25, 2019

On advice from @aschackmull I compared the back edges found by dominance to the back edges found syntactically. This uncovered a bug where I'd failed to handle chi nodes in SSAConstruction.qll, which caused too many back edges to be reported. I added two commits to address that: 560dbdf adds a test case to demonstrate the bug, and ba8bf94 fixes the bug.

With that fixed, the following query shows the remaining differences:

import semmle.code.cpp.ir.IR

predicate dominanceBackEdge(IRBlock b1, IRBlock b2, EdgeKind kind) {
  b1.getSuccessor(kind) = b2 and
  b2.dominates(b1)
}

from string msg, IRBlock b1, IRBlock b2, EdgeKind kind
where
  b1.isReachableFromFunctionEntry() and
  (
    msg = "only syntactic" and
    b1.getBackEdgeSuccessor(kind) = b2 and
    not dominanceBackEdge(b1, b2, kind)
    or
    msg = "only dominance" and
    not b1.getBackEdgeSuccessor(kind) = b2 and
    dominanceBackEdge(b1, b2, kind)
  )
select b1.getLastInstruction() as i1, b2.getFirstInstruction() as i2, kind, msg

On ChakraCore, the differences are now limited to goto statements and back edges inserted by the back stop. There are no "only dominance" edges.

@rdmarsh2
Copy link
Contributor

@jbj https://github.com/rdmarsh2/ql/tree/rdmarsh/ir-backedge adds a test with irreducible CFG and interesting bounds to the tip of this branch

@jbj
Copy link
Contributor Author

jbj commented Jan 28, 2019

The test LGTM. I've pushed it to this branch.

@rdmarsh2
Copy link
Contributor

The test failure is due to the changes to PrintIR.qll; would it be better to use the IRPropertyProvider infrastructure for this?

@jbj
Copy link
Contributor Author

jbj commented Jan 29, 2019

Why should IRPropertyProvider be better? Do you mean we don't always want to see which edges are back edges? If it has no value beyond testing the new predicates here, then I agree, but I was hoping it would be generally useful when reading IR since it recovers a little bit of the program structure that gets lost in translation.

rdmarsh2
rdmarsh2 previously approved these changes Jan 30, 2019
Copy link
Contributor

@rdmarsh2 rdmarsh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good.

I think everything has been addressed at this point; once the test expectations are fixed this should be good to merge.

@rdmarsh2 rdmarsh2 merged commit 5327ca7 into github:master Jan 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants