C++: IR back-edge detection based on TranslatedStmt #812

jbj · 2019-01-23T11:05:20Z

This PR implements detection of back edges in the IR based on translated statements, following up on #633 (comment). Syntactic back edges are easy to propagate to `unaliased_ssa` and `aliased_ssa` since none of the nodes introduced there create any loops. Initially I tried to implement the detection with an overridable predicate on `TranslatedElement`, but it turns out the back edges aren't tied to particular element types in any useful way. For example, the back edge in `while (x) { x--; }` doesn't touch the while loop at all but goes directly from `--` to `(x)`.
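To make the `while (x) { x--; }` example concrete, here is a minimal C++ sketch of a hand-built CFG for that loop. The `Cfg` and `Edge` types and `translateWhileDecrement` are hypothetical stand-ins, not the QL classes in this PR. The sketch shows only two things: the code that creates the loop-closing edge is the code that knows to mark it, and that edge runs from the decrement to the condition without involving any node for the `while` statement itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CFG: string-labelled nodes, and edges that remember whether
// the translator that created them marked them as back edges.
struct Edge {
    std::size_t from;
    std::size_t to;
    bool isBackEdge;
};

struct Cfg {
    std::vector<std::string> nodes;
    std::vector<Edge> edges;

    std::size_t addNode(const std::string& label) {
        nodes.push_back(label);
        return nodes.size() - 1;
    }

    void addEdge(std::size_t from, std::size_t to, bool isBackEdge = false) {
        assert(from < nodes.size() && to < nodes.size());
        edges.push_back({from, to, isBackEdge});
    }
};

// Sketch of translating `while (x) { x--; }`. The loop-closing edge is marked
// by the code that creates it. It goes from the decrement straight to the
// condition, and no node for the `while` statement itself is involved.
Cfg translateWhileDecrement() {
    Cfg cfg;
    const std::size_t entry = cfg.addNode("entry");
    const std::size_t cond = cfg.addNode("(x)");
    const std::size_t body = cfg.addNode("x--");
    const std::size_t exitNode = cfg.addNode("exit");

    cfg.addEdge(entry, cond);
    cfg.addEdge(cond, body);      // condition true: run the body
    cfg.addEdge(cond, exitNode);  // condition false: leave the loop
    cfg.addEdge(body, cond, /*isBackEdge=*/true);
    return cfg;
}
```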

@rdmarsh2 or @aschackmull, are you able to write a test case that shows a difference in range analysis results with this PR? I suppose it'll involve an unstructured `goto`.

I added sanity queries in `Instruction.qll` to test the two properties that I think back edges should have: the CFG should have no loops when they are removed, and removing them should not cause CFG nodes to become unreachable.
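A minimal C++ sketch of the two properties follows. It assumes a plain edge list in which each edge carries a back-edge flag. The types and function names are hypothetical; the actual checks in `Instruction.qll` are QL predicates. `forwardEdgesAreAcyclic` runs Kahn's algorithm on the non-back edges, and `backEdgesPreserveReachability` compares reachability from the entry with and without the back edges.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical edge representation: each CFG edge carries a back-edge flag.
struct Edge {
    std::size_t from;
    std::size_t to;
    bool isBackEdge;
};

// Marks the nodes reachable from `entry`; with `forwardOnly`, back edges are ignored.
std::vector<bool> reachable(std::size_t numNodes, const std::vector<Edge>& edges,
                            std::size_t entry, bool forwardOnly) {
    assert(entry < numNodes);
    std::vector<std::vector<std::size_t>> succ(numNodes);
    for (const Edge& e : edges) {
        if (!(forwardOnly && e.isBackEdge)) succ[e.from].push_back(e.to);
    }
    std::vector<bool> seen(numNodes, false);
    std::vector<std::size_t> stack{entry};
    seen[entry] = true;
    while (!stack.empty()) {
        const std::size_t n = stack.back();
        stack.pop_back();
        for (std::size_t s : succ[n]) {
            if (!seen[s]) {
                seen[s] = true;
                stack.push_back(s);
            }
        }
    }
    return seen;
}

// Property 1: the non-back edges form no cycle. Kahn's algorithm removes
// in-degree-0 nodes one at a time; all nodes get removed iff there is no cycle.
bool forwardEdgesAreAcyclic(std::size_t numNodes, const std::vector<Edge>& edges) {
    std::vector<std::size_t> inDegree(numNodes, 0);
    std::vector<std::vector<std::size_t>> succ(numNodes);
    for (const Edge& e : edges) {
        if (e.isBackEdge) continue;
        succ[e.from].push_back(e.to);
        ++inDegree[e.to];
    }
    std::queue<std::size_t> ready;
    for (std::size_t n = 0; n < numNodes; ++n) {
        if (inDegree[n] == 0) ready.push(n);
    }
    std::size_t removed = 0;
    while (!ready.empty()) {
        const std::size_t n = ready.front();
        ready.pop();
        ++removed;
        for (std::size_t s : succ[n]) {
            if (--inDegree[s] == 0) ready.push(s);
        }
    }
    return removed == numNodes;
}

// Property 2: ignoring back edges does not make any reachable node unreachable.
bool backEdgesPreserveReachability(std::size_t numNodes, const std::vector<Edge>& edges,
                                   std::size_t entry) {
    return reachable(numNodes, edges, entry, false) == reachable(numNodes, edges, entry, true);
}
```

Consider the `while (x) { x--; }` graph with entry 0, condition 1, body 2 and exit 3. If only 2 -> 1 is flagged, both checks hold. If 0 -> 1 is flagged instead of 2 -> 1, both checks fail.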

The sanity query `containsLoopOfForwardEdges` has results on seven functions in ChakraCore, but that's independent of this PR; see #811.


rdmarsh2

I'd like to see some direct tests for this. I'll also add a test to the range analysis that demonstrates an improvement; do you want that as a separate PR to merge first and then rebase onto, or as a new commit in this one?

cpp/ql/src/semmle/code/cpp/ir/implementation/raw/internal/IRConstruction.qll

aschackmull · 2019-01-24T10:41:20Z

cpp/ql/src/semmle/code/cpp/rangeanalysis/RangeUtils.qll

I believe you also need to delete the `isReducibleCFG(i.getFunction())` conjunct in RangeAnalysis.qll line 563 for this to take full effect.

See #633 (comment)

Done. That made the test produce a new (good) result, so it looks like it worked.

jbj · 2019-01-25T08:22:55Z

How bad is it for the range analysis if there's a loop left in the CFG after all the back edges have been removed? This can happen when there's a bug somewhere, like in the extractor or IR translation. See, for example, #811.

I could wrap the back-edge detection in a "back stop" predicate that classifies every edge as a back edge when there's a loop in the graph with syntactic back edges removed.
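Below is one possible reading of the back stop, sketched in C++. It assumes that "every edge" means every edge lying on a left-over cycle of forward edges. Under that reading, a forward edge `u -> v` is reclassified as a back edge whenever `v` can still reach `u` through forward edges. This is not necessarily how the PR implements it; the final version works on basic blocks in QL. `applyBackStop` and `reachesForward` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical edge representation: each CFG edge carries a back-edge flag.
struct Edge {
    std::size_t from;
    std::size_t to;
    bool isBackEdge;
};

// Holds if `target` can be reached from `source` using only non-back edges.
// A node trivially reaches itself, so a forward self-loop counts as a cycle.
bool reachesForward(std::size_t numNodes, const std::vector<Edge>& edges,
                    std::size_t source, std::size_t target) {
    assert(source < numNodes && target < numNodes);
    std::vector<bool> seen(numNodes, false);
    std::vector<std::size_t> stack{source};
    seen[source] = true;
    while (!stack.empty()) {
        const std::size_t n = stack.back();
        stack.pop_back();
        if (n == target) return true;
        for (const Edge& e : edges) {
            if (!e.isBackEdge && e.from == n && !seen[e.to]) {
                seen[e.to] = true;
                stack.push_back(e.to);
            }
        }
    }
    return false;
}

// Back stop: a forward edge u -> v whose target can get back to its source
// through forward edges lies on a left-over cycle, so it is reclassified as a
// back edge. All decisions are made against the original classification.
std::vector<Edge> applyBackStop(std::size_t numNodes, std::vector<Edge> edges) {
    std::vector<bool> onForwardCycle(edges.size(), false);
    for (std::size_t i = 0; i < edges.size(); ++i) {
        const Edge& e = edges[i];
        onForwardCycle[i] = !e.isBackEdge && reachesForward(numNodes, edges, e.to, e.from);
    }
    for (std::size_t i = 0; i < edges.size(); ++i) {
        if (onForwardCycle[i]) edges[i].isBackEdge = true;
    }
    return edges;
}
```

After this pass, no remaining forward edge `u -> v` has a forward path from `v` back to `u`, so no cycle can survive. The price is precision. For a loop `1 -> 2 -> 1` whose back edge was missed, both edges become back edges, which leaves node 2 unreachable through forward edges. In this sketch, the back stop gives up the second sanity property to guarantee the first.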


kevinbackhouse · 2019-01-25T08:45:25Z

Sorry, I only just started looking at this so I haven't figured out how the algorithm here works yet. But surely it should be easy to avoid the situation that @jbj describes? I think the algorithm that I used in #639 is immune from that problem. The idea is to assign every node in the graph an integer. An edge is a back-edge if the number of the destination is <= the number of the source. It is impossible to create a cycle without at least one edge that's like that. So removing all the back-edges is guaranteed to remove all the cycles from the graph. Of course if you choose the numbers badly then an unnecessarily large number of edges will get classified as back-edges. But it will still work.
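Here is a minimal C++ sketch of the numbering idea described above. Using reverse postorder from a depth-first search as the numbering is an assumption; the comment does not say which numbering #639 uses. Any numbering guarantees that removing the back edges leaves an acyclic graph, and a DFS-based one merely keeps the set of back edges small. `reversePostorderNumbers` and `isBackEdge` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;  // successor lists

// Numbers nodes in reverse postorder of a depth-first search started at
// `entry`, then at any node still unvisited, so that every node gets a number.
std::vector<std::size_t> reversePostorderNumbers(const Graph& g, std::size_t entry) {
    const std::size_t n = g.size();
    assert(entry < n);
    std::vector<bool> visited(n, false);
    std::vector<std::size_t> postorder;
    auto dfs = [&](std::size_t root) {
        // Each stack entry is (node, index of the next successor to explore).
        std::vector<std::pair<std::size_t, std::size_t>> stack{{root, 0}};
        visited[root] = true;
        while (!stack.empty()) {
            const std::size_t node = stack.back().first;
            if (stack.back().second < g[node].size()) {
                const std::size_t succ = g[node][stack.back().second++];
                if (!visited[succ]) {
                    visited[succ] = true;
                    stack.push_back({succ, 0});
                }
            } else {
                postorder.push_back(node);
                stack.pop_back();
            }
        }
    };
    dfs(entry);
    for (std::size_t v = 0; v < n; ++v) {
        if (!visited[v]) dfs(v);
    }
    std::vector<std::size_t> number(n);
    for (std::size_t i = 0; i < n; ++i) number[postorder[n - 1 - i]] = i;
    return number;
}

// An edge u -> v is a back edge iff it does not go "uphill" in the numbering.
// Every cycle must contain such an edge, so removing them removes all cycles.
bool isBackEdge(const std::vector<std::size_t>& number, std::size_t u, std::size_t v) {
    return number[v] <= number[u];
}
```

With a degenerate numbering that gives every node the same number, every edge satisfies `number[v] <= number[u]`. Every edge then becomes a back edge, which is still safe, just useless.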

aschackmull · 2019-01-25T08:46:30Z

> How bad is it for the range analysis if there's a loop left in the CFG after all the back edges have been removed?

Then the range analysis might go into an infinite loop.


jbj · 2019-01-25T09:27:20Z

@kevinbackhouse you're right that the algorithm in #639 should be immune to such problems, but I don't think the algorithm applies to the IR directly. The IR isn't a tree but a graph, so numbering the nodes is harder. Dave has made some heroic efforts to number nodes in the IR pretty-printer, but it's too slow to be used in production. In contrast, the code I'm adding here is blazing fast because it only visits nodes that are directly involved in loop statements.


jbj · 2019-01-25T12:05:28Z

I implemented the detection of left-over loops, so now it's guaranteed that there are no loops among the non-back-edges. The implementation became a bit more complicated than I thought it would be because I had to do it at the basic-block level for the sake of performance.

jbj · 2019-01-25T12:07:38Z

@rdmarsh2 if you point me to a branch with a test where this PR makes a difference, I'll cherry-pick your commit into this PR so that the `.expected` change can be seen in the commit history.


jbj · 2019-01-25T14:43:06Z

On advice from @aschackmull, I compared the back edges found by dominance to the back edges found syntactically. This uncovered a bug where I'd failed to handle chi nodes in `SSAConstruction.qll`, which caused too many back edges to be reported. I added two commits to address that: 560dbdf adds a test case to demonstrate the bug, and ba8bf94 fixes the bug.

With that fixed, the following query shows the remaining differences:

```ql
import semmle.code.cpp.ir.IR

predicate dominanceBackEdge(IRBlock b1, IRBlock b2, EdgeKind kind) {
  b1.getSuccessor(kind) = b2 and
  b2.dominates(b1)
}

from string msg, IRBlock b1, IRBlock b2, EdgeKind kind
where
  b1.isReachableFromFunctionEntry() and
  (
    msg = "only syntactic" and
    b1.getBackEdgeSuccessor(kind) = b2 and
    not dominanceBackEdge(b1, b2, kind)
    or
    msg = "only dominance" and
    not b1.getBackEdgeSuccessor(kind) = b2 and
    dominanceBackEdge(b1, b2, kind)
  )
select b1.getLastInstruction() as i1, b2.getFirstInstruction() as i2, kind, msg
```

On ChakraCore, the differences are now limited to goto statements and back edges inserted by the back stop. There are no "only dominance" edges.
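The comparison in the query can be mirrored in a small C++ sketch. The sketch assumes every node is reachable from the entry and uses a simple iterative dominator computation, not the IR library's implementation. An edge `u -> v` counts as a dominance back edge when `v` dominates `u`. `difference` yields the "only syntactic" or "only dominance" edges depending on argument order. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;               // successor lists
using EdgeList = std::vector<std::pair<std::size_t, std::size_t>>;  // (from, to) pairs

// dom[n][d] is true iff d dominates n. Classic iterative data-flow solution;
// assumes every node is reachable from `entry`.
std::vector<std::vector<bool>> dominators(const Graph& succ, std::size_t entry) {
    const std::size_t n = succ.size();
    assert(entry < n);
    Graph pred(n);
    for (std::size_t u = 0; u < n; ++u)
        for (std::size_t v : succ[u]) pred[v].push_back(u);

    std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
    dom[entry].assign(n, false);
    dom[entry][entry] = true;
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t v = 0; v < n; ++v) {
            if (v == entry) continue;
            std::vector<bool> meet(n, true);  // intersection over predecessors
            for (std::size_t p : pred[v])
                for (std::size_t d = 0; d < n; ++d) meet[d] = meet[d] && dom[p][d];
            meet[v] = true;
            if (meet != dom[v]) {
                dom[v] = std::move(meet);
                changed = true;
            }
        }
    }
    return dom;
}

// Counterpart of `dominanceBackEdge` in the query: u -> v where v dominates u.
EdgeList dominanceBackEdges(const Graph& succ, std::size_t entry) {
    const std::vector<std::vector<bool>> dom = dominators(succ, entry);
    EdgeList result;
    for (std::size_t u = 0; u < succ.size(); ++u)
        for (std::size_t v : succ[u])
            if (dom[u][v]) result.push_back({u, v});
    return result;
}

// Edges in `a` that are not in `b`; difference(syntactic, dominance) gives
// the "only syntactic" edges.
EdgeList difference(const EdgeList& a, const EdgeList& b) {
    EdgeList result;
    for (const auto& e : a)
        if (std::find(b.begin(), b.end(), e) == b.end()) result.push_back(e);
    return result;
}
```

Consider a cycle that can be entered at either node, with edges 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 1 and 2 -> 3. Neither 1 nor 2 dominates the other, so `dominanceBackEdges` finds nothing for that cycle. Whichever edge a syntactic detector picks there lands in the "only syntactic" set, which has the same shape as the `goto` differences seen on ChakraCore.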

rdmarsh2 · 2019-01-25T21:15:30Z

@jbj https://github.com/rdmarsh2/ql/tree/rdmarsh/ir-backedge adds a test with an irreducible CFG and interesting bounds to the tip of this branch.
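For readers unfamiliar with the term, here is a hypothetical C++ function (not the test from the linked branch) whose CFG is irreducible. The loop can be entered at `top` and also, via `goto middle`, in the middle of its body. Neither entry dominates the other, so a dominance-based definition finds no back edge for this cycle.

```cpp
#include <cassert>

// Hypothetical example: the loop has two entry points, `top` and `middle`.
// The `goto middle` taken when `skipFirstCheck` is set enters the cycle
// without passing through `top`, which makes the CFG irreducible.
int countDown(int i, bool skipFirstCheck) {
    assert(i >= 0);  // keep the example simple: start from a non-negative count
    int steps = 0;
    if (skipFirstCheck) goto middle;
top:
    if (i <= 0) return steps;
middle:
    --i;
    ++steps;
    goto top;
}
```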

jbj · 2019-01-28T08:01:14Z

The test LGTM. I've pushed it to this branch.

rdmarsh2 · 2019-01-28T17:12:31Z

The test failure is due to the changes to PrintIR.qll; would it be better to use the IRPropertyProvider infrastructure for this?

jbj · 2019-01-29T12:19:13Z

Why should IRPropertyProvider be better? Do you mean we don't always want to see which edges are back edges? If it has no value beyond testing the new predicates here, then I agree, but I was hoping it would be generally useful when reading IR since it recovers a little bit of the program structure that gets lost in translation.

rdmarsh2

That sounds good.

I think everything has been addressed at this point; once the test expectations are fixed this should be good to merge.

jbj added 3 commits January 23, 2019 11:40

C++: Initial implementation of back-edge detection

38f7ec7

C++: sanity checks for back edges

b40acce

C++: Use new back-edge def. in range analysis

bb7369e

By using this new definition of back edges, the range analysis should work on code that uses unstructured `goto`s.

jbj added the C++ label Jan 23, 2019

jbj requested review from rdmarsh2 and dave-bartolomeo January 23, 2019 11:05

jbj requested a review from a team as a code owner January 23, 2019 11:05

rdmarsh2 reviewed Jan 23, 2019


aschackmull reviewed Jan 24, 2019


C++: Enable range analysis for irreducible CFGs

6d09a9b

This adds one new test result (`i >= 0` on line 130).

C++: Simplify isStrictlyForwardGoto

3465942

We had an existing `Location.isBefore` predicate that was just right for this use case. Performance is great thanks to magic.

jbj added 2 commits January 25, 2019 11:28

C++: Fix comment (edge is not unique)

5b2b961

C++: Add a back-edge safeguard

62509ff

This prevents loops of non-back-edges on ChakraCore (see github#811).

jbj added 3 commits January 25, 2019 14:16

C++: Annotate back edges in IR debug output

9963270

C++: Test demonstrating chi node back edge bug

560dbdf

This test shows that the back-edge detection does not properly account for chi nodes in the translation to aliased SSA.

C++: Account for chi nodes in back-edge detection

ba8bf94

C++: new irreducible CFG test for range analysis

9decbd9

rdmarsh2 previously approved these changes Jan 30, 2019


C++: Accept test changes in ir_gvn.expected

b55573e

jbj dismissed rdmarsh2’s stale review via b55573e January 31, 2019 09:08

rdmarsh2 approved these changes Jan 31, 2019


rdmarsh2 merged commit 5327ca7 into github:master Jan 31, 2019

jbj mentioned this pull request Dec 12, 2019

C++: Get rid of a fastTC and noopt in IR #2519

Merged
