pascal-lab/Tai-e

Different ICFGs will lead to different results

Closed this issue · 6 comments

Describe the bug

Hello everyone. The ICFG was built in different kinds of traversal sequences with the same code. It may cause different analysis results that are hard to debug. How could I create a fixed ICFG?

Tai-e arguments

-cp=E:\Java project repositories\ConfigACE\workspace\test\test.jar
-m=org.apache.hadoop.hdfs.server.namenode.TestNode
-a=myprogram
-a=inter-taint
-ap

Runtime environment infomation

  • OS
    Windows 11 x64
  • java version
    java 17.0.4
  • Tai-e version
    latest

Are the CFGs in ICFG built with random orders?

In Tai-e, ICFG construction uses unordered collections (e.g., java.util.HashSet), so that traversals are unordered. Our practice tells that the correctness of many algorithms is usually orthogonal to the traversal sequence.

I agree with you that the same traversal sequence can help bug reproduction and debug.

Currently, one workaround to solve your issue is to modify the source code of pascal.taie.analysis.graph.icfg.DefaultICFG, particularly its methods in which return value is a collection, so that they return ordered collection.

I give you some demos below (updated according to #12 (comment)):

    @Override
    public Set<ICFGEdge<Stmt>> getInEdgesOf(Stmt stmt) {
-        return inEdges.get(stmt);
+        var results = new TreeSet<ICFGEdge<Stmt>>((e1, e2) -> {
+            Stmt s1 = e1.getSource();
+            Stmt s2 = e2.getSource();
+            JMethod m1 = getContainingMethodOf(s1);
+            JMethod m2 = getContainingMethodOf(s2);
+            if (!m1.equals(m2)) {
+                return m1.toString().compareTo(m2.toString());
+            }
+            return Integer.compare(stmtToCFG.get(s1).getIndex(s1), stmtToCFG.get(s2).getIndex(s2));
+        });
+        results.addAll(this.inEdges.get(stmt));
+        return results;
    }

    @Override
    public Set<ICFGEdge<Stmt>> getOutEdgesOf(Stmt stmt) {
-        return outEdges.get(stmt);
+        var results = new TreeSet<ICFGEdge<Stmt>>((e1, e2) -> {
+            Stmt s1 = e1.getTarget();
+            Stmt s2 = e2.getTarget();
+            JMethod m1 = getContainingMethodOf(s1);
+            JMethod m2 = getContainingMethodOf(s2);
+            if (!m1.equals(m2)) {
+                return m1.toString().compareTo(m2.toString());
+            }
+            return Integer.compare(stmtToCFG.get(s1).getIndex(s1), stmtToCFG.get(s2).getIndex(s2));
+        });
+        results.addAll(this.outEdges.get(stmt));
+        return results;
    }

    @Override
    public Set<Stmt> getNodes() {
-        return Collections.unmodifiableSet(stmtToCFG.keySet());
+        var results = new TreeSet<Stmt>((s1, s2) -> {
+            JMethod m1 = getContainingMethodOf(s1);
+            JMethod m2 = getContainingMethodOf(s2);
+            if (!m1.equals(m2)) {
+                return m1.toString().compareTo(m2.toString());
+            }
+            return Integer.compare(stmtToCFG.get(s1).getIndex(s1), stmtToCFG.get(s2).getIndex(s2));
+        });
+        results.addAll(stmtToCFG.keySet());
+        return results;
    }

That's really helpful. Thank you very much!

This method will lose some statements. I find out that it is caused by the statement nop both at the entry and the exit of the CFG, whose index and line number are -1.

Sorry for this mistake.

I have just forgotten the method pascal.taie.analysis.graph.cfg.StmtCFG#getIndex which will take care of that:

@Override
public int getIndex(Stmt stmt) {
if (isEntry(stmt)) {
return 0;
} else if (isExit(stmt)) {
return ir.getStmts().size() + 1;
} else {
return stmt.getIndex() + 1;
}
}

To replace s1.getIndex() with stmtToCFG.get(s1).getIndex(s1) (same for s2).

To replace s1.getIndex() with stmtToCFG.get(s1).getIndex(s1) (same for s2).

It does work, sincerely thanks!