pascal-lab/Tai-e

How to correctly taint "%this"?

MXWXZ opened this issue · 5 comments

Overall Description

Hi, I want to write a plugin that can taint any class when any field of the class is tainted (e.g., when a tainted variable is passed to obj.setName(String), I want obj to be tainted as well afterwards). However, the caller's variable is not affected even though the "%this" variable in the callee is tainted. How should I handle this correctly?

You can refer to the minimal reproducing code below. The "%this" in the callee is tainted:

[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]

However, the receiver variable in the caller ($r0) is untouched:

[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]

I may be misunderstanding some parts of the pointer analysis. Do I need to propagate this manually (and if so, why is it not handled automatically)?

Expected Behavior

The caller object is also tainted.

Current Behavior

The caller object is not tainted.

Tai-e Arguments

Tai-e Options:
optionsFile: null
printHelp: false
classPath: []
appClassPath:
- tester-1.0-SNAPSHOT.jar
mainClass: org.example.Main
inputClasses: []
javaVersion: 8
prependJVM: false
allowPhantom: true
worldBuilderClass: pascal.taie.frontend.soot.SootWorldBuilder
outputDir: output
preBuildIR: false
worldCacheMode: false
scope: ALL
nativeModel: true
planFile: null
analyses:
  ir-dumper: ;
  pta: "taint-config:config.yml;plugins:[pascal.taie.analysis.pta.plugin.taint.SuperTaintHandler];dump:true"
onlyGenPlan: false
keepResult:
- $KEEP-ALL

Tai-e Log

IR log:
  static void process(java.lang.String r1) {
      org.example.User $r0;
      [0@L8] $r0 = new org.example.User;
      [1@L8] invokespecial $r0.<org.example.User: void <init>()>();
      [2@L9] invokevirtual $r0.<org.example.User: void setName(java.lang.String)>(r1);
      [3@L10] invokestatic <org.example.Main: void sk(org.example.User)>($r0);
      [4@L11] return;
  }

  public void setName(java.lang.String name) {
      [0@L9] %this.<org.example.User: java.lang.String name> = name;
      [1@L10] return;
  }
Points-to results:
[]:<org.example.Main: void main(java.lang.String[])>/%stringconst0 -> [[]:MergedObj{<Merged string constants>}]
[]:<org.example.Main: void main(java.lang.String[])>/r0 -> [[]:EntryPointObj{alloc=MethodParam{<org.example.Main: void main(java.lang.String[])>/0},type=java.lang.String[] in <org.example.Main: void main(java.lang.String[])>}]
[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.Main: void process(java.lang.String)>/r1 -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.Main: void sk(org.example.User)>/$r1 -> [[]:NewObj{<java.lang.System: java.io.PrintStream newPrintStream(java.io.FileOutputStream,java.lang.String)>[1@L1147] new java.io.PrintStream}, []:NewObj{<java.lang.System: java.io.PrintStream newPrintStream(java.io.FileOutputStream,java.lang.String)>[9@L1150] new java.io.PrintStream}]
[]:<org.example.Main: void sk(org.example.User)>/$r2 -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.Main: void sk(org.example.User)>/r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.User: java.lang.String getName()>/$r1 -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.User: java.lang.String getName()>/%this -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.User: void <init>()>/%this -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.User: void setName(java.lang.String)>/name -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]

Additional Information

Key code of my plugin:
// It may not be efficient, any suggestions to improve? 
@Override
  public void onNewCSMethod(CSMethod csMethod) {
      JMethod method = csMethod.getMethod();
      Context context = csMethod.getContext();
      IR ir = method.getIR();
      for (Stmt i : ir.getStmts()) {
          if (i instanceof StoreField) {
              StoreField st = (StoreField) i;
              if (st.getLValue() instanceof InstanceFieldAccess) {
                  InstanceFieldAccess lv = (InstanceFieldAccess) st.getLValue();
                  if ("%this".equals(lv.getBase().toString()) && lv.getFieldRef().getDeclaringClass() == method.getDeclaringClass()
                          && !st.getRValue().isConst()) {
                      CSVar from = solver.getCSManager().getCSVar(context, st.getRValue());
                      CSVar to = solver.getCSManager().getCSVar(context, lv.getBase());
                      solver.addPFGEdge(from, to, FlowKind.LOCAL_ASSIGN);
                  }
              }
          }
      }
  }
Minimal reproducing code:
package org.example;

public class Main {
  static void sk(User u){
      System.out.println(u.getName());
  }
  static void process(String src){
      User s=new User();
      s.setName(src);
      sk(s);
  }
  public static void main(String[] args) {
      process("xxx");
  }
}

package org.example;

public class User {
  public String getName() {
      return name;
  }

  public void setName(String name) {
      this.name = name;
  }

  private String name;

}

Hello,

Thank you for providing such a detailed description and information about the issue. This helps reduce the number of interactions needed, which is greatly appreciated by open source maintainers.

Before addressing your question, I have a side note. What prompted you to modify the placeholder for Tai-e Log in our New Issue Template? We originally intended it to provide runtime information, such as Tai-e Commit: d610a880a2c05968c9e60400f2041f281dee809f and java.runtime.version: 17.0.6+10, among other details. This is not a complaint, just a user study. 😆


[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]

My intuition: in this points-to set, a %this variable of type org.example.User points to a TaintObj of type java.lang.String, which violates the type system. You should mock another TaintObj with the same type as %this.

I want to write a plugin that can taint any class when any field of the class is tainted

You mentioned "Any", so this is not directly achievable in the current Tai-e. Writing taint configurations programmatically is part of our future plan and is still being incubated.

If you need this urgently, here is a simple idea for a custom implementation: monitor all changes to the points-to sets of all InstanceFields. When a TaintObj appears in one of them, mock a TaintObj that is pointed to by the variable holding that InstanceField's base object.
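
As a rough illustration of this idea, here is a minimal, self-contained sketch in plain Java. It does not use Tai-e's API: the class FieldTaintSketch, the record Obj, and the hook onFieldPointsToChange are hypothetical names, and points-to sets are keyed by variable name rather than by abstract object to keep it short. It only demonstrates the rule "when a taint object shows up in an instance field, add a mock taint object typed like the base variable to that variable".

```java
import java.util.*;

/** Toy model (not Tai-e API): string-keyed points-to sets plus a field-change hook. */
final class FieldTaintSketch {
    /** Abstract object: a kind ("New" or "Taint") and a static type. */
    record Obj(String kind, String type) {
        boolean isTaint() { return kind.equals("Taint"); }
    }

    /** Points-to sets of variables, keyed by variable name. */
    final Map<String, Set<Obj>> varPts = new HashMap<>();
    /** Points-to sets of instance fields, keyed by "baseVar.field" (a simplification). */
    final Map<String, Set<Obj>> fieldPts = new HashMap<>();
    /** Declared type of each variable. */
    final Map<String, String> varType = new HashMap<>();

    /** Hypothetical hook: called whenever objects flow into baseVar.field. */
    void onFieldPointsToChange(String baseVar, String field, Set<Obj> added) {
        fieldPts.computeIfAbsent(baseVar + "." + field, k -> new LinkedHashSet<>()).addAll(added);
        for (Obj o : added) {
            if (o.isTaint()) {
                // Mock a taint object typed like the base variable, so the
                // variable's points-to set stays consistent with its type.
                Obj mock = new Obj("Taint", varType.get(baseVar));
                varPts.computeIfAbsent(baseVar, k -> new LinkedHashSet<>()).add(mock);
            }
        }
    }

    /** Replays the issue's example: a tainted String is stored into $r0.name. */
    static FieldTaintSketch demo() {
        FieldTaintSketch s = new FieldTaintSketch();
        s.varType.put("$r0", "org.example.User");
        s.varPts.computeIfAbsent("$r0", k -> new LinkedHashSet<>())
                .add(new Obj("New", "org.example.User"));
        s.onFieldPointsToChange("$r0", "name", Set.of(new Obj("Taint", "java.lang.String")));
        return s;
    }

    public static void main(String[] args) {
        System.out.println(demo().varPts.get("$r0"));
    }
}
```

In a real plugin the hook would have to be driven by the solver's own notifications, and the mock object would come from the heap model. Both are left out here on purpose.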

What prompted you to modify the placeholder for Tai-e Log in our New Issue Template?

I found my tai-e.log is always empty :(. In case some of this information is needed:

commit: 47bdb8b2361083151a44ba76ee2f9f2dbd363b40
java: Java(TM) SE Runtime Environment (build 17.0.11+7-LTS-207)

You should mock another TaintObj with the same type as %this.

Thanks for your help. I wrote another transfer to do this:

public class ThisTransfer implements Transfer {
    HeapModel heap;

    public ThisTransfer(HeapModel heap) {
        this.heap = heap;
    }

    @Override
    public PointsToSet apply(PointerFlowEdge edge, PointsToSet input) {
        if (edge.target() instanceof CSVar csVar && "%this".equals(csVar.getVar().toString())) {
            List<CSObj> append = new ArrayList<>();
            input.forEach(o -> {
                if (o.getObject() instanceof MockObj mo && mo.toString().startsWith("TaintObj")) {
                    if (mo.getType() != edge.target().getType()) {
                        append.add(new CSObj(heap.getMockObj(mo.getDescriptor(), mo.getAllocation(),
                                edge.target().getType(), mo.getContainerMethod().orElse(null), mo.isFunctional()), o.getContext(), o.getIndex()));
                    }
                }
            });
            append.forEach(input::addObject);
        }
        return input;
    }
}

and invoke it in the plugin:

solver.addPFGEdge(new PointerFlowEdge(FlowKind.LOCAL_ASSIGN, from, to), new ThisTransfer(solver.getHeapModel()));

[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]
[]:<org.example.User: void setName(java.lang.String)>/name -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]

From the log I can confirm that the callee's %this is tainted with the correct type. However, the variable at the call site is:

[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]

There is still no taint object here. How can I propagate or add the taint object to this variable? I thought Tai-e would handle this automatically, so something must be wrong.

I found my tai-e.log is always empty :(.

Is the whole thing empty? If so, there may be some potential errors.

commit: 47bdb8b

So this is not the latest code; it does not print runtime information (that was introduced by e87bce9). That makes sense.


  static void process(java.lang.String r1) {
      org.example.User $r0;
      [0@L8] $r0 = new org.example.User;
      [1@L8] invokespecial $r0.<org.example.User: void <init>()>();
      [2@L9] invokevirtual $r0.<org.example.User: void setName(java.lang.String)>(r1);
      [3@L10] invokestatic <org.example.Main: void sk(org.example.User)>($r0);
      [4@L11] return;
  }

  public void setName(java.lang.String name) {
      [0@L9] %this.<org.example.User: java.lang.String name> = name;
      [1@L10] return;
  }

[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]

From the log I can confirm that the callee's %this is tainted with the correct type. However, the variable at the call site is:

[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]

There is still no taint object here. How can I propagate or add the taint object to this variable? I thought Tai-e would handle this automatically, so something must be wrong.

What you did is add a TaintObj to User.setName/%this. Everything is correct, and Tai-e has done what it should do.

<org.example.Main: void process(java.lang.String)>/$r0 -> [NewObj]
<org.example.User: void setName(java.lang.String)>/%this -> [NewObj, TaintObj]

$r0 and %this are two variables in the IR of different methods. The objects pointed to by $r0 propagate to %this because [2@L9] invokevirtual $r0.<org.example.User: void setName(java.lang.String)>(r1); creates a PFG edge from Main.process/$r0 to User.setName/%this. However, nothing propagates back from %this to $r0.
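
To make the direction of propagation concrete, here is a toy pointer-flow graph in plain Java. It is a simplified model, not Tai-e's solver: the class PfgDirectionSketch and its methods are hypothetical, and objects are represented as plain strings. The point is only that objects move along the direction of an edge, so something injected at the target never reaches the source.

```java
import java.util.*;

/** Toy pointer-flow graph (not Tai-e API): objects flow only along edge direction. */
final class PfgDirectionSketch {
    final Map<String, List<String>> succs = new HashMap<>();
    final Map<String, Set<String>> pts = new HashMap<>();

    Set<String> pointsTo(String var) {
        return pts.computeIfAbsent(var, k -> new LinkedHashSet<>());
    }

    void addEdge(String from, String to) {
        succs.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        // Objects already at the source flow to the new target.
        propagate(to, new LinkedHashSet<>(pointsTo(from)));
    }

    /** Worklist propagation: add objs to var, then push only the new ones to successors. */
    void propagate(String var, Set<String> objs) {
        Deque<Map.Entry<String, Set<String>>> worklist = new ArrayDeque<>();
        worklist.add(Map.entry(var, objs));
        while (!worklist.isEmpty()) {
            Map.Entry<String, Set<String>> e = worklist.poll();
            Set<String> diff = new LinkedHashSet<>(e.getValue());
            diff.removeAll(pointsTo(e.getKey()));
            if (diff.isEmpty()) continue;
            pointsTo(e.getKey()).addAll(diff);
            for (String succ : succs.getOrDefault(e.getKey(), List.of())) {
                worklist.add(Map.entry(succ, diff));
            }
        }
    }

    /** Replays the situation discussed in this thread. */
    static PfgDirectionSketch demo() {
        PfgDirectionSketch g = new PfgDirectionSketch();
        g.propagate("Main.process/$r0", Set.of("NewObj{User}"));
        // The call $r0.setName(r1) passes the receiver: edge $r0 -> %this.
        g.addEdge("Main.process/$r0", "User.setName/%this");
        // A plugin injects a taint object into the callee's %this only.
        g.propagate("User.setName/%this", Set.of("TaintObj{User}"));
        return g;
    }

    public static void main(String[] args) {
        PfgDirectionSketch g = demo();
        System.out.println(g.pointsTo("Main.process/$r0"));
        System.out.println(g.pointsTo("User.setName/%this"));
    }
}
```

In this model, the only way to get the taint object into $r0 is to add it there explicitly (or to add an edge in the opposite direction, which would also send every other object in %this back to the caller).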


A simpler approach would be closer to the one I suggested. I'm not sure your implementation will fully meet the requirement; it could also introduce additional issues.

Solved, in a way similar to what you suggested. When ThisTransfer is applied, I manually add the taint object to the receiver variables at all invoke sites, not only to %this.
Others can refer to the basic code below. (Classes like CSObj need to be made public manually, or accessed via reflection if possible. They could also be made public in the main branch.)
I do not guarantee completeness, but it should work for most cases.

SuperTaintHandler.java
public class SuperTaintHandler implements Plugin {
    private Solver solver;

    @Override
    public void setSolver(Solver solver) {
        this.solver = solver;
    }


    @Override
    public void onNewCSMethod(CSMethod csMethod) {
        JMethod method = csMethod.getMethod();
        Context context = csMethod.getContext();
        IR ir = method.getIR();
        for (Stmt i : ir.getStmts()) {
            if (i instanceof StoreField st) {
                if (st.getLValue() instanceof InstanceFieldAccess lv) {
                    if ("%this".equals(lv.getBase().toString()) && !st.getRValue().isConst()) {
                        CSVar from = solver.getCSManager().getCSVar(context, st.getRValue());
                        CSVar to = solver.getCSManager().getCSVar(context, lv.getBase());
                        Set<Var> varList = new HashSet<>();
                        for (Edge<CSCallSite, CSMethod> e : csMethod.getEdges()) {
                            // Only instance invocations have a receiver (base) variable.
                            if (e.getCallSite().getCallSite().getInvokeExp() instanceof InvokeInstanceExp exp) {
                                varList.add(exp.getBase());
                            }
                        }
                        solver.addPFGEdge(new PointerFlowEdge(FlowKind.LOCAL_ASSIGN, from, to), new ThisTransfer(solver, varList));
                    }
                }
            }
        }
    }
}
ThisTransfer.java
public class ThisTransfer implements Transfer {
    Solver solver;

    Set<Var> varList;

    public ThisTransfer(Solver solver, Set<Var> varList) {
        this.solver = solver;
        this.varList = varList;
    }


    @Override
    public PointsToSet apply(PointerFlowEdge edge, PointsToSet input) {
        if (edge.target() instanceof CSVar csVar && "%this".equals(csVar.getVar().toString())) {
            List<CSObj> append = new ArrayList<>();
            input.forEach(o -> {
                if (o.getObject() instanceof MockObj mo && mo.toString().startsWith("TaintObj")) {
                    if (mo.getType() != edge.target().getType()) {
                        CSObj taint = new CSObj(solver.getHeapModel().getMockObj(mo.getDescriptor(), mo.getAllocation(),
                                edge.target().getType(), mo.getContainerMethod().orElse(null), mo.isFunctional()), o.getContext(), o.getIndex());
                        varList.forEach(var -> {
                            CSVar csvar = solver.getCSManager().getCSVar(o.getContext(), var);
                            PointsToSet set = csvar.getPointsToSet();
                            set.addObject(taint);
                            csvar.setPointsToSet(set);
                        });
                        append.add(taint);
                    }
                }
            });
            append.forEach(input::addObject);
        }
        return input;
    }
}

For the example code above, it should produce:

[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]
[]:<org.example.User: void setName(java.lang.String)>/name -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]

Is the whole thing empty?

Yes, I have also updated to the latest version. The log file is completely empty.

git log
commit b848c52 (HEAD -> master, origin/master, origin/HEAD)

Console output
D:\taie\build>java -jar tai-e-all-0.5.1-SNAPSHOT.jar --options-file=options.yml
Tai-e starts ...
Output directory: D:\taie\build\output
Writing options to D:\taie\build\output\options.yml
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Writing log to D:\taie\build\output\tai-e.log
java.version: 17.0.11
java.version.date: 2024-04-16
java.runtime.version: 17.0.11+7-LTS-207
java.vendor: Oracle Corporation
java.vendor.version: null
os.name: Windows 10
os.version: 10.0
os.arch: amd64
Tai-e Version: 0.5.1-SNAPSHOT
Tai-e Commit: d610a880a2c05968c9e60400f2041f281dee809f
.....

Anyway, I appreciate your prompt help and your work on such a useful tool! Cheers.

Yes, I have also updated to the latest version. The log file is completely empty.

Fixed in cfd0fb7.