How to correctly taint "%this"?
MXWXZ opened this issue · 5 comments
Overall Description
Hi, I want to write a plugin that taints an object whenever any field of its class is tainted (e.g., when a tainted variable is passed to obj.setName(String), I want obj to be tainted as well afterwards). However, the caller's variable is not affected even though the "%this" variable in the callee is tainted. How should I handle this correctly?
In the minimal reproduction code below, the "%this" in the callee is tainted:
[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
However, the caller's receiver variable ($r0) is untouched:
[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
I may misunderstand some parts of the pointer analysis. Do I need to propagate this manually (and why is it not handled automatically)?
Expected Behavior
The caller object is also tainted.
Current Behavior
The caller object is not tainted.
Tai-e Arguments
Tai-e options:
optionsFile: null
printHelp: false
classPath: []
appClassPath:
- tester-1.0-SNAPSHOT.jar
mainClass: org.example.Main
inputClasses: []
javaVersion: 8
prependJVM: false
allowPhantom: true
worldBuilderClass: pascal.taie.frontend.soot.SootWorldBuilder
outputDir: output
preBuildIR: false
worldCacheMode: false
scope: ALL
nativeModel: true
planFile: null
analyses:
  ir-dumper: ;
  pta: "taint-config:config.yml;plugins:[pascal.taie.analysis.pta.plugin.taint.SuperTaintHandler];dump:true"
onlyGenPlan: false
keepResult:
- $KEEP-ALL
Tai-e Log
IR log:
static void process(java.lang.String r1) {
    org.example.User $r0;
    [0@L8] $r0 = new org.example.User;
    [1@L8] invokespecial $r0.<org.example.User: void <init>()>();
    [2@L9] invokevirtual $r0.<org.example.User: void setName(java.lang.String)>(r1);
    [3@L10] invokestatic <org.example.Main: void sk(org.example.User)>($r0);
    [4@L11] return;
}

public void setName(java.lang.String name) {
    [0@L9] %this.<org.example.User: java.lang.String name> = name;
    [1@L10] return;
}
Points-to results:
[]:<org.example.Main: void main(java.lang.String[])>/%stringconst0 -> [[]:MergedObj{<Merged string constants>}]
[]:<org.example.Main: void main(java.lang.String[])>/r0 -> [[]:EntryPointObj{alloc=MethodParam{<org.example.Main: void main(java.lang.String[])>/0},type=java.lang.String[] in <org.example.Main: void main(java.lang.String[])>}]
[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.Main: void process(java.lang.String)>/r1 -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.Main: void sk(org.example.User)>/$r1 -> [[]:NewObj{<java.lang.System: java.io.PrintStream newPrintStream(java.io.FileOutputStream,java.lang.String)>[1@L1147] new java.io.PrintStream}, []:NewObj{<java.lang.System: java.io.PrintStream newPrintStream(java.io.FileOutputStream,java.lang.String)>[9@L1150] new java.io.PrintStream}]
[]:<org.example.Main: void sk(org.example.User)>/$r2 -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.Main: void sk(org.example.User)>/r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.User: java.lang.String getName()>/$r1 -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.User: java.lang.String getName()>/%this -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.User: void <init>()>/%this -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.User: void setName(java.lang.String)>/name -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
Additional Information
Key code of my plugin:
// It may not be efficient; any suggestions to improve it?
@Override
public void onNewCSMethod(CSMethod csMethod) {
    JMethod method = csMethod.getMethod();
    Context context = csMethod.getContext();
    IR ir = method.getIR();
    for (Stmt i : ir.getStmts()) {
        if (i instanceof StoreField) {
            StoreField st = (StoreField) i;
            if (st.getLValue() instanceof InstanceFieldAccess) {
                InstanceFieldAccess lv = (InstanceFieldAccess) st.getLValue();
                // Compare strings with equals(); == only tests reference identity.
                if ("%this".equals(lv.getBase().toString())
                        && lv.getFieldRef().getDeclaringClass() == method.getDeclaringClass()
                        && !st.getRValue().isConst()) {
                    // Let whatever flows into the stored value also flow into %this.
                    CSVar from = solver.getCSManager().getCSVar(context, st.getRValue());
                    CSVar to = solver.getCSManager().getCSVar(context, lv.getBase());
                    solver.addPFGEdge(from, to, FlowKind.LOCAL_ASSIGN);
                }
            }
        }
    }
}
Minimal reproduction code:
package org.example;

public class Main {
    static void sk(User u) {
        System.out.println(u.getName());
    }

    static void process(String src) {
        User s = new User();
        s.setName(src);
        sk(s);
    }

    public static void main(String[] args) {
        process("xxx");
    }
}

package org.example;

public class User {
    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    private String name;
}
Hello,
Thank you for providing such a detailed description and information about the issue. This helps reduce the number of interactions needed, which is greatly appreciated by open source maintainers.
Before addressing your question, I have a side note. What prompted you to modify the placeholder for Tai-e Log in our New Issue Template? We originally intended it to provide runtime information, such as Tai-e Commit: d610a880a2c05968c9e60400f2041f281dee809f and java.runtime.version: 17.0.6+10, among other details. This is not a complaint, just a user study. 😆
[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
My intuition: in this points-to set, a %this variable of type org.example.User points to a String, which violates the type system. You should mock another TaintObj with the same type as %this.
I want to write a plugin that can taint any class when any field of the class is tainted
You mentioned "any", so this is not directly achievable in the current Tai-e, because writing taint configurations programmatically is still a future plan that is being incubated.
But if you need it urgently, here is a simple idea for a customized implementation: monitor all changes to the points-to sets of all instance fields. If a TaintObj appears in one of them, mock a TaintObj that is pointed to by the variable holding the instance that owns the field.
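The idea above can be illustrated with a small toy model. This is only a sketch in plain Java and does not use Tai-e's API: the class FieldTaintWatcher, its methods, and the string encoding of objects are all hypothetical names invented for illustration. It models a store `base.field = value`, and when a taint marker reaches the field, it adds a taint marker typed like the holder to every variable that currently points to the holder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model, not Tai-e's API: taint the holder once one of its fields holds a taint. */
public class FieldTaintWatcher {
    private final Map<String, Set<String>> varPts = new HashMap<>();   // variable -> objects
    private final Map<String, Set<String>> fieldPts = new HashMap<>(); // "object.field" -> objects
    private final Map<String, String> typeOf = new HashMap<>();        // object -> type name

    /** Builds the (hypothetical) name of a taint marker of the given type. */
    public static String taintOf(String type) {
        return "TaintObj{type=" + type + "}";
    }

    private static boolean isTaint(String obj) {
        return obj.startsWith("TaintObj{");
    }

    /** Models `var = new type`, with the abstract object named obj. */
    public void newObject(String var, String obj, String type) {
        typeOf.put(obj, type);
        varPts.computeIfAbsent(var, k -> new LinkedHashSet<>()).add(obj);
    }

    /** Models a taint source: var now also points to a taint marker of the given type. */
    public void addTaint(String var, String type) {
        varPts.computeIfAbsent(var, k -> new LinkedHashSet<>()).add(taintOf(type));
    }

    /** Models `baseVar.field = valueVar` and reacts to taint reaching the field. */
    public void store(String baseVar, String field, String valueVar) {
        // Iterate over snapshots, because onTaintedField may grow the live sets.
        for (String base : new ArrayList<>(pointsTo(baseVar))) {
            if (isTaint(base)) {
                continue; // taint markers have no fields in this model
            }
            Set<String> slot =
                    fieldPts.computeIfAbsent(base + "." + field, k -> new LinkedHashSet<>());
            for (String value : new ArrayList<>(pointsTo(valueVar))) {
                if (slot.add(value) && isTaint(value)) {
                    onTaintedField(base);
                }
            }
        }
    }

    /** Adds a taint marker typed like the holder to every variable that points to it. */
    private void onTaintedField(String holder) {
        String mock = taintOf(typeOf.get(holder));
        for (Set<String> pts : varPts.values()) {
            if (pts.contains(holder)) {
                pts.add(mock);
            }
        }
    }

    public Set<String> pointsTo(String var) {
        return varPts.getOrDefault(var, Collections.emptySet());
    }

    public static void main(String[] args) {
        FieldTaintWatcher w = new FieldTaintWatcher();
        w.newObject("process/$r0", "NewObj{User}", "org.example.User");
        w.addTaint("process/r1", "java.lang.String");
        w.store("process/$r0", "name", "process/r1");
        System.out.println(w.pointsTo("process/$r0"));
    }
}
```

Note that this toy reacts only once, at the store. A real implementation would have to run inside the solver's fixpoint, so that variables which start pointing to the holder later are tainted as well.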
What prompted you to modify the placeholder for Tai-e Log in our New Issue Template?
I found my tai-e.log is always empty :(. In case the information is needed:
commit: 47bdb8b2361083151a44ba76ee2f9f2dbd363b40
java: Java(TM) SE Runtime Environment (build 17.0.11+7-LTS-207)
You should mock another TaintObj with the same type as %this.
Thanks for your help. I wrote another transfer to do this:
public class ThisTransfer implements Transfer {

    HeapModel heap;

    public ThisTransfer(HeapModel heap) {
        this.heap = heap;
    }

    @Override
    public PointsToSet apply(PointerFlowEdge edge, PointsToSet input) {
        // Compare strings with equals(); == only tests reference identity.
        if (edge.target() instanceof CSVar
                && "%this".equals(((CSVar) edge.target()).getVar().toString())) {
            List<CSObj> append = new ArrayList<>();
            input.forEach(o -> {
                if (o.getObject() instanceof MockObj mo && mo.toString().startsWith("TaintObj")) {
                    if (mo.getType() != edge.target().getType()) {
                        // Mock a taint object that has the same type as %this.
                        append.add(new CSObj(heap.getMockObj(mo.getDescriptor(), mo.getAllocation(),
                                edge.target().getType(), mo.getContainerMethod().orElse(null), mo.isFunctional()), o.getContext(), o.getIndex()));
                    }
                }
            });
            append.forEach(input::addObject);
        }
        return input;
    }
}
and invoked it in the plugin:
solver.addPFGEdge(new PointerFlowEdge(FlowKind.LOCAL_ASSIGN, from, to), new ThisTransfer(solver.getHeapModel()));
[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]
[]:<org.example.User: void setName(java.lang.String)>/name -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
From the log I can confirm that the callee %this is tainted with the correct type; however, the call site is
[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}]
There is still no tainted object here. How can I notify the caller or add the taint object to it? I think Tai-e should handle this automatically, but something must be wrong.
I found my tai-e.log is always empty :(.
Is the whole thing empty? If so, there may be some potential errors.
commit: 47bdb8b
So this is not the latest code, and it will not print runtime information (introduced by e87bce9). That makes sense.
What you do is User.setName/%this <- TaintObj. Everything is correct, and Tai-e has done what it should do.
<org.example.Main: void process(java.lang.String)>/$r0 -> [NewObj]
<org.example.User: void setName(java.lang.String)>/%this -> [NewObj, TaintObj]
$r0 and %this are two variables in the IR of different methods. $r0 propagates to %this because the call [2@L9] invokevirtual $r0.<org.example.User: void setName(java.lang.String)>(r1); creates a PFG edge from Main.process/$r0 to User.setName/%this. However, %this does not back-propagate to $r0.
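To make the direction of propagation concrete, here is a minimal toy pointer flow graph in plain Java. It is a sketch only and does not use Tai-e's classes: ToyPfg, addEdge, addObject, and solve are hypothetical names, and objects and variables are plain strings. Objects flow along edges from source to target and never backwards, which is why an object injected at %this does not reach $r0.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy pointer flow graph, not Tai-e's API: objects flow only along edge direction. */
public class ToyPfg {
    private final Map<String, Set<String>> pointsTo = new HashMap<>();
    private final Map<String, Set<String>> successors = new HashMap<>();
    private final Deque<String[]> workList = new ArrayDeque<>(); // entries are {variable, object}

    /** Adds a directed edge; objects already at the source are scheduled for the target. */
    public void addEdge(String from, String to) {
        if (successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to)) {
            for (String obj : pointsToSet(from)) {
                workList.add(new String[] {to, obj});
            }
        }
    }

    /** Schedules an object to be added to a variable's points-to set. */
    public void addObject(String var, String obj) {
        workList.add(new String[] {var, obj});
    }

    /** Propagates until no points-to set changes any more. */
    public void solve() {
        while (!workList.isEmpty()) {
            String[] entry = workList.poll();
            String var = entry[0];
            String obj = entry[1];
            if (pointsTo.computeIfAbsent(var, k -> new LinkedHashSet<>()).add(obj)) {
                // Only successors see the new object; predecessors are never notified.
                for (String succ : successors.getOrDefault(var, Collections.emptySet())) {
                    workList.add(new String[] {succ, obj});
                }
            }
        }
    }

    public Set<String> pointsToSet(String var) {
        return pointsTo.getOrDefault(var, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyPfg pfg = new ToyPfg();
        // Receiver passing at the call: $r0 flows into the callee's %this.
        pfg.addEdge("process/$r0", "setName/%this");
        pfg.addObject("process/$r0", "NewObj{User}");
        // Something (for example a plugin) injects a taint object directly at %this.
        pfg.addObject("setName/%this", "TaintObj{User}");
        pfg.solve();
        System.out.println("%this -> " + pfg.pointsToSet("setName/%this"));
        System.out.println("$r0   -> " + pfg.pointsToSet("process/$r0"));
    }
}
```

In this model, getting the taint object into $r0 requires either adding it to $r0 explicitly or adding an edge whose target is $r0; the existing edge alone cannot carry it there.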
A simpler approach might be closer to the one I suggested. I'm not sure whether your implementation will fully meet this requirement; it could potentially introduce additional issues.
Solved, similar to what you suggested. When ThisTransfer is applied, I manually add the taint object to the receiver variables at all invoke sites, not only to %this.
Others can refer to the basic code below. (Classes like CSObj need to be made public manually, or accessed via reflection if possible. They could also be made public in the main branch.)
I do not guarantee completeness, but it should work for most cases.
SuperTaintHandler.java
public class SuperTaintHandler implements Plugin {

    private Solver solver;

    @Override
    public void setSolver(Solver solver) {
        this.solver = solver;
    }

    @Override
    public void onNewCSMethod(CSMethod csMethod) {
        JMethod method = csMethod.getMethod();
        Context context = csMethod.getContext();
        IR ir = method.getIR();
        for (Stmt i : ir.getStmts()) {
            if (i instanceof StoreField st) {
                if (st.getLValue() instanceof InstanceFieldAccess lv) {
                    // Compare strings with equals(); == only tests reference identity.
                    if ("%this".equals(lv.getBase().toString()) && !st.getRValue().isConst()) {
                        CSVar from = solver.getCSManager().getCSVar(context, st.getRValue());
                        CSVar to = solver.getCSManager().getCSVar(context, lv.getBase());
                        // Collect the receiver variable of every call edge into this method.
                        Set<Var> varList = new HashSet<>();
                        for (Edge<CSCallSite, CSMethod> e : csMethod.getEdges()) {
                            InvokeInstanceExp exp = (InvokeInstanceExp) e.getCallSite().getCallSite().getInvokeExp();
                            varList.add(exp.getBase());
                        }
                        solver.addPFGEdge(new PointerFlowEdge(FlowKind.LOCAL_ASSIGN, from, to), new ThisTransfer(solver, varList));
                    }
                }
            }
        }
    }
}
ThisTransfer.java
public class ThisTransfer implements Transfer {

    Solver solver;
    Set<Var> varList;

    public ThisTransfer(Solver solver, Set<Var> varList) {
        this.solver = solver;
        this.varList = varList;
    }

    @Override
    public PointsToSet apply(PointerFlowEdge edge, PointsToSet input) {
        // Compare strings with equals(); == only tests reference identity.
        if (edge.target() instanceof CSVar
                && "%this".equals(((CSVar) edge.target()).getVar().toString())) {
            List<CSObj> append = new ArrayList<>();
            input.forEach(o -> {
                if (o.getObject() instanceof MockObj mo && mo.toString().startsWith("TaintObj")) {
                    if (mo.getType() != edge.target().getType()) {
                        // Mock a taint object that has the same type as %this.
                        CSObj taint = new CSObj(solver.getHeapModel().getMockObj(mo.getDescriptor(), mo.getAllocation(),
                                edge.target().getType(), mo.getContainerMethod().orElse(null), mo.isFunctional()), o.getContext(), o.getIndex());
                        // Also add it to the receiver variable at every known invoke site.
                        varList.forEach(var -> {
                            CSVar csvar = solver.getCSManager().getCSVar(o.getContext(), var);
                            PointsToSet set = csvar.getPointsToSet();
                            set.addObject(taint);
                            csvar.setPointsToSet(set);
                        });
                        append.add(taint);
                    }
                }
            });
            append.forEach(input::addObject);
        }
        return input;
    }
}
For the example code above, it should generate:
[]:<org.example.User: void setName(java.lang.String)>/%this -> [[]:MergedObj{<Merged string constants>}, []:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]
[]:<org.example.User: void setName(java.lang.String)>/name -> [[]:MergedObj{<Merged string constants>}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=java.lang.String}]
[]:<org.example.Main: void process(java.lang.String)>/$r0 -> [[]:NewObj{<org.example.Main: void process(java.lang.String)>[0@L8] new org.example.User}, []:TaintObj{alloc=<org.example.Main: void process(java.lang.String)>/0,type=org.example.User}]
Is the whole thing empty?
Yes, I have also updated to the latest version. The log file is completely empty.
git log
commit b848c52 (HEAD -> master, origin/master, origin/HEAD)
Console output
D:\taie\build>java -jar tai-e-all-0.5.1-SNAPSHOT.jar --options-file=options.yml
Tai-e starts ...
Output directory: D:\taie\build\output
Writing options to D:\taie\build\output\options.yml
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
Writing log to D:\taie\build\output\tai-e.log
java.version: 17.0.11
java.version.date: 2024-04-16
java.runtime.version: 17.0.11+7-LTS-207
java.vendor: Oracle Corporation
java.vendor.version: null
os.name: Windows 10
os.version: 10.0
os.arch: amd64
Tai-e Version: 0.5.1-SNAPSHOT
Tai-e Commit: d610a880a2c05968c9e60400f2041f281dee809f
.....
Anyway, I appreciate your prompt help and your work on such a useful tool! Cheers.
Yes, I have also updated to the latest version. The log file is completely empty.
Fixed in cfd0fb7.