pascal-lab/Tai-e

Pointer Set Propagation in Lambda Expression

HKJL10201 opened this issue · 4 comments

Describe the bug

Hi all! I ran into a problem when analyzing a program that contains lambda expressions. Below I describe how I reproduced the problem in the experimental environment, what I expected, and what actually happened.

Experiment Setup

Modified source code in file src/test/resources/pta/taint/SimpleTaint.java:

import java.util.ArrayList;
import java.util.List;

class SimpleTaint {

    public static void main(String[] args) {
        List<SourceSink> sourceSinkList = new ArrayList<>();
        SourceSink s1 = new SourceSink();
        SourceSink s2 = new SourceSink();
        sourceSinkList.add(s1);
        sourceSinkList.add(s2);
        List<String> stringList = convert(sourceSinkList);
        sink(stringList);
    }

    public static List<String> convert(List<SourceSink> p) {
        List<String> res = new ArrayList<>();
        if (p.isEmpty()) {
            return res;
        }
        p.stream().forEach( s -> {
            String tmp = s.tainted2;
            res.add(tmp);
        });
        return res;
    }

    public static void sink(List list) {}

}

Added taint rules in file src/test/resources/pta/taint/taint-config.yml:

sources:
  - { kind: field, field: "<SourceSink: java.lang.String tainted2>" }

sinks:
  - { method: "<SimpleTaint: void sink(java.lang.List)>", index: 0 }

transfers:
  - { method: "<java.util.ArrayList: boolean add(java.lang.Object)>", from: 0, to: base }

Modified the function isIgnored in file src/main/java/pascal/taie/analysis/pta/core/solver/DefaultSolver.java:

    private boolean isIgnored(JMethod method) {
        if (method.toString().contains("java.util")
                || method.toString().contains("java.lang.invoke.LambdaMetafactory")
        ) {
            return false;
        }
        return ignoredMethods.contains(method) ||
                onlyApp && !method.isApplication();
    }

Tai-e arguments:

-pp -ap -cp src/test/resources/pta/taint -m SimpleTaint -java 17 -a "pta=cs:2-obj;dump:true;only-app:true;taint-config:src/test/resources/pta/taint/taint-config.yml;handle-invokedynamic:true;implicit-entries:false"

Expected Results

In SimpleTaint.convert(), the taint source s.tainted2 should be propagated to the list res via the transfer rule in taint-config.yml, then returned to stringList in main(), and finally passed to sink().

Problem Description

In the Jimple file, the lambda expression is compiled into a method named void lambda$convert$0(java.util.List,SourceSink), whose first parameter is the captured target list (i.e., res) and whose second parameter is the lambda parameter s. The relevant parts of the Jimple file are shown below:

Parts of file SimpleTaint.jimple:

public static java.util.List convert(java.util.List)
{
	java.util.function.Consumer $r3;
	java.util.ArrayList $r0;

	$r0 = new java.util.ArrayList;

	$r3 = dynamicinvoke "accept" <java.util.function.Consumer (java.util.List)>($r0) <java.lang.invoke.LambdaMetafactory: java.lang.invoke.CallSite metafactory(java.lang.invoke.MethodHandles$Lookup,java.lang.String,java.lang.invoke.MethodType,java.lang.invoke.MethodType,java.lang.invoke.MethodHandle,java.lang.invoke.MethodType)>(methodtype: void __METHODTYPE__(java.lang.Object), methodhandle: "REF_INVOKE_STATIC" <SimpleTaint: void lambda$convert$0(java.util.List,SourceSink)>, methodtype: void __METHODTYPE__(SourceSink));

	return $r0;
}

private static void lambda$convert$0(java.util.List, SourceSink)
{
	java.util.List r2;
	java.lang.String r1;
	SourceSink r0;

	r2 := @parameter0: java.util.List;

	r0 := @parameter1: SourceSink;

	r1 = r0.<SourceSink: java.lang.String tainted2>;

	interfaceinvoke r2.<java.util.List: boolean add(java.lang.Object)>(r1);

	return;
}
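
To make the capture explicit, here is a hand-desugared Java sketch of roughly what the dynamicinvoke site above amounts to. It is an approximation, not the exact bytecode: the real Consumer is a class generated at runtime by LambdaMetafactory, and CapturingConsumer, lambdaConvert0 and the minimal SourceSink below are hypothetical stand-ins. The point it illustrates is that the very same ArrayList object ($r0, i.e. res) is stored in the Consumer and then passed as the first argument of the synthetic method, where it becomes r2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DesugaredConvert {
    // Hypothetical minimal stand-in for the SourceSink class in the Tai-e test resources.
    static class SourceSink {
        String tainted2;
        SourceSink(String v) { tainted2 = v; }
    }

    // Mirrors the synthetic lambda$convert$0(java.util.List, SourceSink):
    // the captured list comes first, the Consumer's own argument second.
    private static void lambdaConvert0(List<String> captured, SourceSink s) {
        String tmp = s.tainted2;
        captured.add(tmp);
    }

    // Approximates the runtime-generated Consumer: it stores the captured
    // variable and forwards each accept() call to the static method.
    static final class CapturingConsumer implements Consumer<SourceSink> {
        private final List<String> captured;
        CapturingConsumer(List<String> captured) { this.captured = captured; }
        @Override
        public void accept(SourceSink s) { lambdaConvert0(captured, s); }
    }

    public static List<String> convert(List<SourceSink> p) {
        List<String> res = new ArrayList<>();                      // $r0 in the Jimple
        if (p.isEmpty()) {
            return res;
        }
        Consumer<SourceSink> action = new CapturingConsumer(res);  // $r3
        p.stream().forEach(action);
        return res;  // the same object lambdaConvert0 mutated through its parameter
    }

    public static void main(String[] args) {
        List<SourceSink> in = List.of(new SourceSink("a"), new SourceSink("b"));
        System.out.println(convert(in)); // prints [a, b]
    }
}
```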

According to the Jimple code and pta-results.txt, r1 and r2 in lambda$convert$0() are tainted, as expected. However, $r0 in convert(), which is the value returned to main(), is not tainted.

Parts of the file pta-results.txt:

[]:<SimpleTaint: void lambda$convert$0(java.util.List,SourceSink)>/r2 -> [[]:NewObj{<SimpleTaint: java.util.List convert(java.util.List)>[0@L49] new java.util.ArrayList}, []:TaintObj{alloc=<SimpleTaint: void lambda$convert$0(java.util.List,SourceSink)> [0@L55] r1 = r0.<SourceSink: java.lang.String tainted2>,type=java.util.ArrayList}]

[]:<SimpleTaint: java.util.List convert(java.util.List)>/$r0 -> [[]:NewObj{<SimpleTaint: java.util.List convert(java.util.List)>[0@L49] new java.util.ArrayList}]

As I understand it, $r0 in convert() is passed as the first argument of lambda$convert$0(), so $r0 in convert() and r2 in lambda$convert$0() point to the same object and should have the same pointer set. This means the TaintObj in the pointer set of lambda$convert$0()/r2 should also be propagated to that of convert()/$r0.

Is my understanding correct? If so, how can I propagate the pointer set from lambda$convert$0()/r2 to convert()/$r0? Thanks!


The problem is not caused by the lambda syntax but by the method call. The argument ($r0) is passed to the parameter (r2), and the transfer rule (from: 0, to: base) then spreads the taint to the parameter variable. That propagation only affects the parameter, not the argument, so the taint is lost on the caller's side.

In Tai-e's implementation, the propagate function only pushes points-to information from a node to its successors in the pointer flow graph, never back to its predecessors. Since the edge goes from the argument to the parameter, the argument cannot be affected by changes to the parameter.
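
A toy model may make this one-way propagation concrete. The sketch below is not Tai-e's solver; ToyPfg and the node names are hypothetical. It propagates the ArrayList allocation into convert/$r0, adds the argument-to-parameter edge convert/$r0 -> lambda/r2, and then injects a TaintObj at r2 the way the transfer rule does. Because propagation only follows edges forward, the taint never reaches $r0, which mirrors the pointer sets reported in pta-results.txt.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy pointer flow graph: facts move only from a node to its successors.
class ToyPfg {
    private final Map<String, Set<String>> succs = new HashMap<>();
    private final Map<String, Set<String>> pts = new HashMap<>();

    Set<String> pts(String node) {
        return pts.computeIfAbsent(node, k -> new LinkedHashSet<>());
    }

    void addEdge(String from, String to) {
        if (succs.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to)) {
            propagate(to, pts(from)); // existing facts of 'from' flow along the new edge
        }
    }

    // Worklist propagation: only the new objects (delta) are pushed, and only forward.
    void propagate(String node, Set<String> objs) {
        Deque<Map.Entry<String, Set<String>>> work = new ArrayDeque<>();
        work.add(Map.entry(node, Set.copyOf(objs)));
        while (!work.isEmpty()) {
            var e = work.poll();
            Set<String> delta = new LinkedHashSet<>(e.getValue());
            delta.removeAll(pts(e.getKey()));
            if (delta.isEmpty()) {
                continue;
            }
            pts(e.getKey()).addAll(delta);
            for (String s : succs.getOrDefault(e.getKey(), Set.of())) {
                work.add(Map.entry(s, delta));
            }
        }
    }

    public static void main(String[] args) {
        ToyPfg g = new ToyPfg();
        g.propagate("convert/$r0", Set.of("NewObj[ArrayList]"));
        g.addEdge("convert/$r0", "lambda/r2");        // argument -> parameter
        g.propagate("lambda/r2", Set.of("TaintObj"));  // transfer rule taints base r2
        System.out.println(g.pts("lambda/r2"));   // prints [NewObj[ArrayList], TaintObj]
        System.out.println(g.pts("convert/$r0")); // prints [NewObj[ArrayList]]
    }
}
```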

That makes sense, thanks!

But I would still like to know: is there a way to achieve my expected result?

@HKJL10201 Thank you for your nice question and example :-)

This question is essentially about the practice of taint analysis. Tai-e's taint analysis associates taint objects with pointers (variables and fields), which is closer to tagging pointers. The advantage of this approach is better precision in a flow-insensitive setting and a clear taint-flow path; the disadvantage is imperfect handling of aliases and mutable objects, which is exactly what your example exposes. In fact, another user has encountered a similar problem before (see issue #22 and my comment).

To solve this aliasing-related problem completely, one approach would be to tag objects. This can be done by maintaining a global map that associates objects with their relevant taints. For instance, for the ArrayList object in your example, we could map it to the taint SourceSink.tainted2 after analyzing r2.add(r1) in lambda$convert$0(). Then, when analyzing sink(stringList), we first obtain the list objects pointed to by stringList and check whether any of them is associated with taints. In this way, we can discover that taint flows to sink(stringList).
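
A minimal sketch of this object-tagging idea, assuming abstract objects and taints are represented as plain strings. ObjectTaints and its methods are hypothetical and not part of Tai-e's API. When a transfer rule fires, taints are attached to every object the base variable points to; at a sink, the taints of every object the argument points to are collected. Since r2 and stringList both point to the same ArrayList allocation, the taint is no longer lost.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical global side table: abstract object -> taints attached to it.
class ObjectTaints {
    private final Map<String, Set<String>> taintsOf = new HashMap<>();

    // Would be called when a transfer rule such as add (from: 0, to: base) fires.
    void tagBaseObjects(Set<String> ptsOfBase, Set<String> taintsOfArg) {
        if (taintsOfArg.isEmpty()) {
            return;
        }
        for (String obj : ptsOfBase) {
            taintsOf.computeIfAbsent(obj, k -> new LinkedHashSet<>()).addAll(taintsOfArg);
        }
    }

    // Would be called at a sink: union of the taints of all objects the argument may point to.
    Set<String> taintsReaching(Set<String> ptsOfSinkArg) {
        Set<String> result = new LinkedHashSet<>();
        for (String obj : ptsOfSinkArg) {
            result.addAll(taintsOf.getOrDefault(obj, Set.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        String list = "NewObj[convert: new ArrayList]";
        ObjectTaints t = new ObjectTaints();
        // lambda$convert$0: r2.add(r1), where pts(r2) = {list} and r1 carries the source taint.
        t.tagBaseObjects(Set.of(list), Set.of("SourceSink.tainted2"));
        // main: sink(stringList), where pts(stringList) = {list} via convert()/$r0.
        System.out.println(t.taintsReaching(Set.of(list))); // prints [SourceSink.tainted2]
    }
}
```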

Thank you for the advice and for pointing out that this is an aliasing-related problem; that makes a lot of sense! I will try it out.