pascal-lab/Tai-e

Ignoring array load of primitive type causes taint flow interruption

Closed this issue · 8 comments

I noticed that the pointer analysis ignores array load and store statements when the array's elements are of a primitive type.
The following code is from DefaultSolver:

public class DefaultSolver implements Solver {
    private static boolean isConcerned(Exp exp) { 
        Type type = exp.getType();
        return type instanceof ReferenceType && !(type instanceof NullType);
    }

    private void processArrayLoad(CSVar arrayVar, PointsToSet pts) {
        Context context = arrayVar.getContext();
        Var var = arrayVar.getVar();
        for (LoadArray load : var.getLoadArrays()) {  
            Var lvalue = load.getLValue();  
            if (isConcerned(lvalue)) {
                ...
            }
        }
    }
    ...
}

This works well for pointer analysis, because a primitive value is not a pointer and cannot carry an object reference.
But for taint analysis, I think ignoring primitive-type arrays is not a good choice.
Here is an example:

char[] buffer;
String str = sources();    // tainted object, type=String
buffer = str.toCharArray();    // ignored, even if taint transfer rules are added
sink(buffer);

Essentially, taint analysis tracks variable values, and values can flow through primitive-type variables too. Therefore, ignoring primitive types interrupts the taint flow.
Maybe we should take primitive types into account for taint analysis?

char[] is not a primitive type (but char is), so Tai-e does not ignore char[] in pointer analysis.
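The distinction can be checked in plain Java. This is a minimal sketch using reflection rather than Tai-e's own type model; the class and method names are made up for illustration.

```java
// Illustration with plain Java reflection (not Tai-e's Type hierarchy):
// an array of primitives is itself a reference type.
final class ArrayTypeCheck {
    /** Returns true when values of the given class are references, i.e. not primitive. */
    static boolean isReferenceType(Class<?> type) {
        return !type.isPrimitive();
    }

    public static void main(String[] args) {
        System.out.println(isReferenceType(char.class));    // false
        System.out.println(isReferenceType(char[].class));  // true
        System.out.println(isReferenceType(String.class));  // true
    }
}
```

So a variable of type char[] would pass a check like isConcerned, while a variable of type char would not.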

In your case, it seems that the issue is with the transfer rule (or its process). What are your taint rules?

Here are my rules:

transfers:
    - { method: "<java.lang.String: void getChars(int,int,char[],int)>", from: base, to: 2, type: "char[]" }

The following is the program fragment being analyzed. The parameter str points to a tainted object. I expected the taint to spread to buffer, but it did not.

public class TextStringBuilder ... {

    private char[] buffer;

    public TextStringBuilder append(final String str, final int startIndex, final int length) {
        ...
        str.getChars(startIndex, startIndex + length, buffer, len);    //Here
        size += length;
        return this;
    }
    ...
}

IR for this method:

    public org.apache.commons.text.TextStringBuilder append(java.lang.String str, int startIndex, int length) {
        ...
        [24@L642] $r4 = %this.<org.apache.commons.text.TextStringBuilder: char[] buffer>;
        [25@L642] invokevirtual str.<java.lang.String: void getChars(int,int,char[],int)>(startIndex, $i7, $r4, $i5);
       
        [26@L643] $i8 = %this.<org.apache.commons.text.TextStringBuilder: int size>;
        [27@L643] $i9 = $i8 + length;
        [28@L643] %this.<org.apache.commons.text.TextStringBuilder: int size> = $i9;
        [29@L645] return %this;
    }

Notably:
[24@L642]: Tai-e adds a PFG edge: this_obj.buffer -> $r4
[25@L642]: the taint transfer adds a new taint object to pts($r4), but this new taint object cannot propagate to this_obj.buffer because there is no such edge in the PFG
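The one-directional propagation can be reproduced with a toy pointer-flow graph. This is only a sketch under my own assumptions: ToyPfg, its methods, and the object names are hypothetical and are not Tai-e's PFG classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy pointer-flow graph; a hypothetical model, not Tai-e's own PFG classes.
final class ToyPfg {
    private final Map<String, List<String>> successors = new HashMap<>();
    private final Map<String, Set<String>> pointsTo = new HashMap<>();

    /** Adds the edge from -> to and pushes everything already in pts(from) along it. */
    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        for (String obj : new ArrayList<>(pts(from))) {
            addObject(to, obj);
        }
    }

    /** Adds obj to pts(pointer) and propagates it forward along existing edges only. */
    void addObject(String pointer, String obj) {
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(pointer);
        while (!worklist.isEmpty()) {
            String current = worklist.poll();
            if (pointsTo.computeIfAbsent(current, k -> new HashSet<>()).add(obj)) {
                worklist.addAll(successors.getOrDefault(current, List.of()));
            }
        }
    }

    Set<String> pts(String pointer) {
        return pointsTo.getOrDefault(pointer, Set.of());
    }

    /** Replays statements 24 and 25 of the IR above. */
    static ToyPfg replayIssue() {
        ToyPfg pfg = new ToyPfg();
        pfg.addObject("this_obj.buffer", "char[]-object");
        pfg.addEdge("this_obj.buffer", "$r4");   // [24@L642] field load
        pfg.addObject("$r4", "taint-object");    // [25@L642] taint transfer
        return pfg;
    }
}
```

After replayIssue(), pts($r4) holds both objects, while pts(this_obj.buffer) still holds only the array object, because nothing flows against the direction of the edge.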

@chennbnbnb Thank you for your nice example, which reveals a limitation of the taint analysis. However, this has nothing to do with primitive types; it is about the handling of aliases and mutable objects. In your example, the call String.getChars() mutates the char[] object pointed to by $r4. Ideally, as the object is also pointed to by this.buffer, this.buffer should also be tainted, but the taint analysis currently does not consider such cases, and we are working on it.
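The runtime behavior behind this is easy to confirm in plain Java. A minimal sketch follows; AliasMutationDemo is a made-up class that mirrors the IR by loading the field into a local before calling getChars.

```java
// Plain-Java illustration of the alias: the local and the field refer to the same array.
final class AliasMutationDemo {
    private final char[] buffer;

    AliasMutationDemo(int capacity) {
        this.buffer = new char[capacity];
    }

    /**
     * Mirrors the IR: load the field into a local, then let getChars write into it.
     * Assumes str.length() <= capacity; otherwise getChars throws.
     */
    String appendViaLocal(String str) {
        char[] r4 = this.buffer;                              // $r4 = %this.buffer
        str.getChars(0, str.length(), r4, 0);                 // mutates the array r4 refers to
        return new String(this.buffer, 0, str.length());      // read back through the field
    }
}
```

Calling new AliasMutationDemo(5).appendViaLocal("hello") returns "hello": the write made through the local is visible through the field, which is the flow the analysis needs to model.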

BTW, how did you come up with the example? Does it come from real code?

@chennbnbnb We have extended the taint analysis in this branch to handle your case, and you could try and see if it fixes your problem.

Currently we handle only a specific case (variables loaded from an instance field), as an overly general handling of aliases and mutable objects may cause precision and efficiency issues. If you encounter other cases that cause taint flow interruption, please let us know. Thanks.
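One way to picture the special case is sketched below. This is a guess at the general idea only, not the code from the branch; FieldLoadBackPropagation and all of its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the special case; not the code from the Tai-e branch.
final class FieldLoadBackPropagation {
    // Variable -> instance field pointer it was loaded from, e.g. "$r4" -> "this_obj.buffer".
    private final Map<String, String> loadedFrom = new HashMap<>();
    private final Map<String, Set<String>> pointsTo = new HashMap<>();

    /** Records a statement of the form `variable = base.field`. */
    void recordFieldLoad(String variable, String fieldPointer) {
        loadedFrom.put(variable, fieldPointer);
    }

    /** Applies a taint transfer to `variable`, also tainting the field it was loaded from, if any. */
    void transferTaint(String variable, String taint) {
        add(variable, taint);
        String fieldPointer = loadedFrom.get(variable);
        if (fieldPointer != null) {
            add(fieldPointer, taint);
        }
    }

    boolean isTainted(String pointer, String taint) {
        return pointsTo.getOrDefault(pointer, Set.of()).contains(taint);
    }

    private void add(String pointer, String obj) {
        pointsTo.computeIfAbsent(pointer, k -> new HashSet<>()).add(obj);
    }
}
```

Restricting the back-propagation to variables with a recorded field load keeps it narrow: a variable that was not loaded from a field is tainted on its own and nothing else is touched.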

The change has been merged into master. Closing as there has been no response.

Here is a real-world case: decoder.decode breaks the taint flow. If I replace decoder.decode(clsbytecodeb64) with clsbytecodeb64.getBytes(), it works.

This is the taint config:

sources:
  - { method: "<javax.servlet.ServletRequest: java.lang.String getParameter(java.lang.String)>", type: "java.lang.String" }
  - { method: "<javax.servlet.ServletRequestWrapper: java.lang.String getParameter(java.lang.String)>", type: "java.lang.String"}

sinks:
  - { method: "<java.lang.Runtime: java.lang.Process exec(java.lang.String)>", index: 0 }
  - { method: "<java.lang.ClassLoader: java.lang.Class defineClass(byte[],int,int)>", index: 0}

transfers:
  - { method: "<java.util.Base64$Decoder: byte[] decode(byte[])>", from: 0, to: result, type: "byte[]" }
  - { method: "<java.util.Base64$Decoder: int decode(byte[],byte[])>", from: 0, to: 1, type: "byte[]"}
  - { method: "<java.util.Base64$Decoder: byte[] decode(java.lang.String)>", from: 0, to: result, type: "byte[]" }
  - { method: "<java.lang.String: byte[] getBytes()>", from: base, to: result, type: "byte[]" }

@struce2 There is a subtlety in how Jasper compiles JSPs: the decoder variable is not a local in the _jspService method but a field of the generated servlet class, whose object is mocked here. So we also add the init method as a mock entry point, as if the servlet had been created by a New statement.

JClass helloServlet = World.get().getClassHierarchy().getClass("test2__002e__jsp");
JMethod service = helloServlet.getDeclaredMethod("_jspService");
JMethod init = helloServlet.getDeclaredMethod("<init>");
JClass requestWrapper = World.get().getClassHierarchy().getClass("javax.servlet.http.HttpServletRequestWrapper");
JClass responseWrapper = World.get().getClassHierarchy().getClass("javax.servlet.http.HttpServletResponseWrapper");

HeapModel heapModel = solver.getHeapModel();
Obj request = heapModel.getMockObj("EntryPointObj", "<http-request-wrapper>", requestWrapper.getType(), service);
Obj response = heapModel.getMockObj("EntryPointObj", "<http-response-wrapper>", responseWrapper.getType(), service);
Obj servlet = heapModel.getMockObj("EntryPointObj", "<hello-servlet>", helloServlet.getType());

SpecifiedParamProvider paramProvider = new SpecifiedParamProvider.Builder(service)
        .addThisObj(servlet)
        .addParamObj(0, request)
        .addParamObj(1, response)
        .build();
solver.addEntryPoint(new EntryPoint(service, paramProvider));

SpecifiedParamProvider paramProvider1 = new SpecifiedParamProvider.Builder(init)
        .addThisObj(servlet)
        .build();
solver.addEntryPoint(new EntryPoint(init, paramProvider1));
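For readers unfamiliar with why the init entry point matters, here is a rough picture of the class shape described above. The Jasper-generated code is not shown in this thread, so GeneratedServletShape and its members are hypothetical stand-ins.

```java
import java.util.Base64;

// Hypothetical stand-in for the generated servlet class; the real Jasper output is not shown here.
final class GeneratedServletShape {
    // Field initializers are compiled into the constructor (<init>), so the decoder
    // object is only created for an analysis that also analyzes <init>.
    private final Base64.Decoder decoder = Base64.getDecoder();

    /** Plays the role of _jspService: the receiver of decode comes from a field, not a local. */
    byte[] service(String parameter) {
        return decoder.decode(parameter);
    }
}
```

If only the service method is an entry point, the field presumably points to no object in the analysis, so the decode call has no receiver to resolve and the transfer rule never applies. Adding init as a second entry point on the same mock servlet object gives the field its decoder object.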