pascal-lab/Tai-e

How to config to result[*] rule for return array type in taint analysis

Raul1718 opened this issue · 9 comments

Description

Hi,

When I test some cases that return type is array and as transfer, such as String.split. I doubt how to correct config the rule.

My test sample:

class ArgToResultStringSplit {
    public static void main(String[] args) {
        String taint = SourceSink.source();
        String[] taints = taint.split(",");
        String s2 = taints[1]; // no taint now!
        SourceSink.sink(s2); // taint
    }
} 

The transfer rule configured below.

- { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: result, type: "java.lang.String[]" }
could transfer to "String[] taints", but var s2 is not tainted after get taints[1].

or

- { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }

I also tested, but could not transfer to "String[] taints".

Could you provide guidance on how to configure correctly to detect this ArgToResultStringSplit case.
Thanks!

TL;DR:
Use - { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }, and set pointer analysis option to only-app:false;.


For your given code snippet, with a static analysis perspective, the expectation is as follows:

flowchart LR
    A[taints]
    B["NewObj ... newarray java.lang.String[...]"]
    C["NewObj ... newarray java.lang.String[...][*]"]
    D["TaintObj"]

    A --> |points-to| B
    C --> |points-to| D
Loading

Intuitively, configuring the pointer analysis option with only-app:true; results in the method split not being processed. As a result, the arrayObj NewObj...newarray java.lang.String[...] will not be pointed to by the taints variable.

TL;DR: Use - { method: "<java.lang.String: java.lang.String[] split(java.lang.String)>", from: base, to: "result[*]", type: "java.lang.String[]" }, and set pointer analysis option to only-app:false;.

For your given code snippet, with a static analysis perspective, the expectation is as follows:

flowchart LR
    A[taints]
    B["NewObj ... newarray java.lang.String[...]"]
    C["NewObj ... newarray java.lang.String[...][*]"]
    D["TaintObj"]

    A --> |points-to| B
    C --> |points-to| D
Loading

Intuitively, configuring the pointer analysis option with only-app:true; results in the method split not being processed. As a result, the arrayObj NewObj...newarray java.lang.String[...] will not be pointed to by the taints variable.

I configured the propagation rule as you said and set only-app=false, but the taint propagation path is still missing in the final taint-flow-graph.dot taint flow graph.

Could you please share your entire Tai-e project with me, either via file upload or by providing a link to your GitHub repository? This will allow me to reproduce the issue with one-click.

Could you please share your entire Tai-e project with me, either via file upload or by providing a link to your GitHub repository? This will allow me to reproduce the issue with one-click.

我调试了下,发现返回值类型为数组时,pfg将taint变量直接指向了taints变量,但是在处理ArrayLoad的时候是从taints的数组指针指向LValue,导致传播中断了,我改为了new String[]{taint}之后,IR中会出现ArrayStore,所以会将taint指向new String[]的数组指针,后续load就会连续,所以这种情况应该怎么处理

Thanks for writing this issue in Chinese, it makes it very easy for a few of us developers to read. However, Tai-e is a project with an international focus, and we aim to keep our community discussions in as widely accessible a language as possible, which is English. This helps us share our open-source progress with interested individuals from all over the world. Kindly consider using English for communication in the future. We appreciate your understanding. Thank you.

Again, could you please share your entire Tai-e project with me, either via file upload or by providing a link to your GitHub repository? This will allow me to reproduce the issue with one-click.

Thank you ❤️ for uploading the modified files separately, but for convenience, please package the entire project so that I can execute it directly, such as https://github.com/Tai-e/Tai-e-Examples/tree/master/MinimalReproducibleExample-0069.

Besides, I have checked the repo, and found that the Tai-e's arguments is missing.

Thank you very much for your patience. When I looked at the example you gave, I found that my problem is the same as the problem in # 69 of the example, and I saw that you explained in # 69 that you need to modify the TFGBuilder.buildComplete method implementation by yourself. But I'm going to try to modify the TransferHandler to add the taint variable to the edge of the array ArrayIndexNode on the PFG. I wonder if I'm doing this right? Thanks again.

ArrayIndexNode has always existed in TFG, but by default, Tai-e only outputs nodes within the application scope.


To address this, I spent my time implementing a prototype that demonstrates the solution of both Issue 69 and this current issue.

In this commit Tai-e/Tai-e-Examples@8ec1d9a, I modified Tai-e's class implementation to output the complete TFG as well as one of the shortest paths for each taint flow.

After running this example, three key files will be generated in the output directory:

  • taint-flow-graph.dot: This file contains the original Taint Flow Graph (TFG) outputted by Tai-e, which includes only nodes within the application scope.
  • complete-taint-flow-graph.dot: Generated by the modified code. The complete TFG (Taint Flow Graph), which includes all nodes, will show the paths like x1 -> ArrayNode -> x2, but this graph is typically too large to be easily viewed by humans.
  • taint-flow-path-1.dot: Generated by the modified code, this file represents one of the shortest paths for taint flow 1. The shortest path is calculated using the BFS algorithm (serves as a prototype).
    image