/FlowMatrix

FLOWMATRIX: GPU-Assisted Information-Flow Analysis through Matrix-Based Representation, USENIX Security'22

Primary LanguageC++GNU General Public License v3.0GPL-3.0

FlowMatrix

A query-style tool for GPU-assisted offline information-flow analysis.

Cite

Kaihang Ji, Jun Zeng, Yuancheng Jiang, and Zhenkai Liang, Zheng Leong Chua, Prateek Saxena and Abhik Roychoudhury, FlowMatrix: GPU-Assisted Information-Flow Analysis through Matrix-Based Representation. In USENIX Security Symposium, 2022.

Dependencies

Installation

Simply build with make at the project root directory, use:

$ make -j

Usage

$ ./bin/QueryCLI <path_to_database>

1. Examples

You may find a database with 2 examples from Release.

To load this database:

$ ./bin/QueryCLI ./examples.db

Example: [sample1] data flows of two 'mov' instructions. Check sample1 for more details.

FMQuery> WorkOn sample1

FMQuery> Query INSTR 1 TO INSTR 2

Example: [sample2] data flows between buffer read and buffer write. Check sample2 for more details.

FMQuery> WorkOn sample2

FMQuery> Query SYSCALL read TO SYSCALL write

2. Advanced System Call Query

Query data flows among all common source syscalls to all common sink syscalls:

FMQuery> Query SYSCALL read,recv*,mmap TO SYSCALL write,send*

That might print multiple records as it performs a n-to-m query. You might want to narrow down the query range by first checking the system call trace:

FMQuery> SyscallPrintAll
...
[40]@119916  read(0)              0x22a = read(0x8, 0x7fff9479ae20, 0x400)
...
[47]@184288  write(1)             0x22a = write(0x8, 0x7fff9479ae20, 0x22a)
...

"119916" and "184288" are their instruction index respectively. Thus, you can perfrom query upon a subset of the trace:

FMQuery> QueryInRange (119916,184288) SYSCALL read,recv*,mmap TO SYSCALL write,send*

Or, just query the data flow between these two syscall instructions:

FMQuery> Query INSTR 119916 TO INSTR 184288    

3. Advanced Instruction Query

System call queries can be combained with instruction queries.

For example, query data flows from all common source syscalls to all return instructions (usually for checking whether RIP is influenced by any source syscall):

FMQuery> Query SYSCALL read,recv* TO INSTR ret

Also, you may print instruction traces for better understanding (for example, the 999th instruction in Sample2 is a RET instruction):

FMQuery> TracePrint 999
   996:       488d0490       lea      rax, qword ptr [rax + rdx*4]
   997:       898ff4020000   mov      dword ptr [rdi + 0x2f4], ecx
   998:       48898708030000 mov      qword ptr [rdi + 0x308], rax
==>999:       c3             ret
  1000:       488d0521892200 lea      rax, qword ptr [rip + 0x228921]
  1001:       48890542892200 mov      qword ptr [rip + 0x228942], rax
  1002:       488d0523dfffff lea      rax, qword ptr [rip - 0x20dd]
  1003:       4889054c8c2200 mov      qword ptr [rip + 0x228c4c], rax
  1004:       488d0585902200 lea      rax, qword ptr [rip + 0x229085]
  1005:       488905468c2200 mov      qword ptr [rip + 0x228c46], rax

For details about the dataflows of a specified instruction, please use command "PrintOne":

FMQuery> PrintOne 999
0x7fff9479b288 -> RIP[0]  0x7fff9479b288
0x7fff9479b289 -> RIP[1]  0x7fff9479b289
0x7fff9479b28a -> RIP[2]  0x7fff9479b28a
0x7fff9479b28b -> RIP[3]  0x7fff9479b28b
0x7fff9479b28c -> RIP[4]  0x7fff9479b28c
0x7fff9479b28d -> RIP[5]  0x7fff9479b28d
0x7fff9479b28e -> RIP[6]  0x7fff9479b28e
0x7fff9479b28f -> RIP[7]  0x7fff9479b28f
RSP[0] -> RSP[0]
RSP[1] -> RSP[1]  RSP[2]  RSP[3]  RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[2] -> RSP[2]  RSP[3]  RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[3] -> RSP[3]  RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[4] -> RSP[4]  RSP[5]  RSP[6]  RSP[7]
RSP[5] -> RSP[5]  RSP[6]  RSP[7]
RSP[6] -> RSP[6]  RSP[7]
RSP[7] -> RSP[7]

4. Forward analysis and Backward analysis

In some cases, analysts wonder the dataflow at certain location.

For example, in our case study, we wonder whether the return value of a specified function affects the final sink or not. Normal query doesn't satisfy our need because a RET instruction itself doesn't include return value (RAX register) in its variable state, as shown above. In this case, we need a backward query:

FMQuery> Query INSTR <RET_INSTR_IDX> BACKWARD INSTR <SINK_INSTR_IDX>
...
RAX[0] -> xxx
...

*<RET_INSTR_IDX> and <SINK_INSTR_IDX> are the indices of a RET instruction and the final sink instruction respectively. Please refer to our paper for details of this case study.

Forward analysis is powerful and frequently used in our experiments including case studies, ditto. Here we give an example of Out-Of-Bounds(OOB) write analysis.

Senario OOB write: A buffer overflow occurs when the program copies data from a input file by memcpy().

FMQuery> Query SYSCALL read[2] FORWARD INSTR call[53]   # memcpy()    
...
RAX -> ... RDX ...
...

Also, the return value (RAX) of a READ syscall is the num of read bytes. It affects the third argument (RDX) of memcpy(), which is number of bytes to copy. Then we verify dataflows from read syscall to the memcpy dst buffer and the existence of OOB.

FMQuery> Query SYSCALL read[2] FORWARD INSTR ret[49]   # return from memcpy()    
...
0x7fff9479b048 -> 0x7faebe7e53c0 ...
0x7fff9479b049 -> 0x7faebe7e53c1 ...   # Overflow!
...

Query Usage

Query Usage:

Query <Vars1Type> <Vars1> <Direction> <Vars2Type> <Vars2>
Arguments:
    Vars1Type, Vars2Type: {S(yscall), I(nstr), R(egister), M(emory)}.
    Vars1, Vars2: 
        (if Type is SYSCALL) : {SyscallName, SyscallName[id]}. E.g., "read", "read[1]", "read,recv*", "recv*"
        (if Type is INSTR)   : {InstrIndexInTrace, Instr}. E.g., "1", "ret", "push,pop", "mov*rax*"
        (if Type is REGISTER): {RegisterName}. E.g., "rip", "xmm1,xmm2,xmm3,xmm4"
        (if Type is MEMORY)  : {MemAddr}. E.g., "0x800000", "0x800000-0x801000", "0x0-0x1,0x4-0x5,0x8"
    Direction: {T(o), F(orward), B(ackward)}
        TO      : Query dataflows from Vars1 to Vars2
        FORWARD : Pefrom forward analysis. Query dataflows from Vars1 to positions of Vars2
        BACKWARD: Pefrom backward analysis. Query dataflows from positions of Vars1 to Vars2
*All IDs start from 1. '*' is the wildcard for advanced matching.

QueryInRange Usage:

QueryInRange <Range> <Vars1Type> <Vars1> <Direction> <Vars2Type> <Vars2>
Arguments:
    Range: (src,dest).
    (Others are the same with Query command.)

Usage for all commands

 FMQuery> help
 - help
        This help message
 - exit
        Quit the session
 - WhichDB
        Show which database is worked on.
 - ListTraces
        List all traces in database.
 - WorkOn <string>
        Select which trace to work on.
 - TraceSize
        Show size of the current trace.
 - PrepareQuery <int> <int>
        Pre-compute intermedia query results for later query on range (start, end).
 - Query <string> <string> <string> <string> <string>
        Perform data flow query. E.g. Query SYSCALL read,recv TOWARDS REG rip
 - QueryInRange <string> <string> <string> <string> <string> <string>
        Perform data flow query with-in a sub range
 - MultiplyTwo <int> <int>
        Multiply specified two instrutions.
 - TracePrint <int>
        Print one instruction from trace by instruction index
 - TracePrintRange <int> <int>
        Print a list of instructions.
 - PrintOne <int>
        Print data flows of one instruction by index.
 - SyscallPrintAll
        List all syscalls in trace.
 - SyscallPrint <int>
        Print one syscall in trace by its index.
 - LastResult
        Print the raw matrix of the last one-to-one query.
 - Usage <string>
        Show usage of the specified command.
 - Settings
        (menu)