/Application_S4lab

Requested material for joining S4Lab @ SUT

Maryam Ebrahimzadeh 98201893
**(State of) The Art of War: Offensive Techniques in Binary Analysis**

Vulnerabilities keep being discovered in binary code, so binaries need to be analyzed with different approaches and for different goals. Angr is a binary analysis framework that integrates many of these analysis techniques.
In general, any automated binary analysis must trade off replayability against semantic insight, because an analysis that offers both does not scale. Code analysis falls into two categories: static and dynamic.
Static analysis works on the program's control-flow graph and data-flow graph. Its results are not replayable, and its false-positive rate is high.
Dynamic analysis consists of concrete execution and symbolic execution, both of which are highly replayable. Concrete execution requires test cases (inputs) to drive the program.
Fuzzing is a form of dynamic concrete execution and comes in several types:
    Coverage-based fuzzing: tries to maximize the amount of code exercised by the tests, but has no semantic insight into the program.
    Taint-based fuzzing: tries to understand how the application processes its input in order to choose better input mutations.
The other type of dynamic analysis is dynamic symbolic execution, which has good semantic insight but limited scalability.
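The coverage-based idea above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any production fuzzer is built: `target`, `mutate`, and `fuzz` are hypothetical names, the "program" is a hand-written function that reports its own branch coverage, and the crash condition (an input starting with `FUZ`) is made up for the example.

```python
import random


def target(data: bytes, coverage: set) -> None:
    """Toy program under test: records the branches it takes, crashes on b'FUZ...'."""
    coverage.add("entry")
    if len(data) > 0 and data[0] == ord("F"):
        coverage.add("F")
        if len(data) > 1 and data[1] == ord("U"):
            coverage.add("FU")
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("crash reached")


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte, or append one, with a random value."""
    buf = bytearray(data)
    pos = rng.randrange(len(buf) + 1)
    value = rng.randrange(256)
    if pos == len(buf):
        buf.append(value)
    else:
        buf[pos] = value
    return bytes(buf)


def fuzz(seed_input: bytes, max_iterations: int, rng_seed: int = 0):
    """Return (crashing input or None, set of branches covered)."""
    rng = random.Random(rng_seed)
    corpus = [seed_input]
    seen = set()
    target(seed_input, seen)  # assumes the seed input itself does not crash
    for _ in range(max_iterations):
        candidate = mutate(rng.choice(corpus), rng)
        covered = set()
        try:
            target(candidate, covered)
        except RuntimeError:
            return candidate, seen | covered
        if not covered <= seen:
            # The mutant reached a new branch: keep it as a base for later mutations.
            seen |= covered
            corpus.append(candidate)
    return None, seen


if __name__ == "__main__":
    crash, covered = fuzz(b"A", max_iterations=200_000)
    print("crashing input:", crash, "covered branches:", sorted(covered))
```

The loop keeps a mutated input only when it reaches a branch that was not seen before. That is the sense in which the fuzzer maximizes covered code without understanding what the program does with its input.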

Analysis engine
Angr's design goals include support for modern processor architectures and operating systems, and usability.
The design is split into five submodules:
1. Intermediate Representation
    Angr lifts binary code into an intermediate representation, leveraging libVEX to support different architectures.
2. Binary Loading
    Loads the given binary and the libraries it depends on, then initializes the program state.
3. Program State Representation/Modification
    The SimState class in SimuVEX holds the program state, starting from an initial state specified by the user.
4. Data Model
    The Claripy module represents the values held in registers as expressions, and it can resolve these expressions to concrete data. Claripy has a modular design.
5. Full-Program Analysis
    Angr uses both dynamic symbolic execution and control-flow graph recovery for full-program analysis.
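To make the idea of control-flow graph recovery over an intermediate representation concrete, here is a minimal sketch on a made-up toy IR. It is not VEX and not Angr's algorithm: the instruction tuples, the opcode names (`jmp`, `cjmp`, `ret`), and the function `recover_cfg` are all assumptions for illustration, and indirect jumps, which are the hard part in real binaries, are not modelled.

```python
def recover_cfg(program):
    """Return {block_start: sorted successor block starts} for a toy IR program.

    `program` is a list of tuples such as ("jmp", 5) or ("add",); an instruction's
    address is its list index. Jump targets are assumed to be valid addresses.
    """
    n = len(program)
    # A "leader" is the first instruction of a basic block.
    leaders = {0} if n else set()
    for addr, (op, *args) in enumerate(program):
        if op in ("jmp", "cjmp"):
            leaders.add(args[0])          # the jump target starts a block
            if addr + 1 < n:
                leaders.add(addr + 1)     # so does the instruction after a jump
        elif op == "ret" and addr + 1 < n:
            leaders.add(addr + 1)

    starts = sorted(leaders)
    cfg = {}
    for i, start in enumerate(starts):
        # A block runs until just before the next leader (or the program end).
        end = starts[i + 1] - 1 if i + 1 < len(starts) else n - 1
        op, *args = program[end]
        successors = set()
        if op == "jmp":
            successors.add(args[0])
        elif op == "cjmp":
            successors.add(args[0])       # branch taken
            if end + 1 < n:
                successors.add(end + 1)   # branch not taken (fall through)
        elif op != "ret" and end + 1 < n:
            successors.add(end + 1)       # plain fall through into the next block
        cfg[start] = sorted(successors)
    return cfg


EXAMPLE = [
    ("cmp",),      # 0
    ("cjmp", 4),   # 1: if condition holds, go to 4
    ("add",),      # 2
    ("jmp", 5),    # 3
    ("sub",),      # 4
    ("ret",),      # 5
]

if __name__ == "__main__":
    print(recover_cfg(EXAMPLE))  # {0: [2, 4], 2: [5], 4: [5], 5: []}
```

Because every jump target here is a constant, one linear pass is enough. A real framework needs heavier techniques precisely because targets computed at run time cannot be read off the instruction stream.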

The two main interfaces are Path Groups and Analyses.
A Path Group manages the set of paths during dynamic symbolic execution. Analyses manage full-program analyses. Angr stores facts it has established about the binary in a knowledge base so that later analyses can reuse them.
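As a rough sketch of how the symbolic-execution interface is driven from Python: to my knowledge, current releases of the `angr` package expose what the paper calls a Path Group as a simulation manager, and the snippet below uses that interface. The function name `find_input_reaching`, the binary path, and the target address are placeholders, and the `angr` package must be installed separately for the function to run.

```python
def find_input_reaching(binary_path: str, target_addr: int):
    """Symbolically explore a binary from its entry point until target_addr is hit.

    Returns the stdin bytes that drive execution to target_addr, or None if the
    exploration ends without reaching it.
    """
    import angr  # third-party dependency: pip install angr

    # Load the binary only; skipping shared libraries keeps the analysis small.
    project = angr.Project(binary_path, auto_load_libs=False)

    # Initial program state at the entry point (stdin is symbolic by default).
    state = project.factory.entry_state()

    # The simulation manager tracks the set of paths being explored.
    simgr = project.factory.simulation_manager(state)
    simgr.explore(find=target_addr)

    if simgr.found:
        # Ask the solver for concrete stdin contents on the first matching path.
        return simgr.found[0].posix.dumps(0)
    return None


if __name__ == "__main__":
    # Placeholder path and address: substitute a real binary and a real target.
    print(find_input_reaching("./a.out", 0x401136))
```

The import sits inside the function so that the module can be loaded without the dependency. The address passed to `explore` would normally come from a disassembler or from a recovered control-flow graph.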

Angr is a Python library and can be installed with pip. For the evaluation, the authors used a corpus of CGC binaries released by DARPA and implemented these techniques in Angr: CFG recovery, dynamic and static vulnerability discovery, crash replay, exploitation, and exploit hardening.


**AEG: Automatic Exploit Generation**

Writing exploits by hand is very hard, so an automatic exploit generation (AEG) system is a program that finds vulnerabilities and generates exploit strings for them.

In the case of exploit generation, source-code analysis alone is insufficient, while binary-level analysis alone does not scale, so AEG combines the two. The order in which paths are checked matters for finding an exploitable path. Overall, the system analyzes the source code, extracts vulnerabilities, and then generates an exploit string for each of them.

The core problem is that AEG must both find a bug in the code and find a way to use it to hijack the program's control flow.

Approach of AEG
AEG contains six components:
1. PRE-PROCESS
    AEG takes two inputs, both produced by compiling the same program: a binary and its LLVM bytecode.
2. SRC-ANALYSIS
    Finds the maximum size of the symbolic data by searching for the largest statically allocated buffers.
3. BUG-FIND
    For each vulnerability, generates the pair (Π_bug, V): Π_bug is the path predicate that leads to the bug, and V holds source-level information about the vulnerability.
    In this component AEG uses
        - preconditioned symbolic execution
        In symbolic execution the total number of interpreters (forked execution states) is exponential in the number of branches, so preconditions are used to shrink this state space. A precondition must be neither too specific nor too general.
        - path prioritization techniques
        It uses two new path prioritization heuristics: buggy-path-first and loop exhaustion.
4. DBA
    Performs dynamic binary analysis and finds the runtime information R.
5. EXPLOIT-GEN
    Takes Π_bug and R and generates a control-flow hijack exploit. Two types of exploit are generated, return-to-stack
    and return-to-libc, both of which change the return address.
6. VERIFY
    Checks whether the generated exploit actually achieves the adversary's goal.
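The effect of a precondition on the BUG-FIND state space can be illustrated with a toy model. This is a sketch under assumptions, not AEG's implementation: the function `explore_copy_loop` is hypothetical, the "program" is an abstract strcpy-style loop that forks on whether each input byte is NUL, and the precondition used is a fixed, known input length.

```python
def explore_copy_loop(max_len: int, buf_size: int, known_len=None):
    """Enumerate the paths of a toy strcpy-style loop over a symbolic string.

    At byte i the loop forks: either input[i] is NUL (the copy stops and the
    string has length i) or it is not (the copy continues). If known_len is
    given, it acts as a precondition and branches contradicting it are pruned.

    Returns (lengths of all completed paths, lengths that overflow the buffer).
    """
    worklist = [0]   # each entry: index of the next input byte to test
    finished = []    # string length on every completed path
    while worklist:
        i = worklist.pop()
        # Branch A: input[i] == NUL, the loop exits with a string of length i.
        if known_len is None or i == known_len:
            finished.append(i)
        # Branch B: input[i] != NUL, the byte is copied and the loop continues.
        if i < max_len and (known_len is None or i < known_len):
            worklist.append(i + 1)
    # Copying n bytes plus the terminating NUL overflows a buffer of buf_size
    # bytes when n + 1 > buf_size.
    overflowing = [n for n in finished if n + 1 > buf_size]
    return finished, overflowing


if __name__ == "__main__":
    all_paths, _ = explore_copy_loop(max_len=8, buf_size=4)
    pruned_paths, bad = explore_copy_loop(max_len=8, buf_size=4, known_len=8)
    print(len(all_paths), "paths without a precondition")  # 9 paths
    print(len(pruned_paths), "path with known_len=8:", bad)  # 1 path: [8]
```

A precondition that is too specific (for example `known_len=2` here) prunes away the overflowing paths along with the benign ones. This is why the precondition must be neither too specific nor too general.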


**Questions**
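* Many websites expose their “.git” files, please show how it could be dangerous.
An exposed “.git” folder lets anyone download the repository, including its full commit history, and reconstruct the site's source code.
This can expose the site to phishing attacks, and attackers can retrieve database passwords and other critical information from the “.git” folder.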
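* Imagine that we have 2**48 text files. Explain how we can find which files are the same.
    For a small number of files, copy them into two folders and use this command on Linux:
        diff -rs folder1 folder2
    With -s it reports which files are identical, but it only compares files that have the same name in both folders, and comparing files pairwise does not scale to 2**48 files.
    A more practical approach is to compute a hash of each file and group the files by hash; files with the same hash are almost certainly the same. A simple Python script can do this.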
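A minimal sketch of the hash-based approach, assuming the files sit under one directory tree on a single machine. The function names `file_digest` and `find_duplicates` are illustrative. Grouping by file size first avoids hashing files that cannot have a twin.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return a list of groups (sorted path lists) of files with identical content."""
    # Pass 1: files of different sizes can never be identical.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share their size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]


if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```

At the scale of 2**48 files the in-memory dictionaries above would not fit on one machine. The same idea would then have to be applied by writing (hash, path) records to disk and sorting them externally, or by partitioning the files across machines by hash prefix. That extension is an assumption about how to scale the sketch and is not shown here.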

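* Write a hello-world C program and explain how we can dump its binary code with radare2.
    Assuming the compiled program is ./a.out:
    r2 -q -c 'pi $s' ./a.out > out.txt
    r2 is the radare2 command
    -q is quiet mode: radare2 exits after running the -c commands instead of opening its shell; "> out.txt" redirects stdout to a file
    -c executes the given command in radare2
    pi prints disassembled instructions
    $s is the file size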