Primary language: Python. License: ISC.

Stegosaurus

A steganography tool for embedding payloads within Python bytecode.

Stegosaurus is a steganography tool that allows embedding arbitrary payloads in Python bytecode (pyc or pyo) files. The embedding process does not alter the runtime behavior or file size of the carrier file and typically results in a low encoding density. The payload is dispersed throughout the bytecode so tools like strings will not show the actual payload. Python's dis module will return the same results for bytecode before and after Stegosaurus is used to embed a payload. At this time, no prior work or detection methods are known for this type of payload delivery.

Stegosaurus requires Python 3.6 or later.

Usage

$ python3 -m stegosaurus -h
usage: stegosaurus.py [-h] [-p PAYLOAD] [-r] [-s] [-v] [-x] carrier

positional arguments:
  carrier               Carrier py, pyc or pyo file

optional arguments:
  -h, --help            show this help message and exit
  -p PAYLOAD, --payload PAYLOAD
                        Embed payload in carrier file
  -r, --report          Report max available payload size carrier supports
  -s, --side-by-side    Do not overwrite carrier file, install side by side
                        instead.
  -v, --verbose         Increase verbosity once per use
  -x, --extract         Extract payload from carrier file

Example

Assume we wish to embed a payload in the bytecode of the following Python script, named example.py:

"""Example carrier file to embed our payload in.
"""

import math

def fibV1(n):
    if n == 0 or n == 1:
        return n
    return fibV1(n - 1) + fibV1(n - 2)

def fibV2(n):
    if n == 0 or n == 1:
        return n
    return int(((1 + math.sqrt(5))**n - (1 - math.sqrt(5))**n) / (2**n * math.sqrt(5)))

def main():
    result1 = fibV1(12)
    result2 = fibV2(12)

    print(result1)
    print(result2)

if __name__ == "__main__":
    main()

The first step is to use Stegosaurus to see how many bytes our payload can contain without changing the size of the carrier file.

$ python3 -m stegosaurus example.py -r
Carrier can support a payload of 20 bytes

We can now safely embed a payload of up to 20 bytes. To make the before and after easy to compare, the -s option can be used to install the carrier file side by side with the untouched bytecode:

$ python3 -m stegosaurus example.py -s --payload "root pwd: 5+3g05aW"
Payload embedded in carrier

Looking on disk, both the carrier file and original bytecode file have the same size:

$ ls -l __pycache__/example.cpython-36*
-rw-r--r--  1 jherron  staff  743 Mar 10 00:58 __pycache__/example.cpython-36-stegosaurus.pyc
-rw-r--r--  1 jherron  staff  743 Mar 10 00:58 __pycache__/example.cpython-36.pyc

Note: Had the -s option been omitted, the original bytecode would have been overwritten.

The payload can be extracted by passing the -x option to Stegosaurus:

$ python3 -m stegosaurus __pycache__/example.cpython-36-stegosaurus.pyc -x
Extracted payload: root pwd: 5+3g05aW

The payload does not have to be an ASCII string; shellcode is also supported:

$ python3 -m stegosaurus example.py -s --payload "\xeb\x2a\x5e\x89\x76"
Payload embedded in carrier

$ python3 -m stegosaurus __pycache__/example.cpython-36-stegosaurus.pyc -x
Extracted payload: \xeb\x2a\x5e\x89\x76

To show that the runtime behavior of the Python code remains unchanged after Stegosaurus embeds the payload:

$ python3 example.py
144
144

$ python3 __pycache__/example.cpython-36.pyc 
144
144

$ python3 __pycache__/example.cpython-36-stegosaurus.pyc 
144
144

Output of strings after Stegosaurus embeds the payload (notice the payload is not shown):

$ python3 -m stegosaurus example.py -s --payload "PAYLOAD_IS_HERE"
Payload embedded in carrier

$ strings __pycache__/example.cpython-36-stegosaurus.pyc 
.Example carrier file to embed our payload in.
fibV1)
example.pyr
math
sqrt)
fibV2
print)
result1
result2r
main
__main__)
__doc__r

__name__r
<module>

$ python3 -m stegosaurus __pycache__/example.cpython-36-stegosaurus.pyc -x
Extracted payload: PAYLOAD_IS_HERE

Sample output of Python's dis module, which shows no difference before and after Stegosaurus embeds its payload:

Before:

20 LOAD_GLOBAL              0 (int)
22 LOAD_CONST               2 (1)
24 LOAD_GLOBAL              1 (math)
26 LOAD_ATTR                2 (sqrt)
28 LOAD_CONST               3 (5)
30 CALL_FUNCTION            1
32 BINARY_ADD
34 LOAD_FAST                0 (n)
36 BINARY_POWER
38 LOAD_CONST               2 (1)
40 LOAD_GLOBAL              1 (math)
42 LOAD_ATTR                2 (sqrt)
44 LOAD_CONST               3 (5)
46 CALL_FUNCTION            1
48 BINARY_SUBTRACT
50 LOAD_FAST                0 (n)
52 BINARY_POWER
54 BINARY_SUBTRACT
56 LOAD_CONST               4 (2)

After:

20 LOAD_GLOBAL              0 (int)
22 LOAD_CONST               2 (1)
24 LOAD_GLOBAL              1 (math)
26 LOAD_ATTR                2 (sqrt)
28 LOAD_CONST               3 (5)
30 CALL_FUNCTION            1
32 BINARY_ADD
34 LOAD_FAST                0 (n)
36 BINARY_POWER
38 LOAD_CONST               2 (1)
40 LOAD_GLOBAL              1 (math)
42 LOAD_ATTR                2 (sqrt)
44 LOAD_CONST               3 (5)
46 CALL_FUNCTION            1
48 BINARY_SUBTRACT
50 LOAD_FAST                0 (n)
52 BINARY_POWER
54 BINARY_SUBTRACT
56 LOAD_CONST               4 (2)
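The identical listings follow from how a disassembler treats argument bytes: the second byte of an instruction is only decoded when the opcode takes an argument. The sketch below is a hand-rolled mini disassembler, not the dis module itself. It assumes the CPython 3.6 opcode numbers and a small hand-assembled carrier; the names OPNAMES, listing, BEFORE and AFTER are illustrative only.

```python
# Assumption: opcode numbers are taken from CPython 3.6's opcode table.
HAVE_ARGUMENT = 90  # in 3.6, opcodes below this value take no argument
OPNAMES = {
    23: "BINARY_ADD",
    24: "BINARY_SUBTRACT",
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    124: "LOAD_FAST",
}


def listing(wordcode):
    """Render 2-byte instructions, showing the argument only when it is used."""
    lines = []
    for offset in range(0, len(wordcode), 2):
        op, arg = wordcode[offset], wordcode[offset + 1]
        if op >= HAVE_ARGUMENT:
            lines.append("%2d %-16s %d" % (offset, OPNAMES[op], arg))
        else:
            # The second byte is never looked at for argument-less opcodes.
            lines.append("%2d %s" % (offset, OPNAMES[op]))
    return lines


# Hand-assembled 3.6 wordcode for: return n + 5 + n - 3
BEFORE = bytes([124, 0, 100, 1, 23, 0, 124, 0, 23, 0, 100, 2, 24, 0, 83, 0])

# Same instructions with "ABCD" written into the unused argument bytes.
AFTER = bytearray(BEFORE)
for slot, byte in zip((5, 9, 13, 15), b"ABCD"):
    AFTER[slot] = byte
AFTER = bytes(AFTER)

if __name__ == "__main__":
    print("\n".join(listing(AFTER)))
```

The raw bytes of BEFORE and AFTER differ, yet both produce the same listing.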

Using Stegosaurus

Payloads, delivery and receipt methods are entirely up to the user. Stegosaurus only provides the means to embed and extract payloads from a given Python bytecode file. Because the file size is left intact, relatively few bytes are available to carry the payload. This may require spreading larger payloads across multiple bytecode files, which has some advantages such as:

  • Delivering a payload in pieces over time
  • Portions of the payload can be spread over multiple locations and joined when needed
  • A single portion being compromised does not divulge the whole payload
  • Thwarting detection of the entire payload by spreading it across multiple seemingly unrelated files

The means to spread large payloads across multiple Python bytecode files is not supported at this moment; see TODOs.
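Until such support exists, the splitting can be done by hand before each chunk is passed to -p. The following is a minimal sketch of that idea and not part of Stegosaurus: split_payload and join_payload are hypothetical helpers, and the capacities are assumed to be the per-carrier sizes reported by -r.

```python
def split_payload(payload, capacities):
    """Cut payload into one chunk per carrier, never exceeding a carrier's capacity.

    Hypothetical helper: `capacities` holds the byte counts that -r reported
    for each carrier, in the order the chunks will later be rejoined.
    """
    if len(payload) > sum(capacities):
        raise ValueError("payload does not fit in the given carriers")
    chunks = []
    start = 0
    for capacity in capacities:
        chunks.append(payload[start:start + capacity])
        start += capacity
    return chunks


def join_payload(chunks):
    """Reassemble the chunks extracted from each carrier, in order."""
    return b"".join(chunks)


if __name__ == "__main__":
    parts = split_payload(b"root pwd: 5+3g05aW", [8, 4, 20])
    print(parts)
    print(join_payload(parts))
```

The sketch fills carriers greedily in order, so the receiving side only needs to know the carrier order to rebuild the payload.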

How Stegosaurus Works

In order to embed a payload without increasing the file size, dead zones need to be identified within the bytecode. A dead zone is defined as any byte which, if changed, will not impact the behavior of the Python script. Python 3.6 introduced easy-to-exploit dead zones. First, though, a little history to set the stage.

Python's reference interpreter, CPython, has two types of opcodes: those with arguments and those without. In Python <= 3.5, instructions in the bytecode occupied either 1 or 3 bytes, depending on whether the opcode took an argument. In Python 3.6 this was changed so that all instructions occupy two bytes. Those without arguments simply set the second byte to zero, and it is ignored during execution. This means that for each instruction in the bytecode that does not take an argument, Stegosaurus can safely insert one byte of the payload.

Some examples of opcodes that do not take an argument:

BINARY_SUBTRACT
INPLACE_ADD
RETURN_VALUE
GET_ITER
YIELD_VALUE
IMPORT_STAR
END_FINALLY
NOP
...
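A fuller list can be produced with the standard dis module, which exposes the opcode table as dis.opname and the boundary between the two opcode types as dis.HAVE_ARGUMENT. This is a small sketch; argless_opcode_names is an illustrative name, and the result reflects whichever interpreter runs it, so it may differ from the Python 3.6 names listed above.

```python
import dis


def argless_opcode_names():
    """Return the names of all opcodes below dis.HAVE_ARGUMENT."""
    return [
        name
        for name in dis.opname[:dis.HAVE_ARGUMENT]
        if not name.startswith("<")  # skip unassigned opcode slots such as '<5>'
    ]


if __name__ == "__main__":
    for name in argless_opcode_names():
        print(name)
```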

To see an example of the changes in the bytecode, consider the following Python snippet:

def test(n):
    return n + 5 + n - 3

Using dis with Python < 3.6 shows:

0  LOAD_FAST                0 (n)
3  LOAD_CONST               1 (5)    <-- opcodes with an arg take 3 bytes
6  BINARY_ADD                        <-- opcodes without an arg take 1 byte
7  LOAD_FAST                0 (n)
10 BINARY_ADD          
11 LOAD_CONST               2 (3)
14 BINARY_SUBTRACT      
15 RETURN_VALUE

# :( no easy bytes to embed a payload

However with Python 3.6:

0  LOAD_FAST                0 (n)
2  LOAD_CONST               1 (5)    <-- all opcodes now occupy two bytes
4  BINARY_ADD                        <-- opcodes without an arg leave 1 byte for the payload
6  LOAD_FAST                0 (n)
8  BINARY_ADD
10 LOAD_CONST               2 (3)
12 BINARY_SUBTRACT
14 RETURN_VALUE

# :) easy bytes to embed a payload
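Locating these bytes amounts to walking the bytecode two bytes at a time and checking each opcode against the argument threshold. The sketch below assumes the CPython 3.6 opcode numbers and a hand-assembled copy of the listing above; it does not read a real pyc file, and dead_zone_offsets is an illustrative name rather than a Stegosaurus function.

```python
# Assumption: values come from CPython 3.6's opcode table, not the running interpreter.
HAVE_ARGUMENT = 90  # in 3.6, opcodes below this value take no argument

# Hand-assembled 3.6 wordcode for: return n + 5 + n - 3
TEST_WORDCODE = bytes([
    124, 0,  # LOAD_FAST        0 (n)
    100, 1,  # LOAD_CONST       1 (5)
    23, 0,   # BINARY_ADD
    124, 0,  # LOAD_FAST        0 (n)
    23, 0,   # BINARY_ADD
    100, 2,  # LOAD_CONST       2 (3)
    24, 0,   # BINARY_SUBTRACT
    83, 0,   # RETURN_VALUE
])


def dead_zone_offsets(wordcode):
    """Return the byte offsets of argument bytes that execution ignores."""
    return [
        offset + 1
        for offset in range(0, len(wordcode), 2)
        if wordcode[offset] < HAVE_ARGUMENT
    ]


if __name__ == "__main__":
    print(dead_zone_offsets(TEST_WORDCODE))
```

For this function the four argument-less instructions yield four candidate bytes.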

Passing -vv to Stegosaurus we can see how the payload is embedded in these dead zones:

$ python3 -m stegosaurus ../python_tests/loop.py -s -p "ABCDE" -vv
Read header and bytecode from carrier
BINARY_ADD (0)
BINARY_ADD (0)
BINARY_SUBTRACT (0)
RETURN_VALUE (0)
RETURN_VALUE (0)
Found 5 bytes available for payload
Payload embedded in carrier
BINARY_ADD (65)      <-- A
BINARY_ADD (66)      <-- B
BINARY_SUBTRACT (67) <-- C
RETURN_VALUE (68)    <-- D
RETURN_VALUE (69)    <-- E

Timestamps and debug levels removed from logs for readability
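The embedding shown in the log can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the actual Stegosaurus implementation: it uses the CPython 3.6 opcode threshold, works on a raw instruction byte string rather than a pyc file, fills every available slot in order, and assumes the payload contains no zero bytes so that zero padding can mark its end. The names embed and extract are illustrative.

```python
# Assumption: CPython 3.6 layout, where opcodes below 90 take no argument.
HAVE_ARGUMENT = 90


def _slots(wordcode):
    """Offsets of the ignored argument bytes in 2-byte instructions."""
    return [i + 1 for i in range(0, len(wordcode), 2) if wordcode[i] < HAVE_ARGUMENT]


def embed(wordcode, payload):
    """Return a copy of wordcode with payload written into its dead zones."""
    slots = _slots(wordcode)
    if len(payload) > len(slots):
        raise ValueError(
            "payload of %d bytes exceeds capacity of %d" % (len(payload), len(slots))
        )
    carrier = bytearray(wordcode)
    # Unused slots are zeroed so that extract() can tell where the payload ends.
    padded = payload.ljust(len(slots), b"\x00")
    for slot, byte in zip(slots, padded):
        carrier[slot] = byte
    return bytes(carrier)


def extract(wordcode):
    """Collect the dead-zone bytes and strip the zero padding."""
    data = bytes(wordcode[slot] for slot in _slots(wordcode))
    return data.rstrip(b"\x00")


if __name__ == "__main__":
    # Hand-assembled 3.6 wordcode for: return n + 5 + n - 3
    carrier = bytes([124, 0, 100, 1, 23, 0, 124, 0, 23, 0, 100, 2, 24, 0, 83, 0])
    stego = embed(carrier, b"ABCD")
    print(len(carrier) == len(stego), extract(stego))
```

Opcodes and real arguments are never touched, which is why the carrier keeps its size and behavior.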

Currently this is the only dead zone that Stegosaurus exploits. Future improvements include more dead zone identification as mentioned in the TODOs.

TODOs

  • Add self destruct option -d which will purge the payload from the carrier file after extraction
  • Support method to distribute payload across multiple carrier files
  • Provide -t flag to test if a payload may be present within a carrier file
  • Find more dead zones within the bytecode to place the payload, such as dead code
  • Add a -g option which will grow the size of the file to support larger payloads for users that are not concerned with a change in file size (for instance if Stegosaurus is injected into a build pipeline)

Contributions

Thanks to S0lll0s for:

  • Preventing placement of the payload in long runs of opcodes that do not take an argument, as this can expose the payload to tools like strings

Contact

For any questions, please contact the author:

Jon Herron

jon dot herron at yahoo.com