proofpoint-url-decoder is a collection of various Python scripts to unmangle Proofpoint "protected" emails.
Some reasons why Proofpoint is harmful (a non-exhaustive list):
-
Proofpoint makes it harder to read email: by mangling URLs, the user can no longer see what the actual URL is and must blindly trust in a third-party.
- URLs are visibly mangled and filled with garbage when reading email on
the command line using
mutt
,alpine
, oremacs
.
- URLs are visibly mangled and filled with garbage when reading email on
the command line using
-
Proofpoint erodes your privacy: in addition to someone else scanning your email there are unique identifiers embedded in each mangled URL that tie each URL to an individual user, and Proofpoint will independently visit (and possibly even crawl) the URL after the user has clicked on it.
Each program can be used standalone: pick and use the Python script that is most relevant to your use case.
There are several files of note:
-
decode.py
: reads a URL as an input parameter, outputs a clean URL toSTDOUT
Example:
$ set +H # disable ! history substitution $ ./decode.py "https://urldefense.com/v3/__http://www.example.com__;!!foo!bar$" http://www.example.com
-
get_urls.py
: reads as input an email (fromSTDIN
), extracts and outputs clean URLs toSTDOUT
-
decode_email.py
: reads as input an email (fromSTDIN
), and outputs the same email with clean URLs toSTDOUT
Example:
$ cat email_message | ./decode_email.py > email_message.cleaned
usage: decode_email.py [-h] [--plaintext] [--preserve-mbox-from]
decode proofpoint-mangled URLs in emails
options:
-h, --help show this help message and exit
--plaintext, -p decode URLs in plaintext input (not an email message)
--preserve-mbox-from, -m
Preserve the mbox format email separator (From <addr> <timestamp>) on the first line
decode_email.py
can be integrated with fdm and procmail
to automatically filter and unmangle URLs before being delivered to your inbox.
Add the following rules to your .fdm.conf
:
# An action to save to the maildir ~/Mail/inbox.
action "inbox" maildir "%h/Mail/inbox"
action "backup" maildir "%h/Mail/backup"
# Un-mangle ProofPoint URLs
action "unmangle" rewrite "/path/to/proofpoint-url-decoder/decode_email.py"
# (optional) keep a backup of all email
match all action "backup" continue
# 1. match all mail
# 2. run the "unmangle" action on each message (rewrite URLs)
# 3. run the "inbox" action on the resulting message (deliver to Maildir)
match all action "unmangle" continue
match all action "inbox"
Watch your log file (.fdm.log
) for any issues. If you're processing a lot of
mail at any one time, you may have to configure additional settings in .fdm.conf
:
see man 5 fdm.conf
for more information.
Add the following rule near the beginning of your .procmailrc
:
:0 fw
| /path/to/proofpoint-url-decoder/decode_email.py
You could match on and filter emails containing the X-Proofpoint-*
header
(which would be all emails on systems), but sometimes you will get emails
forwarded to you that might not have this header and still contain the
mangled URLs.
It's a good idea to keep a backup copy of the emails, in case something in the processing pipeline goes wrong:
# copy all mail to the "backup" Maildir
:0c
backup/
# pipe message through decode_email.py
:0 fw
| /path/to/proofpoint-url-decoder/decode_email.py
# write resulting email into "inbox" Maildir
:0:
inbox/
You could also run decode_email.py
on a copy of the email to test its
functionality:
# create a working copy
:0c
{
# pipe message through decode_email.py
:0 fw
| /path/to/proofpoint-url-decoder/decode_email.py
# write resulting email into "testing" Maildir
:0:
testing/
}
There are some unit tests, with some library dependencies:
pip install -r requirements.txt
python3 -v decode_test.py
There are also some procmail
tests: see procmail/
.
Feel free to contribute code or send comments, suggestions, bugs to calvin@isi.edu.
For now, to keep each script independent, decode_ppv2()
and
decode_ppv3
are duplicated in each script.