/mail-parser

Tokenizer for raw mails

Primary LanguagePythonApache License 2.0Apache-2.0

PyPI version Build Status Coverage Status

mail-parser

Overview

mail-parser is a wrapper for email Python Standard Library. It's the key module of SpamScope.

From version 1.0.0rc1 mail-parser supports Python 3.

Description

mail-parser takes as input a raw email and generates a parsed object. This object is a tokenized email with some indicator:

  • body
  • headers
  • subject
  • from
  • to
  • attachments
  • message id
  • date
  • charset mail
  • sender IP address

We have also two types of indicator:

  • anomalies: mail without message id or date
  • defects: mail with some not compliance RFC part

Defects

These defects can be used to evade the antispam filter. An example are the mails with a malformed boundary that can hide a not legitimate epilogue (often malware). This library can take these epilogues.

Apache 2 Open Source License

mail-parser can be downloaded, used, and modified free of charge. It is available under the Apache 2 license. Donate

Authors

Main Author

Fedele Mantuano (Twitter: @fedelemantuano)

Installation

Clone repository

git clone https://github.com/SpamScope/mail-parser.git

and install mail-parser with setup.py:

cd mail-parser

python setup.py install

or use pip:

pip install mail-parser

Usage in a project

Import mailparser module:

import mailparser

mail = mailparser.parse_from_file(f)
mail = mailparser.parse_from_string(raw_mail)
mail = mailparser.parse_from_bytes(byte_mail)

Then you can get all parts

mail.body
mail.headers
mail.message_id
mail.to_
mail.from_
mail.subject
mail.text_plain_list: only text plain mail parts in a list
mail.attachments_list: list of all attachments
mail.date_mail
mail.parsed_mail_obj: tokenized mail in a object
mail.parsed_mail_json: tokenized mail in a JSON
mail.defects: defect RFC not compliance
mail.defects_category: only defects categories
mail.has_defects
mail.anomalies
mail.has_anomalies
mail.get_server_ipaddress(trust="my_server_mail_trust")

Usage from command-line

If you installed mailparser with pip or setup.py you can use it with command-line.

These are all swithes:

usage: mailparser.py [-h] (-f FILE | -s STRING) [-j] [-b] [-a] [-r] [-t] [-m]
                   [-u] [-d] [-n] [-i Trust mail server string] [-p] [-z] [-v]

Wrapper for email Python Standard Library

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  Raw email file (default: None)
  -s STRING, --string STRING
                        Raw email string (default: None)
  -j, --json            Show the JSON of parsed mail (default: False)
  -b, --body            Print the body of mail (default: False)
  -a, --attachments     Print the attachments of mail (default: False)
  -r, --headers         Print the headers of mail (default: False)
  -t, --to              Print the to of mail (default: False)
  -m, --from            Print the from of mail (default: False)
  -u, --subject         Print the subject of mail (default: False)
  -d, --defects         Print the defects of mail (default: False)
  -n, --anomalies       Print the anomalies of mail (default: False)
  -i Trust mail server string, --senderip Trust mail server string
                        Extract a reliable sender IP address heuristically
                        (default: None)
  -p, --mail-hash       Print mail fingerprints without headers (default:
                        False)
  -z, --attachments-hash
                        Print attachments with fingerprints (default: False)
  -v, --version         show program's version number and exit

It takes as input a raw mail and generates a parsed object.

Example:

$ mailparser -f example_mail -j

This example will show you the tokenized mail in a JSON pretty format.