Metadata harvester

Primary language: Python. License: GNU General Public License v2.0 (GPL-2.0).

###################

STILL A WORK IN PROGRESS...

###################

Mercurius

Mercurius started as a fork of Christian Martorella's Metagoofil and has since been completely refactored, so almost all of the code is new.

Install

  • From git

    pip install git+https://github.com/SilentFrogNet/mercurius.git

Origin of the Name

The name Mercurius is inspired by the Greek god Hermes, whom the Romans called Mercurius. Among other things, he is the god of luck, trickery, and thieves.

He is also known as the "keeper of the boundaries" for his role as a bridge between the upper and lower worlds.

What is this?

Mercurius is a tool for extracting metadata from public documents (pdf, doc, xls, ppt, docx, xlsx, pptx, odt, ods, odp, jpg, jpeg, tiff) available on the target websites. This information is useful because it can reveal valid usernames, people's names, hosts, and email addresses, which can later be used in brute-force password attacks (vpn, ftp, webapps).

How does it work?

The tool first queries Google for file types that are likely to carry useful metadata (pdf, doc, xls, ppt, ...). It then downloads those documents to disk and extracts their metadata using libraries specific to each file type (Hachoir, Pdfminer, etc.).
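The search step can be pictured with a small sketch. It assumes the query combines Google's `site:` and `filetype:` operators; the function names, the `FILE_TYPES` tuple, and the URL layout are illustrative assumptions, not Mercurius's actual code.

```python
from urllib.parse import urlencode

# Hypothetical subset of file types; the real tool's list may differ.
FILE_TYPES = ("pdf", "doc", "xls", "ppt")


def build_query(domain, file_type):
    """Return a search string restricted to one site and one file type."""
    return f"site:{domain} filetype:{file_type}"


def build_search_url(domain, file_type, start=0):
    """Return a search URL for the query; `start` is the result offset."""
    params = {"q": build_query(domain, file_type), "start": start}
    return "https://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    # One URL per file type for an example domain.
    for file_type in FILE_TYPES:
        print(build_search_url("example.com", file_type))
```

Each result page would then be parsed for document links, which are downloaded and handed to the matching extractor.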

Supported file types

At the moment this tool can parse and extract metadata from:

  • Microsoft Office 97 documents (doc, xls, ppt)
  • Microsoft Office 2007+ documents (docx, xlsx, pptx)
  • PDF (pdf)
  • Images with Exif data (jpg/jpeg, tif/tiff)
  • OpenOffice documents (odt, ods, odp) (not yet supported)
  • Apple Office documents (pages, numbers, key) (not yet supported)
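To give a feel for the metadata these formats carry, here is a minimal sketch for the docx/xlsx/pptx family. These files are ZIP packages whose `docProps/core.xml` part holds fields such as the author. The sketch uses only the standard library and is not the code Mercurius ships; `read_core_properties` and `make_sample_package` are hypothetical names, and the sample package contains demo data only.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(fileobj):
    """Return the creator and last-modified-by fields of an OOXML package."""
    with zipfile.ZipFile(fileobj) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext(
            "cp:lastModifiedBy", default="", namespaces=NS
        ),
    }


def make_sample_package():
    """Build an in-memory ZIP holding only core.xml (demo data, not a full .docx)."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
        "<dc:creator>alice</dc:creator>"
        "<cp:lastModifiedBy>bob</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    ).format(**NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(make_sample_package()))
```

Names recovered this way are exactly the kind of username candidates described above.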

Available extractors

These are the available extractors:

  • PDFExtractor
  • ImageExtractor
  • MSOfficeExtractor
  • MSOfficeXMLExtractor
  • OpenOfficeExtractor
  • AppleOfficeExtractor
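One plausible way to choose an extractor is by file extension. The mapping below is an assumption inferred from the extractor names and the supported file types listed above; the tool's real dispatch logic may differ, and `extractor_name_for` is a hypothetical helper.

```python
import os

# Assumed extension-to-extractor mapping (inferred, not taken from the code base).
EXTRACTOR_BY_EXTENSION = {
    "pdf": "PDFExtractor",
    "jpg": "ImageExtractor",
    "jpeg": "ImageExtractor",
    "tif": "ImageExtractor",
    "tiff": "ImageExtractor",
    "doc": "MSOfficeExtractor",
    "xls": "MSOfficeExtractor",
    "ppt": "MSOfficeExtractor",
    "docx": "MSOfficeXMLExtractor",
    "xlsx": "MSOfficeXMLExtractor",
    "pptx": "MSOfficeXMLExtractor",
    "odt": "OpenOfficeExtractor",
    "ods": "OpenOfficeExtractor",
    "odp": "OpenOfficeExtractor",
    "pages": "AppleOfficeExtractor",
    "numbers": "AppleOfficeExtractor",
    "key": "AppleOfficeExtractor",
}


def extractor_name_for(path):
    """Return the extractor name for a file path, or None if unsupported."""
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    return EXTRACTOR_BY_EXTENSION.get(extension)


if __name__ == "__main__":
    for name in ("report.PDF", "photo.jpeg", "notes.txt"):
        print(name, "->", extractor_name_for(name))
```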

The tool implements a plugin architecture through the pluggy plugin system.

To enable a new plugin, place it in the mercurius/extractors folder and enable it in the configuration file with an entry of the form <plugin_file_name>=<class_extractor_name>.
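The configuration entry format can be illustrated with a short sketch that parses `<plugin_file_name>=<class_extractor_name>` lines and imports the named class. It uses only `importlib` rather than pluggy, so it shows the idea and not the tool's real loader. The helper names, the `#` comment handling, and the default package path `mercurius.extractors` are assumptions.

```python
import importlib


def parse_plugin_entries(lines):
    """Parse `<plugin_file_name>=<class_extractor_name>` lines into a dict."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments (assumed "#" syntax), and malformed lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        module_name, class_name = (part.strip() for part in line.split("=", 1))
        entries[module_name] = class_name
    return entries


def load_extractor(module_name, class_name, package="mercurius.extractors"):
    """Import `<package>.<module_name>` and return the class named `class_name`."""
    module = importlib.import_module(f"{package}.{module_name}")
    return getattr(module, class_name)


if __name__ == "__main__":
    # "pdf_extractor" is a hypothetical plugin file name.
    config_lines = ["# enabled extractors", "pdf_extractor = PDFExtractor"]
    print(parse_plugin_entries(config_lines))
```

With Mercurius installed, `load_extractor` would be called once per parsed entry and the resulting classes registered with the plugin manager.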

Quick start

Planned features:

  • Integrate Bing Search
  • Integrate Exalead Search
  • Make it Python-version-agnostic? (working on both Python 2 and 3, using six)
  • Manage the application's context
    • Keep track of already downloaded files
    • Keep domain context
      • Further searches on the same domain will extend data
      • If the domain changes or a local analysis is performed, ask whether to clean up or extend
  • Change the plugin system: move from "pick from folder" to "get through setuptools"

Changelog 1.0.0:

  • Changed/Fixed Google Search
  • Fixed downloader
  • Fixed/Enhanced page parser
  • Fixed metadataMSOfficeXML extractor
  • Added Image Exif metadata extractor
  • Fixed metadataPDF extractor
  • Removed external projects
  • Modified the CLI (using click)
  • Added a shell interface (using a modified version of click-shell)
  • Added a random ASCII art banner, like Metasploit ;)
  • Other small fixes
  • Moved all dependencies to the setup.py file
  • Set up a plugin architecture for the extractors with pluggy