Trinity is a website scraper that collects all URLs from a site.

Primary Language: Python

Trinity - Web Application URL Collector

Version: 0.1
Blog: http://securityhorror.blogspot.com/
Github: https://github.com/rekcahemal/Trinity
Author: GERASIMOS KASSARAS (@lamehacker)
Copyright: 2013 Gerasimos Kassaras
License: Apache License Version 2.0

Synopsis

Trinity is an open-source, free URL collector written for training purposes.

Trinity offers:

A simple, stable, efficient, high-performance URL collector written in Python.

Trinity is a simple proof-of-concept Python script that collects URLs from sites that require neither authentication nor SSL.

Simplicity

To collect the URLs from your site, set the following variables in the script to the desired site URL:

    urlList = ["http://www.example.com/"]  # Later on this url is going to be fed through command parser.
    host = 'http://www.example.com/'
    domain = 'www.example.com'
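The three settings all describe the same site, so they can be derived from one start URL instead of being typed three times. The following is a minimal sketch of that idea, not code from Trinity: `derive_settings` is a hypothetical helper, and rejecting anything other than plain `http://` is an assumption based on the note that Trinity targets sites without SSL.

```python
from urllib.parse import urlsplit


def derive_settings(start_url):
    """Build Trinity-style settings from one start URL (hypothetical helper)."""
    parts = urlsplit(start_url)
    # Assumption: only plain HTTP is accepted, since Trinity targets sites without SSL.
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("expected a plain http:// URL, got %r" % start_url)
    return {
        "urlList": [start_url],                            # seed list for the collector
        "host": "%s://%s/" % (parts.scheme, parts.netloc),  # site root
        "domain": parts.hostname,                           # host name without the port
    }


settings = derive_settings("http://www.example.com/")
print(settings["urlList"], settings["host"], settings["domain"])
```

The returned values match the hand-written example above, so they could be assigned to `urlList`, `host`, and `domain` directly.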

In simple terms

Features

General

Collects URLs from:

  • <a> HTML tags.
  • <link> HTML tags.
  • <script> HTML tags.
  • <meta> HTML tags.
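To illustrate what collecting URLs from these four tags can look like, here is a minimal sketch. It deliberately swaps in the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra dependencies; it is not Trinity's own code. The `UrlCollector` class, the tag-to-attribute mapping, and the handling of `meta` as a refresh redirect are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed attribute that carries the URL for each tag; Trinity's mapping may differ.
URL_ATTRS = {"a": "href", "link": "href", "script": "src"}


class UrlCollector(HTMLParser):
    """Collects absolute URLs from a, link, script and meta tags (illustrative)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        raw = None
        if tag in URL_ATTRS:
            raw = attrs.get(URL_ATTRS[tag])
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # Assumed form: content="0; url=/moved.html"
            _, sep, target = (attrs.get("content") or "").partition("=")
            raw = target.strip() if sep else None
        if raw:
            url = urljoin(self.base_url, raw)  # resolve relative links
            if url not in self.urls:           # keep first occurrence only
                self.urls.append(url)


SAMPLE = """
<html><head>
  <link rel="stylesheet" href="/css/site.css">
  <script src="js/app.js"></script>
  <meta http-equiv="refresh" content="0; url=/moved.html">
</head><body>
  <a href="about.html">About</a>
  <a href="http://www.example.com/contact">Contact</a>
  <a name="top">no href here</a>
</body></html>
"""

collector = UrlCollector("http://www.example.com/")
collector.feed(SAMPLE)
for url in collector.urls:
    print(url)
```

Relative links are resolved against the page URL with `urljoin`, and tags without the relevant attribute (such as the last anchor in the sample) are skipped.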

Crawler

The crawler Trinity uses is built on BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
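For readers new to crawling, the sketch below shows one common way a URL collector can walk a site: a breadth-first queue that only follows links on the configured domain. This is an assumption about the general technique, not a description of Trinity's actual loop. The `crawl` function, its `limit` parameter, the `get_links` callback, and the `FAKE_SITE` data are all hypothetical; the in-memory site stands in for real HTTP requests so the example runs offline.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(url_list, domain, get_links, limit=100):
    """Breadth-first walk that visits only URLs whose host equals `domain`.

    `get_links(url)` is a hypothetical callback returning the absolute URLs
    found on a page; in a real run it would fetch and parse the page.
    """
    queue = deque(url_list)
    seen = set(url_list)
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Stay on the target site and never queue the same URL twice.
            if urlsplit(link).hostname == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# In-memory stand-in for a small site: page URL -> links found on that page.
FAKE_SITE = {
    "http://www.example.com/": ["http://www.example.com/a", "http://other.example.org/x"],
    "http://www.example.com/a": ["http://www.example.com/", "http://www.example.com/b"],
    "http://www.example.com/b": [],
}

pages = crawl(["http://www.example.com/"], "www.example.com",
              lambda url: FAKE_SITE.get(url, []))
print(pages)
```

The `seen` set prevents loops when pages link back to each other, and the host check keeps the walk from leaving the target site.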

HTML Parser

The HTML parser is based on BeautifulSoup version 4.

Documentation: http://www.crummy.com/software/BeautifulSoup/
Download: http://www.crummy.com/software/BeautifulSoup/bs4/download/

Installation

You have to install BeautifulSoup. Installation instructions can be found here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

License

Trinity is licensed under the Apache License Version 2.0.

Disclaimer

This is free software and you are allowed to use it as you see fit. However, neither the development team nor any of our contributors can be held responsible for your actions or for any damage caused by the use of this software.