This tool scratches an itch I have had for quite some time. I have asked myself plenty of times: what information do some high profile sites simply give away? By give away, I really mean accidentally leak. Headers, cookies, HTML comments and SSL certificates all have plenty of opportunity to contain information an administrator or developer may not have thought through. With Weblick, I hope to scrape all of this information into a database for later analysis.
I suggest you create yourself a new Python virtual environment. This will allow you to install all of the required dependencies without touching your operating system's base Python installation.
Weblick supports many database backends as it makes use of the peewee ORM. At the moment though, only SQLite and MySQL/MariaDB have been tested. Theoretically PostgreSQL should work too, but some work is needed to add support for it to this tool.
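For the curious, here is a minimal sketch of how a backend selection with peewee typically looks. The database name, path and credentials below are assumptions for illustration, not Weblick's actual configuration:

```python
# Minimal sketch of selecting a peewee backend; not Weblick's actual code.
# The database name, path and credentials are illustrative assumptions.
from peewee import MySQLDatabase, SqliteDatabase

use_mysql = False  # in Weblick this choice comes from settings.ini

if use_mysql:
    db = MySQLDatabase('weblick', user='weblick', password='secret',
                       host='127.0.0.1', port=3306)
else:
    db = SqliteDatabase('var/weblick.db')
```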
- Clone the repository with:
$ git clone https://github.com/leonjza/weblick.git
This will leave you with a new directory called weblick.
Weblick has a few dependencies that need to be resolved. All of these are defined in the requirements.txt file.
Recommended: Create a new Python virtual environment with $ virtualenv env in the weblick directory. Once this is finished, source the new environment with $ source env/bin/activate. Your Python interpreter will now be the one in your newly created environment.
- Install the required dependencies with:
$ pip install -r requirements.txt
If you are going to be using the MySQL/MariaDB backend, prepare a database and credentials so that Weblick may create tables, insert and update records there. Update the [mysql] section in the settings.ini file too.
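As an illustration, the [mysql] section could look something like the snippet below. The key names here are assumptions based on a typical setup, so check the shipped settings.ini for the real ones:

```ini
; Illustrative example only; verify the key names against the shipped settings.ini.
[mysql]
host = 127.0.0.1
port = 3306
database = weblick
username = weblick
password = secret
```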
For the default SQLite driver no configuration should be needed. The SQLite database file will live in the var directory.
- With the database configured in the settings.ini file, create the schema with:
$ python lick.py setupdb
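For those wondering what a schema setup step usually involves with peewee, the sketch below shows the general idea. The model shown is a made-up placeholder, not one of Weblick's actual models:

```python
# Illustrative sketch of a peewee schema setup step.
# The Url model here is a placeholder, not one of Weblick's actual models.
from peewee import CharField, Model, SqliteDatabase

db = SqliteDatabase('var/weblick.db')

class Url(Model):
    address = CharField()

    class Meta:
        database = db

# create the tables if they do not already exist
db.connect()
db.create_tables([Url], safe=True)
```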
This tool was written to use the Alexa Top 1 Million data export.
- Download the source data to the var/ directory with:
$ curl -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
- Extract the downloaded zip file:
$ unzip top-1m.csv.zip
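The extracted top-1m.csv is a simple two column, headerless file of rank and domain, along these lines (the exact domains and ordering will obviously differ per export):

```
1,google.com
2,facebook.com
3,youtube.com
```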
Note: If you prefer to have this csv somewhere else, just update the aleksa_csv section in settings.ini.
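As a hypothetical example of such an override (the key name is an assumption, so double check the shipped settings.ini):

```ini
; Illustrative example only; verify the key name against the shipped settings.ini.
[aleksa_csv]
path = /some/other/location/top-1m.csv
```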
That should be it. You should now be able to run it with $ python lick.py and watch your database grow!
A web component exists that allows one to view some information about URLs. To run the web interface, simply run:
$ python web.py
With all of the information gathered, I am thinking of attempting to make it possible to alert when things have changed (see the sketch after this list), i.e.:
- New / Missing cookies
- New / Missing HTTP headers
- New / Missing comments in HTML sources
- SSL certificate expiry / changes
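As a rough illustration of what such a check could look like, the sketch below diffs a freshly scraped set of HTTP headers against a previously stored set. The plain dicts and made-up values are placeholders; Weblick's actual storage layer is not shown here:

```python
# Rough illustration of a header change check; the data used here is made up
# and the plain dicts stand in for whatever Weblick actually stores.

def diff_headers(stored, latest):
    """Return headers that were added, went missing, or changed value."""
    added = {k: v for k, v in latest.items() if k not in stored}
    missing = {k: v for k, v in stored.items() if k not in latest}
    changed = {k: (stored[k], latest[k])
               for k in stored.keys() & latest.keys()
               if stored[k] != latest[k]}
    return added, missing, changed

# example usage with made-up data
stored = {'Server': 'nginx', 'X-Frame-Options': 'DENY'}
latest = {'Server': 'nginx/1.25.3', 'Strict-Transport-Security': 'max-age=31536000'}

added, missing, changed = diff_headers(stored, latest)
if added or missing or changed:
    print('alert: headers changed!', added, missing, changed)
```

The same shape of check would apply to cookies, HTML comments and SSL certificate fields.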
I should also make it so that a custom CSV can be passed as a command-line argument.
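Something along these lines should do the trick; the --csv flag name is just an assumption for illustration and is not implemented yet:

```python
# Illustrative sketch of a possible --csv command line option; not implemented yet.
import argparse

parser = argparse.ArgumentParser(description='Weblick')
parser.add_argument('--csv', default='var/top-1m.csv',
                    help='path to an alternative rank,domain CSV file')
args = parser.parse_args()

print('using CSV at', args.csv)
```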