A set of scripts to mirror websites and serve the mirrored copies.
Requirements:
- Python v3.3 or later
- python3-requests v2.2.1 or later
- wget v1.15 or later
Installation on Debian-based Linux:
$ sudo apt-get install wget python3-pip
$ sudo pip3 install requests
- Check out or download the latest version of MagicMirror.py.
Installation on Windows:
- Install the latest Python 3.x.
- (Recommended) Add C:\Python3x\Scripts (check the actual path on your system) to your PATH.
- Install requests:
  pip3 install requests
- Install wget v1.15 or later and add it to your PATH.
- Check out or download the latest version of MagicMirror.py.
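After installation, a small check along these lines can confirm that the requirements listed above are met. This is a hypothetical helper, not part of MagicMirror.py; it assumes that `wget --version` prints the version number on its first line, as GNU Wget does (e.g. "GNU Wget 1.21.2 built on linux-gnu.").

```python
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Extract the first dotted version number (e.g. "1.21.2") from text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(version, minimum):
    """Compare version tuples; missing components count as 0, so (1, 15) == (1, 15, 0)."""
    length = max(len(version), len(minimum))
    v = tuple(version) + (0,) * (length - len(version))
    m = tuple(minimum) + (0,) * (length - len(minimum))
    return v >= m


def check_requirements():
    """Return a list of human-readable problems; an empty list means all requirements were found."""
    problems = []
    if sys.version_info < (3, 3):
        problems.append("Python 3.3 or later is required")
    try:
        import requests
        if not at_least(parse_version(requests.__version__), (2, 2, 1)):
            problems.append("requests 2.2.1 or later is required")
    except ImportError:
        problems.append("requests is not installed")
    wget = shutil.which("wget")
    if wget is None:
        problems.append("wget is not on PATH")
    else:
        try:
            output = subprocess.check_output([wget, "--version"], stderr=subprocess.STDOUT)
            first_line = output.decode("utf-8", "replace").splitlines()[0]
            if not at_least(parse_version(first_line), (1, 15)):
                problems.append("wget 1.15 or later is required")
        except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
            # Non-GNU wget builds may not support --version or may print something else.
            problems.append("could not determine the wget version")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print("All requirements found.")
```

Saved under any name (for example check_requirements.py, a hypothetical file name), it can be run with `python3 check_requirements.py` on Linux or `python check_requirements.py` on Windows.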
To mirror a site, run:
$ python3 MagicMirror.py crawl databaseDir startURL [additionalURL additionalURL ...]
Additional URLs (which may even be on a different domain) may need to be specified when wget, used as the crawler, fails to detect links to those files. Such missed files usually show up as 404 pages or missing images when browsing the mirrored copy of the site.
For example:
$ python3 MagicMirror.py crawl /home/User/mmDB http://some.site.com http://some.site.com/print.shtml?smth
$ python3 MagicMirror.py crawl /home/User/mmDB https://other.site.com:444
To serve the mirrored copies, run:
$ python3 MagicMirror.py serve databaseDir archive.Domain.Suffix [port] &
For example:
$ python3 MagicMirror.py serve /home/User/mmDB my.archive.com 8080 &
Make sure that DNS, /etc/hosts, or whatever name resolution you use points both archive.Domain.Suffix and *.archive.Domain.Suffix to the server's IP address.
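Note that /etc/hosts does not support wildcard entries, so *.archive.Domain.Suffix cannot be covered by a single line there; each mirrored host name must be listed explicitly. (A DNS server can do wildcards instead; for example, dnsmasq's `address=/my.archive.com/<server IP>` option answers for the domain and all of its subdomains.) The hypothetical helper below, which is not part of MagicMirror.py, builds the explicit /etc/hosts lines; the IP address 192.168.1.10 is an assumed example.

```python
def hosts_entries(server_ip, archive_domain, mirror_hostnames):
    """Build /etc/hosts lines pointing the archive domain and each mirror host name at server_ip.

    /etc/hosts has no wildcard support, so *.archive_domain cannot be written as one
    entry; every mirrored host name has to be listed explicitly.
    """
    suffix = "." + archive_domain
    for name in mirror_hostnames:
        # Every mirror host name is expected to be a subdomain of the archive domain.
        if not name.endswith(suffix):
            raise ValueError("%s is not a subdomain of %s" % (name, archive_domain))
    names = [archive_domain] + list(mirror_hostnames)
    return ["%s\t%s" % (server_ip, name) for name in names]


if __name__ == "__main__":
    # Hypothetical server address; replace it with the IP of the machine running `serve`.
    for line in hosts_entries("192.168.1.10", "my.archive.com", [
        "some.site.com.my.archive.com",
        "https.other.site.com.444.my.archive.com",
    ]):
        print(line)
```

The printed lines can be appended to /etc/hosts (or C:\Windows\System32\drivers\etc\hosts on Windows) on each client that browses the archive.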
Each mirrored site is then available under its own subdomain of the archive domain, for example:
$ wget http://my.archive.com:8080
$ wget http://some.site.com.my.archive.com:8080
$ wget http://https.other.site.com.444.my.archive.com:8080
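The wget examples above suggest how an original URL maps to a host name under the archive domain: an HTTPS site gets a leading `https.` label, and a non-default port becomes an extra label just before the archive domain. The sketch below encodes that reading. The rule is inferred from these examples only and is not taken from MagicMirror.py; in particular, keeping the path and query unchanged and treating plain-HTTP sites on non-default ports the same way are assumptions, and `mirror_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def mirror_url(original_url, archive_domain, archive_port):
    """Map an original URL to the URL of its mirrored copy (rule inferred from the README examples)."""
    parts = urlsplit(original_url)
    labels = []
    if parts.scheme == "https":
        labels.append("https")  # assumed: HTTPS sites are prefixed with an "https." label
    labels.append(parts.hostname)
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        labels.append(str(parts.port))  # assumed: a non-default port becomes its own label
    labels.append(archive_domain)
    netloc = "%s:%d" % (".".join(labels), archive_port)
    # assumed: the path and query string are kept unchanged; the mirror itself is plain HTTP
    return urlunsplit(("http", netloc, parts.path, parts.query, ""))


if __name__ == "__main__":
    for url in ("http://some.site.com", "https://other.site.com:444"):
        print(url, "->", mirror_url(url, "my.archive.com", 8080))
```

For the two sites crawled earlier, this reproduces the host names used in the wget commands above.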
To run the tests:
$ python3 MagicMirror.py test
-- Moved from https://code.google.com/p/magic-mirror-crawler