WebArchivePlayer is a new desktop tool that provides a simple point-and-click wrapper for viewing any web archive file (in WARC or ARC format).
To create a web archive (WARC) file of your own, you can use the free https://webrecorder.io/ service to browse any page and then download the recorded WARC file.
The player lets users pick one or more ARC/WARC files from their local machine and browse the contents in any browser. No internet connection is needed to browse the archive.
- Download the latest version:
- Double-click to open. (On OS X, open the .dmg file to mount the volume and extract the player.) You may have to allow opening files downloaded from the internet and, on Windows only, allow the player to make internet connections. This is still new software, and other distribution methods may be added in the future.
- A file dialog will show up. Browse to one or more existing WARC or ARC files.
You can use https://webrecorder.io to record pages as you browse and then download the WARC file.
- A browser will open to http://localhost:8090/replay/, listing all the pages in the archive.
- Click on any listed page to view the replay, or enter a URL to search the full archive.
- To exit, simply close the WebArchivePlayer window.
(Replaying screenshot from Wikipedia SOPA Blackout. You can download the WARC from GitHub.)
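The page list is served under the replay collection at http://localhost:8090/replay/. As a minimal sketch, assuming archived pages follow the same collection-then-original-URL layout that the config.yaml example uses for my_coll, a replay address could be built as shown here. The helper name replay_url is hypothetical and is not part of WebArchivePlayer.

```python
def replay_url(page_url, collection="replay", host="localhost", port=8090):
    """Build the local address at which an archived page would be replayed.

    Assumption: addresses have the form
    http://<host>:<port>/<collection>/<original URL>.
    This helper is illustrative only.
    """
    return "http://{0}:{1}/{2}/{3}".format(host, port, collection, page_url)


if __name__ == "__main__":
    print(replay_url("http://example.com/"))
```

Passing collection="my_coll" gives the form used for custom collections later in this document.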
Currently, executable versions are available only for OS X and Windows.
The player should also work on any system that has Python 2.7.x, though it requires a little more setup.
On other systems (or to build from source):
- Clone this repo:
  git clone https://github.com/ikreymer/webarchiveplayer.git; cd webarchiveplayer
- Install by running (optionally in a virtualenv):
  python setup.py install
- Run:
  webarchiveplayer [/path/to/warc_or_arc]
If a W/ARC file argument is omitted, the player will attempt to start in GUI mode and show a File Open dialog.
However, in order to run in GUI mode, the wxPython toolkit will also need to be installed separately.
Refer to the instructions on the wxPython download page for your platform.
wxPython does not work in a virtualenv by default. The simplest way to make it work is to symlink the system wxredirect.pth
into the virtualenv's site-packages directory. For example, on OS X, if you created the environment with virtualenv [myenv]:
ln -s /Library/Python/2.7/site-packages/wxredirect.pth [myenv]/lib/python2.7/site-packages/wxredirect.pth
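The same link can be created from Python. The sketch below only mirrors the ln -s command above; the function names are hypothetical, and the default system path is the OS X location from the example, so it would need adjusting on other platforms.

```python
import os


def wxredirect_link_paths(venv_dir,
                          system_site="/Library/Python/2.7/site-packages",
                          python_version="2.7"):
    """Return (source, destination) for the wxredirect.pth symlink.

    Assumption: the virtualenv uses the lib/pythonX.Y/site-packages layout.
    """
    source = os.path.join(system_site, "wxredirect.pth")
    destination = os.path.join(venv_dir, "lib", "python" + python_version,
                               "site-packages", "wxredirect.pth")
    return source, destination


def link_wxredirect(venv_dir):
    """Create the symlink unless something already exists at the destination."""
    source, destination = wxredirect_link_paths(venv_dir)
    if not os.path.lexists(destination):
        os.symlink(source, destination)
    return destination
```

Calling link_wxredirect("myenv") would then have the same effect as the shell command, provided the virtualenv already exists.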
If a W/ARC file argument is passed to the player, e.g.:
webarchiveplayer /path/to/warcfile.warc.gz
the player will select that file and skip the File Open dialog. Installation of wxPython is not required when specifying the WARC explicitly on the command line.
The OS X and Windows applications also support specifying the file via command line.
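The choice between GUI mode and command-line mode can be sketched as follows. This is an illustrative argument parser, not the player's actual code; parse_player_args and needs_file_dialog are hypothetical names, and a single optional file argument is assumed to match the usage line webarchiveplayer [/path/to/warc_or_arc].

```python
import argparse


def parse_player_args(argv):
    """Parse an optional W/ARC path from the command line."""
    parser = argparse.ArgumentParser(prog="webarchiveplayer")
    parser.add_argument("archive", nargs="?", default=None,
                        help="path to a WARC or ARC file")
    return parser.parse_args(argv)


def needs_file_dialog(argv):
    """True when no archive was given, so a File Open dialog (and wxPython) is needed."""
    return parse_player_args(argv).archive is None


if __name__ == "__main__":
    import sys
    if needs_file_dialog(sys.argv[1:]):
        print("No file given: GUI mode would show the File Open dialog")
    else:
        print("File given: the dialog would be skipped")
```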
In addition to opening files, WebArchivePlayer can now also be used to provide a point-and-click launcher for any pywb archive.
If a config.yaml file is present in the working directory (the same directory as WebArchivePlayer), the specified configuration will be loaded instead of showing a file prompt.
This can be used to distribute specific archives together with WebArchivePlayer.
Certain aspects of the player can also be modified in config.yaml, including changing the contents from 'Web Archive Player' to any custom title and HTML page.
webarchiveplayer:
    # initial page to load on start-up
    # e.g.: http://localhost:8090/my_coll/http://example.com/
    start_url: my_coll/http://example.com/

    # set initial width of player window
    width: 400

    # set initial height of player window
    height: 250

    # set window title
    title: My Archive

    # load custom contents from local HTML
    desc_html: ./desc.html
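To illustrate how these settings relate to each other, here is a minimal sketch that reads the webarchiveplayer section from an already-parsed configuration mapping and expands start_url into the full local address, as in the comment above. It assumes the YAML has been parsed elsewhere and that the server listens on the default port 8090; player_settings is a hypothetical helper, not the player's own loader.

```python
# Assumption: the server is on the default port; the player may use another one.
BASE_URL = "http://localhost:8090/"

# The option names shown in the sample config.yaml.
PLAYER_KEYS = ("start_url", "width", "height", "title", "desc_html")


def player_settings(config):
    """Pick the webarchiveplayer options out of a parsed config mapping.

    Options that are absent are returned as None; no defaults are invented.
    """
    section = config.get("webarchiveplayer") or {}
    settings = {key: section.get(key) for key in PLAYER_KEYS}
    if settings["start_url"] is not None:
        # start_url is relative to the local server root
        settings["start_url"] = BASE_URL + settings["start_url"]
    return settings


if __name__ == "__main__":
    example = {"webarchiveplayer": {"start_url": "my_coll/http://example.com/",
                                    "title": "My Archive"}}
    print(player_settings(example))
```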
For example, one could distribute a WARC together with the player and provide a custom setup.
- Create a new directory my_archive and switch to it.
- Copy the WebArchivePlayer application to my_archive.
- In my_archive, run wb-manager init my_coll
- Run wb-manager add my_coll <path/to/warc>
- Add config.yaml in my_archive, perhaps with:
  webarchiveplayer:
      start_url: my_coll/http://example.com/
      title: My Archive Demo
- Now, when WebArchivePlayer is started in my_archive, it will use the WARC in my_coll and load http://localhost:8090/my_coll/http://example.com/ as the starting URL.
- The my_archive directory can be distributed as a standalone archive and player.
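The steps above can be summarised in a small planning helper. This sketch only assembles the two wb-manager commands and the config.yaml text from the walkthrough; it does not run anything, and archive_setup_plan is a hypothetical name introduced for illustration.

```python
def archive_setup_plan(warc_path, collection="my_coll",
                       start_page="http://example.com/",
                       title="My Archive Demo"):
    """Return (commands, config_text) for preparing a distributable archive.

    The commands are meant to be run inside the my_archive directory,
    and config_text is meant to be saved there as config.yaml.
    """
    commands = [
        ["wb-manager", "init", collection],
        ["wb-manager", "add", collection, warc_path],
    ]
    config_text = (
        "webarchiveplayer:\n"
        "    start_url: {0}/{1}\n"
        "    title: {2}\n"
    ).format(collection, start_page, title)
    return commands, config_text


if __name__ == "__main__":
    commands, config_text = archive_setup_plan("path/to/warc")
    for command in commands:
        print(" ".join(command))
    print(config_text)
```

The commands could then be passed to subprocess.check_call with cwd set to my_archive, assuming wb-manager is installed and on the PATH.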
The binaries can be built by running the build scripts from the app directory.
Note: wxPython must be installed for this to work. If running in a virtualenv, follow the instructions above. The install script will not run if it can't find wxPython.
OS X (output written to osx/webarchiveplayer.dmg):
cd app
./build-osx.sh
Windows (output copied to windows\webarchiveplayer.exe):
cd app
build-windows.bat
Support multiple instances by picking a random port if 8090 is not available.
Ensure HTML 'resource' records are included in page list.
Display error dialog before quitting if unable to read and index WARC/ARCs.
Switch to pywb 0.11.1, many improvements in indexing and replay.
Custom preset archive support with custom config.yaml
Use HTML for main window rendering
Switch to pywb 0.10.9.1 for more rewriting improvements
Update to pywb 0.10.8, rewriting improvements, add pywb version display
Update to pywb 0.10.6, significant replay improvements
Fix issue where page listing only lists pages for one WARC/ARC when multiple are selected. Build scripts check for wxPython installation.
Update to use latest pywb release (0.8.3)
Support opening multiple WARC/ARC files at once. Also fix issue with opening files with spaces in filename.
Initial release.
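The changelog mentions falling back to a random port when 8090 is not available. One common way to do this, shown here as a sketch and not as the player's actual implementation, is to try binding the preferred port and otherwise ask the operating system for a free one by binding port 0. The name pick_port is hypothetical.

```python
import socket


def pick_port(preferred=8090, host="127.0.0.1"):
    """Return `preferred` if it can be bound, otherwise a free port chosen by the OS."""
    for candidate in (preferred, 0):  # port 0 means "let the OS choose"
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, candidate))
            return sock.getsockname()[1]
        except socket.error:
            continue  # preferred port is busy, fall through to port 0
        finally:
            sock.close()
    raise RuntimeError("no port available")


if __name__ == "__main__":
    print(pick_port())
```

Because the probe socket is closed before the port is returned, another process could claim the port in the meantime; a real server would usually bind once and keep that socket open.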
WebArchivePlayer is a simple wrapper over the pywb web archiving tools, using pyinstaller to create a standalone GUI application. The wxPython toolkit provides the GUI. The wrapper starts a local server that serves content from the selected web archive, using pywb to handle the rest.
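The general pattern of starting a local server in the background and pointing a browser at it can be sketched with the Python 3 standard library alone. Everything here is a stand-in: the real player uses pywb for serving and wxPython for its window, while this example serves the current directory with http.server, and start_local_server is a hypothetical name.

```python
import threading
import webbrowser
from http.server import HTTPServer, SimpleHTTPRequestHandler


def start_local_server(port=8090, host="127.0.0.1"):
    """Start a background HTTP server (a stand-in for pywb) and return it."""
    server = HTTPServer((host, port), SimpleHTTPRequestHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True  # do not keep the process alive after the main thread exits
    thread.start()
    return server


if __name__ == "__main__":
    server = start_local_server()
    # Open the default browser at the address the server is listening on.
    webbrowser.open("http://localhost:{0}/".format(server.server_address[1]))
    try:
        input("Press Enter to stop the server\n")
    finally:
        server.shutdown()
        server.server_close()
```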
Consult the pywb documentation for more info on web archive replay.
Please feel free to open an issue on this page for any problems, questions, or concerns regarding this tool. This is brand-new software, so feedback is encouraged.
Another project, which in part inspired WebArchivePlayer, is Mat Kelly's excellent WAIL project, which provides a GUI for different web crawling and replay systems.