This directory contains web archives, together with scripts which use them for performance testing of Servo.
WARC web archives are a de facto standard for archiving web content. They are the storage format for the Internet Archive Wayback Machine and are supported by the Library of Congress.
Web archives can be created and viewed in Servo, using the pywb tools, which can be installed using:
virtualenv -p python3 venv
source venv/bin/activate
pip install git+https://github.com/ikreymer/pywb.git
Using the pywb tools in HTTP proxy mode with Servo requires the proxychains command.
With apt-get (Debian, Ubuntu):
apt-get install proxychains
The command is then run as proxychains.
With Homebrew:
brew install proxychains-ng
The command is then run as proxychains4.
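Because the command name differs between the two packages, a script can pick whichever one is installed. The following is a minimal sketch for a POSIX-like shell; `find_proxychains` is a hypothetical helper name, not part of either package.

```shell
# Hypothetical helper: print the name of the first proxychains command
# found on PATH, preferring proxychains4 (the proxychains-ng name).
find_proxychains() {
    for candidate in proxychains4 proxychains; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "neither proxychains4 nor proxychains found on PATH" >&2
    return 1
}
```

It could be used as `PROXYCHAINS=$(find_proxychains) || exit 1`, with `$PROXYCHAINS` then standing in for `proxychains` in the commands in this README.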
In this example we'll play the WBEZ archive.
In one window, run the wayback server on the WBEZ archive:
wayback --proxy WBEZ --port 8321
The port number (8321 here) should match the one in proxychains.conf.
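A mismatch between the two ports is easy to miss, so it may help to check it before starting Servo. This sketch assumes the proxy is declared in proxychains.conf as a line of the usual form `http 127.0.0.1 8321`; `proxy_port` is a hypothetical helper name.

```shell
# Hypothetical helper: print the port of the first "http" proxy entry in a
# proxychains config file (assumed entry format: http <host> <port>).
proxy_port() {
    awk '$1 == "http" { print $3; exit }' "$1"
}

# Example check against the port used in this README (8321).
if [ -f proxychains.conf ] && [ "$(proxy_port proxychains.conf)" != "8321" ]; then
    echo "proxychains.conf does not point at port 8321" >&2
fi
```

Commented-out entries are skipped, because their first field is not exactly `http`.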
Then run Servo with this HTTP proxy, so that navigating to a recorded web site takes you to the recorded version:
proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path proxy-certs/pywb-ca.pem https://www.wbez.org/
In this example we'll add a web archive for an example web site, example.com.
First create a collection for the Example files (if you installed the dependencies inside a virtualenv, activate it first):
wb-manager init Example
Now start recording the web archive:
wayback --proxy Example --live --proxy-record --autoindex --port 8321
In another window, clone and cd into the servo-warc-tests repository:
git clone https://github.com/servo/servo-warc-tests.git
cd servo-warc-tests
Run Servo with this HTTP proxy, and navigate to the web site:
proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path ~/proxy-certs/pywb-ca.pem https://www.example.com/
Note: if you get an error like this:
ERROR 2018-09-07T17:27:08Z: servo: Couldn't not find certificate file: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Then make sure you have entered the correct path for proxy-certs/pywb-ca.pem. It should be in the same directory as your venv.
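To catch this before Servo starts, the certificate path can be checked up front. This is a minimal sketch; `check_cert` is a hypothetical helper, and the path passed to it should be whatever you give to --certificate-path.

```shell
# Hypothetical pre-flight check: fail early with a clear message if the
# CA certificate file is missing or unreadable.
check_cert() {
    if [ ! -r "$1" ]; then
        echo "certificate not found or unreadable: $1" >&2
        return 1
    fi
}
```

For example, `check_cert ~/proxy-certs/pywb-ca.pem && proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path ~/proxy-certs/pywb-ca.pem https://www.example.com/` only launches Servo when the file is present.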
Once the site has finished loading, exit Servo, and then quit the wayback server.
To test your archive, follow the instructions for playing an archive. In the venv:
wayback --proxy Example --port 8321
and in the servo-warc-tests directory:
proxychains ${SERVO_DIRECTORY}/mach run -r --certificate-path ~/proxy-certs/pywb-ca.pem https://www.example.com/