A simple VM for running synthetic views.
/api/run with {url, ...optional params...} via formdata
/api/query.tsv with {q} via formdata
/api/dump/views.tsv
/api/dump/requests.tsv
VM (Ubuntu 16.04 LTS)
chrome (headless)
postgres (listening on localhost only)
server: node /vagrant/server.js
runner: node /vagrant/run.js (invoked by the server)
conf: /home/ubuntu/ssv-conf.js
views id, t, url
requests id, view_id, url, hostname, pathname, status, mime, size
curl -H 'API-Token: YOUR_TOKEN' http://localhost:5000/api/dump/views.tsv
curl -H 'API-Token: YOUR_TOKEN' http://localhost:5000/api/dump/requests.tsv
curl \
-H 'API-Token: YOUR_TOKEN' \
-F "url=https://..." \
-F "block-resources-whose-host-does-not-match=FOO|BAR" \
-F "block-resources-whose-host-matches=ASDF" \
-F "block-resources-whose-pathname-matches=/t.gif" \
http://localhost:5000/api/run
curl \
-H 'API-Token: YOUR_TOKEN' \
-F "q=SELECT COUNT(*) FROM views" \
http://localhost:5000/api/query.tsv > report.tsv
- put a list of URLs on the VM
cat urls.txt | parallel -j 2 'curl ... -F "url={}" .../api/run'
With a fresh 64-bit Ubuntu 16.04 LTS (Xenial) box:
- SSH into it
curl https://github.com/andrewschaaf/simple-synthetic-views/archive/master.zip > simple-synthetic-views.zip
unzip simple-synthetic-views.zip
sudo mv simple-synthetic-views /vagrant
bash /vagrant/bootstrap.sh
psql -U postgres ssv
SELECT DISTINCT r.size AS image_size, r.url AS image_url, v.url AS page_url
FROM requests AS r
JOIN views AS v ON r.view_id = v.id
WHERE (r.mime LIKE 'image/%') AND (size > 100000)
ORDER BY r.size DESC