Create a screenshot of a full web page, not just the part that is visible above the fold (the browser viewport). The program opens the page, scrolls through it automatically to force dynamically loaded images to load, and then saves a screenshot to a PNG file.
PNG files can become quite large, around 30 MB for the front page of a news site.
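The scroll-then-capture idea can be sketched roughly as follows. This is a minimal illustration, not the code in screenshot.py: the function names, the pause length, and the use of Selenium 4's Firefox-only get_full_page_screenshot_as_file method are all assumptions made for the example.

```python
import time


def scroll_offsets(page_height, viewport_height):
    """Return the vertical offsets to visit so each part of the page is shown once."""
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    return list(range(0, max(page_height, 1), viewport_height))


def capture_full_page(url, out_path, pause=0.5):
    """Open url in headless Firefox, scroll through it, and save a PNG."""
    # Imported here so scroll_offsets() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        page_height = driver.execute_script("return document.body.scrollHeight")
        viewport_height = driver.execute_script("return window.innerHeight")
        for y in scroll_offsets(page_height, viewport_height):
            driver.execute_script("window.scrollTo(0, arguments[0]);", y)
            time.sleep(pause)  # give lazily loaded images time to arrive
        # Assumption: Selenium 4 with Firefox; other drivers need another approach.
        driver.get_full_page_screenshot_as_file(out_path)
    finally:
        driver.quit()
```

Calling capture_full_page("http://example.net/", "example.png") would need Firefox and its web driver installed. Pages that keep growing while you scroll (infinite scroll) would need the height to be measured again after each step.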
- Install the dependencies: Selenium (which does the actual work) and python-slugify (which converts URLs into file names, e.g. www.google.com into www-google-com):
pip install selenium
pip install python-slugify
- Download a web driver. I recommend Firefox over Chrome for compatibility reasons.
- Make sure Python can find the web driver by modifying your PATH environment variable. This is described in the Selenium installation guide.
- Download Screenshot:
$ git clone git@github.com:peterdalle/screenshot.git
Provide a URL or domain name as argument:
$ python screenshot.py google.com
A file like 2018-01-12_18-02_http-google-com.png is then saved in your current directory. The name starts with the current date and time (yyyy-mm-dd_hh-mm).
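The naming scheme can be approximated in a few lines. This is only a sketch: the slug function below is a rough standard-library stand-in for python-slugify, and prepending "http://" to bare domain names is an assumption inferred from the "http-" prefix in the example file name.

```python
import re
from datetime import datetime


def slug(text):
    # Rough stand-in for python-slugify: lowercase, then collapse every run of
    # characters other than a-z and 0-9 into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def screenshot_filename(target, now=None, extension="png"):
    # Assumption: a bare domain such as "google.com" gets "http://" prepended.
    if "://" not in target:
        target = "http://" + target
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M")
    return f"{stamp}_{slug(target)}.{extension}"
```

For example, screenshot_filename("google.com", datetime(2018, 1, 12, 18, 2)) gives 2018-01-12_18-02_http-google-com.png.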
Provide multiple URLs or domain names as arguments:
$ python screenshot.py google.com bbc.com svt.se "https://example.net/search?q=test&p=3"
Note that the & character in URLs has a special meaning in the terminal/command prompt, so don't forget to enclose such URLs in double quotes (").
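If you generate the command line from another script (for a crontab entry, say), Python's standard shlex module can add the quoting for you. A small sketch for POSIX shells only; the Windows command prompt follows different quoting rules.

```python
import shlex

urls = ["google.com", "https://example.net/search?q=test&p=3"]

# shlex.join quotes only the arguments that contain shell-special characters.
command = shlex.join(["python", "screenshot.py", *urls])
print(command)
```

Here the second URL ends up wrapped in single quotes, while google.com is left as it is.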
You can also provide a file name (urls.txt) with one URL or domain name per line:
$ python screenshot.py urls.txt
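One way to handle both kinds of argument is to check whether an argument names an existing file and, if so, read its lines. This is a hypothetical sketch; expand_targets is an illustrative name, and screenshot.py may tell files and URLs apart differently.

```python
import os


def expand_targets(args):
    """Turn command-line arguments into a flat list of URLs or domain names."""
    targets = []
    for arg in args:
        if os.path.isfile(arg):
            # Treat the argument as a list file: one target per line, blanks skipped.
            with open(arg, encoding="utf-8") as handle:
                targets.extend(line.strip() for line in handle if line.strip())
        else:
            targets.append(arg)
    return targets
```

With this approach, files and URLs can be mixed freely on the same command line.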
Change the behavior of the program in the settings class. Each setting is documented there.
The most important setting is probably headless = True, which runs the browser in the background without opening a visible browser window.
Selenium seems to have a problem closing the web driver, which leaves many web driver processes running and consuming memory. You may need to kill these processes now and then, especially if you take screenshots with crontab.
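If you adapt the script yourself, one thing that may reduce leftover processes is making sure quit() is always called, even when a page fails to load. The context manager below is a hedged sketch of that idea; managed_driver is an illustrative name, and it is not guaranteed to solve the problem described above.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # quit() ends the whole browser session; close() only closes the current window.
        driver.quit()


# Usage with Selenium (requires a web driver on PATH):
#     from selenium import webdriver
#     with managed_driver(webdriver.Firefox) as driver:
#         driver.get("http://example.net/")
```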
Another approach is to use the following bash command, which runs CutyCapt inside a virtual X server environment (Xvfb):
xvfb-run --auto-servernum --server-num=1 --server-args="-screen 0 1024x8048x16" cutycapt --url="http://example.net/" --out="example.net.jpg"
The file bash_screenshot.py is just a wrapper around this command. It takes a URL as its input parameter and outputs a file named with a time stamp and the URL.
Use it as follows:
$ python bash_screenshot.py http://example.net/
This will produce a file like 2018-01-01-18-40_http-www-example-net.jpg. Make sure to use .jpg as the file extension, since .png will create much larger files (JPG uses lossy compression).
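A wrapper of this kind can be sketched with the standard subprocess module. The function names below are illustrative and not taken from bash_screenshot.py; the flags are the ones from the command shown earlier.

```python
import subprocess


def build_command(url, out_path):
    """Build the xvfb-run/cutycapt argument list for one screenshot."""
    return [
        "xvfb-run", "--auto-servernum", "--server-num=1",
        "--server-args=-screen 0 1024x8048x16",
        "cutycapt", f"--url={url}", f"--out={out_path}",
    ]


def capture(url, out_path):
    # Passing a list (no shell=True) means "&" and "?" in the URL need no quoting.
    # check=True raises CalledProcessError if the command exits with an error.
    subprocess.run(build_command(url, out_path), check=True)
```

Running capture("http://example.net/", "example.net.jpg") requires xvfb-run and cutycapt to be installed.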