rb-fetch
is a simple lightweight ruby script that can be invoked using ./fetch --args
.
It downloads webpages as HTML to the project's root folder. It also saves some metadata (when the webpage was last downloaded, number of links, and images), and tries its best to download assets.
To use it locally, you need ruby >= 2.3
(since this code uses the &. operand), and the nokogiri
gem.
To download Nokogiri, you can use the gem install nokogiri
command.
You can also build an image using docker, by running docker-compose build
, and running docker-compose up
. You can specify the arguments at docker-compose.yml
To invoke it, you can use ./fetch --metadata --download_assets https://google.com https://facebook.com https://paypal.com
The --metadata
flag displays additional information such as last downloaded at, number of images, and number of links.
The --download_assets
flag instructs the fetcher to download and cache all JS and image assets, and link to the local version instead, for a full offline experience.
The website's html is saved and is accessible via {host_name}{request_uri}.html
in the root folder. In case the assets are downloaded, they will be saved and accessible via assets/{host_name}/{request_uri}/{asset_name}
.
In case there are no request params, for example, https://google.com
, the request_uri
is denoted by an underscore _
- Due to the way some websites try to redirect you to
/
, chrome will, unfortunately, redirect you to the root folder of your machine, which is incorrect behavior. - Due to the Allow-CORS policy, and since we're serving this file locally and not through HTTP/HTTPS, some assets may refuse to load in the first place.
- Assets that are fetched via javascript are not downloaded properly.
- Fonts are not downloaded by the asset downloader.
- The way you need to modify
docker-compose.yml
is not elegant.