/scrape-pdf

Given a URL, this scraper visits every page of that site and downloads each one as a PDF for offline viewing. It is especially useful for online versions of books that are spread across multiple pages.
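
As a rough illustration of the "visit every page of that site" step, the sketch below shows one way a crawler could decide which links to follow next. It is not taken from this project's source: the function name nextUrlsToVisit and its parameters are hypothetical, and it assumes that "that site" means "same origin as the starting URL".

```typescript
// Hypothetical helper, not part of this project's code.
// Given the URL of the current page and the raw href values found on it,
// return the absolute, same-origin URLs that have not been visited yet.
function nextUrlsToVisit(
  pageUrl: string,
  hrefs: string[],
  visited: Set<string>,
): string[] {
  const origin = new URL(pageUrl).origin;
  const found = new Set<string>();

  for (const href of hrefs) {
    let resolved: URL;
    try {
      // Resolve relative links against the current page.
      resolved = new URL(href, pageUrl);
    } catch {
      continue; // Skip hrefs that cannot be parsed as URLs.
    }
    // Assumption: only links on the same origin belong to "that site".
    if (resolved.origin !== origin) continue;
    // Links that differ only by "#fragment" point at the same page.
    resolved.hash = "";
    if (!visited.has(resolved.href)) found.add(resolved.href);
  }
  return Array.from(found);
}
```

A crawler built this way would keep a queue of URLs, add each visited URL to the visited set, and stop when the queue is empty.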

Primary Language: TypeScript

Scrape PDF

Make sure you have pnpm installed: https://pnpm.io/installation

pnpm install
pnpm run scrape <url>

You'll see some console output, and you should then have an output directory full of PDF files and a single ___urls.txt file.
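
The README does not say how the PDF files are named. Purely as an illustration, one possible way to turn a page URL into a filesystem-safe file name is sketched below; the function pdfFileNameFor and the naming scheme are assumptions, not the tool's documented behavior.

```typescript
// Hypothetical naming scheme, for illustration only.
// The real tool may name its PDF files differently.
function pdfFileNameFor(url: string): string {
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname)
    .replace(/[^a-zA-Z0-9]+/g, "-") // Collapse runs of unsafe characters into one dash.
    .replace(/^-+|-+$/g, ""); // Trim leading and trailing dashes.
  return `${slug}.pdf`;
}
```

Under this scheme, two pages that differ only in their query string would map to the same file name, so a real implementation would need to account for that.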

CLI Options

| Full | Short | Description |
| --- | --- | --- |
| --media | -m | What media type you want to generate PDFs with, if the site supports different media types ("screen" or "print" (default)) |
| --colorScheme | -c | What color scheme you want to generate PDFs with, if the site supports color schemes ("light", "dark", "no-preference" (default)) |
| --withHeader | -h | Whether or not you want PDFs with generated headers (and footers) (default false) |
| --dryRun | -d | Perform the web crawl without creating PDFs (default false) |
| --verbose | -v | Adds additional logging (default false) |
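
To make the defaults and allowed values concrete, here is a minimal sketch of how the CLI options listed above could be modeled and validated in TypeScript. The option names, values, and defaults come from that list; the types, the DEFAULTS constant, and the resolveOptions function are illustrative assumptions and may not match the project's actual code.

```typescript
// Illustrative model of the documented options; not the project's own types.
type Media = "screen" | "print";
type ColorScheme = "light" | "dark" | "no-preference";

interface ScrapeOptions {
  media: Media;
  colorScheme: ColorScheme;
  withHeader: boolean;
  dryRun: boolean;
  verbose: boolean;
}

// Defaults as documented in the options list.
const DEFAULTS: ScrapeOptions = {
  media: "print",
  colorScheme: "no-preference",
  withHeader: false,
  dryRun: false,
  verbose: false,
};

const MEDIA_VALUES: readonly string[] = ["screen", "print"];
const COLOR_SCHEMES: readonly string[] = ["light", "dark", "no-preference"];

interface RawOptions {
  media?: string;
  colorScheme?: string;
  withHeader?: boolean;
  dryRun?: boolean;
  verbose?: boolean;
}

// Fill in defaults and reject values outside the documented sets.
function resolveOptions(raw: RawOptions): ScrapeOptions {
  const media = raw.media ?? DEFAULTS.media;
  if (MEDIA_VALUES.indexOf(media) === -1) {
    throw new Error(`Invalid --media value: ${media}`);
  }
  const colorScheme = raw.colorScheme ?? DEFAULTS.colorScheme;
  if (COLOR_SCHEMES.indexOf(colorScheme) === -1) {
    throw new Error(`Invalid --colorScheme value: ${colorScheme}`);
  }
  return {
    media: media as Media,
    colorScheme: colorScheme as ColorScheme,
    withHeader: raw.withHeader ?? DEFAULTS.withHeader,
    dryRun: raw.dryRun ?? DEFAULTS.dryRun,
    verbose: raw.verbose ?? DEFAULTS.verbose,
  };
}
```

For example, resolveOptions({ media: "screen", dryRun: true }) would describe a crawl that uses the "screen" media type and creates no PDFs, with every other option left at its default.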

TODO