web-scraper

A web scraper written in Node.js that uses Puppeteer to scrape the case studies from High Scalability and download them as PDFs (for learning about system design).

Primary language: JavaScript

Usage:

Clone the repository, then run npm i to install the dependencies and npm start to start the scraper.

Sample execution:

Launched headless browser...
Found saved pdfs already!
Created folder for pdfs...
Opened log file...
Page url: http://highscalability.com/blog/category/example
Saving article: http://highscalability.com/blog/2020/6/15/how-triplelift-built-an-adtech-data-pipeline-processing-bill.html
Saving article: http://highscalability.com/blog/2020/5/14/a-short-on-how-zoom-works.html
Saving article: http://highscalability.com/blog/2019/11/25/egnyte-architecture-lessons-learned-in-building-and-scaling.html
Saving article: http://highscalability.com/blog/2019/4/8/from-bare-metal-to-kubernetes.html
Saving article: http://highscalability.com/blog/2018/8/27/auth0-architecture-running-in-multiple-cloud-providers-and-r.html
Saving article: http://highscalability.com/blog/2018/4/9/give-meaning-to-100-billion-events-a-day-the-analytics-pipel.html
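
The "Saving article" step in the log above could be implemented roughly as follows. This is a minimal sketch, not the repository's actual code: the function names pdfPathForArticle and saveArticleAsPdf, the output folder argument, the file naming scheme, and the A4 page format are all assumptions made for illustration. The Puppeteer calls used (newPage, goto, pdf, close) are part of its public API, and the browser is passed in as a parameter so the function does not depend on how it was launched.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: derive a PDF file path from an article URL,
// e.g. ".../a-short-on-how-zoom-works.html" -> "<outDir>/a-short-on-how-zoom-works.pdf".
function pdfPathForArticle(articleUrl, outDir) {
  const { pathname } = new URL(articleUrl);
  const slug = path.posix.basename(pathname, '.html');
  return path.join(outDir, `${slug}.pdf`);
}

// Hypothetical helper: open one article in a new tab and save it as a PDF.
// `browser` is assumed to be a Puppeteer Browser, e.g. from `await puppeteer.launch()`.
async function saveArticleAsPdf(browser, articleUrl, outDir) {
  // Make sure the output folder exists before Puppeteer writes into it.
  fs.mkdirSync(outDir, { recursive: true });
  const target = pdfPathForArticle(articleUrl, outDir);

  const page = await browser.newPage();
  try {
    console.log(`Saving article: ${articleUrl}`);
    // Wait until network activity has mostly settled so the article is fully rendered.
    await page.goto(articleUrl, { waitUntil: 'networkidle2' });
    await page.pdf({ path: target, format: 'A4' });
  } finally {
    // Close the tab even if navigation or PDF generation fails.
    await page.close();
  }
  return target;
}
```

Under these assumptions, a caller would launch Puppeteer once, call saveArticleAsPdf for each article URL found on a listing page, and close the browser at the end. Note that Puppeteer's PDF generation relies on headless mode, which matches the "Launched headless browser..." line in the sample output.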