A Node.js script powered by Puppeteer for undetectable web scraping

Primary language: JavaScript. License: MIT.

Undetectable Crawler

This is a Node.js script that uses Puppeteer with additional settings to build a web crawler that is harder to detect. It lets you scrape websites while reducing the risk of being blocked or identified as a bot.

Features

  • Bypasses common bot detection mechanisms.
  • Customizable settings for stealthy web scraping.
  • Easily extensible for your specific scraping needs.
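
As an illustration of what "customizable settings" can look like in Puppeteer, here is a minimal sketch. It is not taken from this repository's crawler.js. The names `DEFAULT_SETTINGS`, `buildStealthSettings`, and `openStealthPage`, as well as the default user agent and viewport, are assumptions made for the example. The `puppeteer` module is passed in by the caller so the settings logic can be read and tested on its own.

```javascript
// Hypothetical defaults; adjust them to match the browser profile you want to present.
const DEFAULT_SETTINGS = {
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  viewport: { width: 1366, height: 768 },
  locale: 'en-US',
};

// Merge caller overrides into the defaults and derive Puppeteer launch options.
function buildStealthSettings(overrides = {}) {
  const settings = { ...DEFAULT_SETTINGS, ...overrides };
  return {
    launchOptions: {
      headless: true,
      args: [
        // Turns off the Blink feature that flags the browser as automation-controlled.
        '--disable-blink-features=AutomationControlled',
        `--lang=${settings.locale}`,
        `--window-size=${settings.viewport.width},${settings.viewport.height}`,
      ],
    },
    userAgent: settings.userAgent,
    viewport: settings.viewport,
  };
}

// Launch a browser and prepare a page with the settings applied.
// `puppeteer` is the module returned by require('puppeteer').
async function openStealthPage(puppeteer, overrides = {}) {
  const { launchOptions, userAgent, viewport } = buildStealthSettings(overrides);
  const browser = await puppeteer.launch(launchOptions);
  const page = await browser.newPage();
  await page.setUserAgent(userAgent);
  await page.setViewport(viewport);
  // Runs before any script on the page: make navigator.webdriver read as undefined.
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
  });
  return { browser, page };
}
```

A caller would use it as `const { browser, page } = await openStealthPage(require('puppeteer'), { locale: 'de-DE' });` and close the browser when done. These settings cover only a few well-known signals; detection services check many more.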

Chrome’s Headless mode gets an upgrade

Proxy

Use a reliable residential proxy list, such as the one available from BrightData. Routing requests through residential proxies keeps crawling smooth and reduces the risk of IP bans and detection.
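
The following is a minimal sketch of routing Puppeteer through a proxy list, assuming each proxy is given as a URL such as `http://user:pass@proxy.example.com:8080`. The helper names (`parseProxy`, `createProxyRotator`, `openPageThroughProxy`) and the example host are hypothetical and are not part of this repository. The sketch relies on Chromium's `--proxy-server` flag and Puppeteer's `page.authenticate` for proxy credentials.

```javascript
// Split a proxy URL into the Chromium flag and optional credentials.
function parseProxy(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    serverArg: `--proxy-server=${url.protocol}//${url.host}`,
    credentials: url.username
      ? {
          username: decodeURIComponent(url.username),
          password: decodeURIComponent(url.password),
        }
      : null,
  };
}

// Return a function that hands out proxies from the list in round-robin order.
function createProxyRotator(proxyUrls) {
  if (proxyUrls.length === 0) {
    throw new Error('proxy list is empty');
  }
  let index = 0;
  return () => proxyUrls[index++ % proxyUrls.length];
}

// Launch a browser that sends its traffic through the given proxy.
// `puppeteer` is the module returned by require('puppeteer').
async function openPageThroughProxy(puppeteer, proxyUrl) {
  const { serverArg, credentials } = parseProxy(proxyUrl);
  const browser = await puppeteer.launch({ args: [serverArg] });
  const page = await browser.newPage();
  if (credentials) {
    // Answers the proxy's authentication challenge.
    await page.authenticate(credentials);
  }
  return { browser, page };
}
```

Because the proxy is set at launch time, switching proxies in this sketch means launching a new browser, for example `openPageThroughProxy(require('puppeteer'), nextProxy())` with `nextProxy` created by `createProxyRotator`.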

Installation using Docker

  1. Clone this repository:
     git clone git@github.com:darkotodoric/undetectable-crawler.git
     cd undetectable-crawler
  2. Build the Docker image:
     docker-compose build
  3. Install npm packages:
     docker-compose run --rm undetectable-nodejs-service npm install
  4. Run the crawler:
     docker-compose run --rm undetectable-nodejs-service node crawler.js https://bot.sannysoft.com/
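
The run command above passes the target URL as the first command-line argument. A script invoked this way might read and validate that argument roughly as follows. This is a sketch, not the repository's crawler.js: `parseTargetUrl`, `run`, and the `result.png` output path are hypothetical.

```javascript
// Read the target URL from argv (argv[0] is node, argv[1] is the script path).
function parseTargetUrl(argv) {
  const raw = argv[2];
  if (!raw) {
    throw new Error('Usage: node crawler.js <url>');
  }
  const url = new URL(raw); // throws a TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}

// Open the target page and save a full-page screenshot.
// `puppeteer` is the module returned by require('puppeteer').
async function run(puppeteer, argv = process.argv) {
  const target = parseTargetUrl(argv);
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(target, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: 'result.png', fullPage: true }); // hypothetical output file
  } finally {
    // Close the browser even if navigation or the screenshot fails.
    await browser.close();
  }
}
```

Validating the argument before launching the browser means a typo in the URL fails immediately instead of after Chromium has started.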

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to improve this project.