URL to Code


A simple Node.js web scraper using website-scraper to download an entire website.

Getting Started

Prerequisites

Make sure you have Node.js installed on your machine.
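
You can check that Node.js and npm are available from a terminal (a recent LTS release is recommended; the exact minimum version depends on the website-scraper release you install):

    node --version
    npm --version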

Installation

  1. Clone the repository:

    git clone https://github.com/Bahrul-Rozak/url-to-code.git
  2. Navigate to the project directory:

    cd url-to-code
  3. Install dependencies:

    npm install
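
If the install succeeds, the website-scraper package this project relies on should appear in the local dependency tree:

    npm ls website-scraper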

Usage

  1. Open index.mjs in your preferred code editor.

  2. Set the websiteUrl variable to the URL of the website you want to scrape.

    const websiteUrl = 'https://example.com';
  3. Customize other options if needed (e.g., maxDepth, directory); see the Configuration section and the sketch after these steps.

  4. Run the scraper:

    node index.mjs
  5. Check the ./result directory for the downloaded website.
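
For reference, a minimal index.mjs along these lines would cover steps 2–5. The file in this repository may use different option values, and https://example.com is only a placeholder:

    import scrape from 'website-scraper';

    // URL of the website you want to download
    const websiteUrl = 'https://example.com';

    // Download the site into ./result (the directory should not already exist)
    await scrape({
      urls: [websiteUrl],
      directory: './result',
    });

    console.log('Done. See the ./result directory.');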

Configuration

  • urls: An array of URLs to scrape.
  • urlFilter: A function to filter URLs. The example keeps only URLs that start with the configured websiteUrl.
  • recursive: If true, the scraper will follow links recursively.
  • maxDepth: Maximum recursion depth.
  • prettifyUrls: If true, saved URLs are "prettified" (e.g., /index.html becomes /).
  • filenameGenerator: File naming strategy; set to 'bySiteStructure' in the example so saved files mirror the site's directory structure.
  • directory: Output directory for the downloaded website.
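
Put together, the options object passed to scrape() might look roughly like the sketch below; the values are illustrative and may differ from those in index.mjs:

    import scrape from 'website-scraper';

    const websiteUrl = 'https://example.com';

    const options = {
      urls: [websiteUrl],                              // array of URLs to scrape
      urlFilter: (url) => url.startsWith(websiteUrl),  // only follow links on the same site
      recursive: true,                                 // follow links recursively
      maxDepth: 2,                                     // maximum recursion depth
      prettifyUrls: true,                              // e.g. /index.html is saved as /
      filenameGenerator: 'bySiteStructure',            // mirror the site's folder structure
      directory: './result',                           // output directory (should not already exist)
    };

    await scrape(options);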

Acknowledgments

Happy downloading! 🕸️