Async Rust Web Scraping Project

This project is a Rust-based web scraping application that efficiently fetches, parses, and stores data from web pages. It uses Rust's modern async features, provides robust error handling, and supports multiple database backends.

Features

  • Asynchronous Web Scraping: Fast data retrieval and processing using reqwest and tokio (a minimal sketch follows this list).
  • HTML Parsing: Extract data seamlessly using the scraper library.
  • Flexible Data Storage: Supports SQLite, PostgreSQL, and Redis.
  • Rate Limiting: Built-in rate limiting to respect website request policies.
  • Error Handling: Comprehensive error management to handle network and parsing errors gracefully.
  • Deployment Ready: Containerized setup for easy deployment.
  • Monitoring: Integrated logging and monitoring setup.
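
The asynchronous fetching, HTML parsing, rate limiting, and error handling described above fit together roughly as in the minimal sketch below. It assumes the reqwest, tokio, and scraper crates named in the feature list; the URLs, the h1 selector, and the 500 ms delay are illustrative placeholders rather than values taken from this project.

    use std::time::Duration;

    use scraper::{Html, Selector};

    /// Parse an HTML body and return the text of every <h1> element.
    fn extract_headings(body: &str) -> Vec<String> {
        let document = Html::parse_document(body);
        let selector = Selector::parse("h1").expect("valid CSS selector");
        document
            .select(&selector)
            .map(|heading| heading.text().collect::<String>())
            .collect()
    }

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let client = reqwest::Client::new();
        let urls = ["https://example.com/page/1", "https://example.com/page/2"];

        for url in urls {
            // Fetch the page asynchronously; `?` propagates network and HTTP errors.
            let body = client
                .get(url)
                .send()
                .await?
                .error_for_status()?
                .text()
                .await?;

            for heading in extract_headings(&body) {
                println!("{url}: {heading}");
            }

            // Crude rate limiting: pause between consecutive requests.
            tokio::time::sleep(Duration::from_millis(500)).await;
        }

        Ok(())
    }

Propagating errors through Result lets a failed request stop the run cleanly instead of panicking, and replacing the println! with a database write is where a SQLite, PostgreSQL, or Redis backend would plug in.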

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You will need the following to build and run the project:

  • Rust Programming Language
  • Cargo (Rust's package manager)
  • Docker (optional, for containerization)
  • Access to one of the supported databases (SQLite, PostgreSQL, or Redis)

Installation

  1. Clone the repository

    git clone git@github.com:dhodyrev/async-rust-scraper.git
    cd async-rust-scraper