This project is a Rust-based web scraping application designed to efficiently fetch, parse, and store data from web pages. It utilizes modern Rust async features, robust error handling, and supports multiple database backends.
- Asynchronous Web Scraping: Fast data retrieval and processing using `reqwest` and `tokio`.
- HTML Parsing: Extract data seamlessly using the `scraper` library.
- Flexible Data Storage: Supports SQLite, PostgreSQL, and Redis.
- Rate Limiting: Built-in rate limiting to respect website request policies.
- Error Handling: Comprehensive error management to handle network and parsing errors gracefully.
- Deployment Ready: Containerized setup for easy deployment.
- Monitoring: Integrated logging and monitoring setup.
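To illustrate the rate-limiting idea from the feature list, here is a minimal, self-contained sketch of a fixed-interval limiter. It uses only the standard library and a synchronous `sleep` for clarity; the `RateLimiter` type and its API are hypothetical and not taken from this project's actual code, which would use `tokio`'s async timers instead.

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch: allows at most `requests_per_second` calls,
// spacing them out by sleeping between requests.
struct RateLimiter {
    interval: Duration,
    last_request: Option<Instant>,
}

impl RateLimiter {
    fn new(requests_per_second: u32) -> Self {
        Self {
            interval: Duration::from_secs(1) / requests_per_second,
            last_request: None,
        }
    }

    /// Blocks until the next request is allowed, then records it.
    fn wait(&mut self) {
        if let Some(last) = self.last_request {
            let elapsed = last.elapsed();
            if elapsed < self.interval {
                std::thread::sleep(self.interval - elapsed);
            }
        }
        self.last_request = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(2); // at most 2 requests per second
    let start = Instant::now();
    for i in 0..3 {
        limiter.wait();
        println!("request {} at {:?}", i, start.elapsed());
    }
}
```

In an async codebase the same pattern would replace `std::thread::sleep` with `tokio::time::sleep`, or use a `tokio::sync::Semaphore` to bound concurrent in-flight requests.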
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
To build and run the software, you will need:
- Rust Programming Language
- Cargo (Rust's package manager)
- Docker (optional, for containerization)
- Access to one of the supported databases (SQLite, PostgreSQL, or Redis)
Clone the repository:

```shell
git clone git@github.com:dhodyrev/async-rust-scraper.git
cd async-rust-scraper
```
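Once cloned, the usual Cargo workflow should apply (assuming the project builds with a current stable toolchain; exact binary names and flags may differ in this repository):

```shell
# Build an optimized binary and run it
cargo build --release
cargo run --release
```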