A lightweight and efficient web crawler in Rust, optimized for concurrent scraping while respecting robots.txt rules.
- Concurrent crawling: Fetches pages concurrently for efficient scraping across multiple cores.
- Respects robots.txt: Automatically fetches and adheres to website scraping guidelines.
- DFS algorithm: Uses a depth-first search algorithm to crawl web links (see the sketch after this list).
- Customizable with Builder Pattern: Tailor the depth of crawling, rate limits, and other parameters effortlessly.
- Built with Rust: Memory safety and native performance.
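The depth-first behavior can be pictured with a small, self-contained sketch over an in-memory link graph. This is only an illustration of the visiting order, not crawly's implementation, which fetches pages over HTTP and also applies robots.txt rules, depth limits, and rate limits:

```rust
use std::collections::{HashMap, HashSet};

// Depth-first traversal of a toy link graph: follow each discovered link
// as deep as possible before backtracking, skipping already-visited URLs.
fn dfs(start: &str, links: &HashMap<&str, Vec<&str>>, visited: &mut HashSet<String>) {
    if !visited.insert(start.to_string()) {
        return; // already visited
    }
    println!("visiting {start}");
    for next in links.get(start).into_iter().flatten() {
        dfs(next, links, visited);
    }
}

fn main() {
    // Hypothetical site structure used only for this illustration.
    let links = HashMap::from([
        ("https://example.com", vec!["https://example.com/a", "https://example.com/b"]),
        ("https://example.com/a", vec!["https://example.com/b"]),
        ("https://example.com/b", vec![]),
    ]);
    dfs("https://example.com", &links, &mut HashSet::new());
}
```

The `visited` set is what keeps a depth-first crawl from looping on cyclic link structures.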
Add crawly to your Cargo.toml:
```toml
[dependencies]
crawly = "0.1.0"
```

A simple usage example:

```rust
use anyhow::Result;
use crawly::Crawler;
#[tokio::main]
async fn main() -> Result<()> {
    let crawler = Crawler::new()?;
    let results = crawler.crawl_url("https://example.com").await?;

    for (url, content) in &results {
        println!("URL: {}\nContent: {}", url, content);
    }

    Ok(())
}
```

For more refined control over the crawler's behavior, the `CrawlerBuilder` comes in handy:

```rust
use anyhow::Result;
use crawly::CrawlerBuilder;
#[tokio::main]
async fn main() -> Result<()> {
    let crawler = CrawlerBuilder::new()
        .with_max_depth(10)
        .with_max_pages(100)
        .with_max_concurrent_requests(50)
        .with_rate_limit_wait_seconds(2)
        .with_robots(true)
        .build()?;

    let results = crawler.crawl_url("https://www.example.com").await?;

    for (url, content) in &results {
        println!("URL: {}\nContent: {}", url, content);
    }

    Ok(())
}
```
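However the crawler is configured, the returned pages can be processed beyond printing. Below is a minimal sketch that writes each crawled page to disk, building only on the API shown above and assuming, as in the examples, that the results iterate as (url, content) string pairs; the `pages/` directory and file naming are illustrative, not part of the crate:

```rust
use std::fs;

use anyhow::Result;
use crawly::Crawler;

#[tokio::main]
async fn main() -> Result<()> {
    let crawler = Crawler::new()?;
    let results = crawler.crawl_url("https://example.com").await?;

    // Illustrative output layout: one numbered file per crawled page.
    fs::create_dir_all("pages")?;
    let mut n = 0;
    for (url, content) in &results {
        let path = format!("pages/page_{n}.html");
        fs::write(&path, content)?;
        println!("saved {url} -> {path}");
        n += 1;
    }

    Ok(())
}
```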
Contributions, issues, and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.
This project is MIT licensed.
- Author: Dario Cancelliere
- Email: dario.cancelliere@gmail.com
- Company Website: https://www.crystalsoft.it