

spdrs - A simple web crawler in Rust 🕷️ 🕸️

Goals

PoC goals

  • Write a simple CLI app that fetches a web page from a URL supplied by the user
  • Extract links using a simple strategy, e.g. a regex
  • Print the visited URL and the extracted links
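
A minimal sketch of the PoC's extract-and-print step, assuming the page body has already been fetched (fetching needs an HTTP client crate and is left out here). `extract_links` is a hypothetical helper name, and a plain substring scan for `href="` stands in for the regex so the example needs only the standard library. Like a regex, it knows nothing about HTML structure.

```rust
/// Naively extracts the values of `href="..."` attributes from an HTML string.
/// Hypothetical helper: a substring scan standing in for a regex.
fn extract_links(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("href=\"") {
        // Skip past the 6-byte `href="` prefix.
        let after = &rest[start + 6..];
        match after.find('"') {
            Some(end) => {
                links.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // Unterminated attribute value: stop scanning.
            None => break,
        }
    }
    links
}

fn main() {
    // Hypothetical page body; the real app would fetch this from the user's URL.
    let url = "https://example.com/";
    let html = r#"<a href="/about">About</a> <link href="style.css"> <a href="https://rust-lang.org">Rust</a>"#;
    println!("Visited: {url}");
    for link in extract_links(html) {
        println!("  {link}");
    }
}
```

Note that the extracted values are printed exactly as they appear in the markup, so relative links such as `/about` are not yet resolved against the page URL.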

Intermediate goals

  • Introduce recursion by fetching extracted links
  • Eliminate infinite loops by tracking visited pages
  • Filter out external links (links to other hosts)
  • Fetch pages in parallel, if not already doing so
  • Firm up validation and error handling
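
One way the recursion, visited-page tracking and external-link filtering could fit together is sketched below. It is a sketch under stated assumptions: an in-memory link graph (`web`, hypothetical) stands in for fetching and extracting, "external" is taken to mean "different host", and parallel fetching is not shown. `crawl` and `host` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the host of an absolute http(s) URL,
/// e.g. "https://example.com/a" -> Some("example.com").
fn host(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    Some(rest.split('/').next().unwrap_or(rest))
}

/// Recursively visits `url` and every same-host page it links to.
/// `web` is a hypothetical stand-in for "fetch page, extract links".
fn crawl(
    url: &str,
    root_host: &str,
    web: &HashMap<&str, Vec<&str>>,
    visited: &mut HashSet<String>,
) {
    // `insert` returns false if the URL was already present; this is
    // what stops cycles such as a -> b -> a from looping forever.
    if !visited.insert(url.to_string()) {
        return;
    }
    println!("Visited: {url}");
    for link in web.get(url).into_iter().flatten() {
        // Skip external links (different or unparseable host).
        if host(link) == Some(root_host) {
            crawl(link, root_host, web, visited);
        }
    }
}

fn main() {
    // Hypothetical link graph: page -> links found on that page.
    let web: HashMap<&str, Vec<&str>> = HashMap::from([
        ("https://example.com/", vec!["https://example.com/a", "https://other.org/"]),
        ("https://example.com/a", vec!["https://example.com/", "https://example.com/b"]),
        ("https://example.com/b", vec!["https://example.com/a"]),
    ]);
    let start = "https://example.com/";
    let mut visited = HashSet::new();
    crawl(start, host(start).unwrap(), &web, &mut visited);
}
```

Running this visits the three `example.com` pages once each and never follows `other.org`. A real crawler would likely swap the recursion for a work queue, which also makes it easier to hand URLs to parallel workers.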

Stretch goals

  • Parse HTML and extract links from <a> and <link> tags
  • Spin up a local server to host test web pages
  • Write E2E tests against the local server
  • Follow scheme-relative links (e.g. //example.com/page)
  • Follow path-relative links (e.g. ../page.html)
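
Following scheme-relative and path-relative links comes down to resolving each `href` against the URL of the page it was found on. The sketch below is a simplified, standard-library-only take on that, with hypothetical helper names `split_url` and `resolve`. It assumes absolute `http(s)` base URLs and ignores query strings, fragments and some dot-segment edge cases; the `url` crate's `Url::join` implements the full RFC 3986 resolution and is probably the better choice in practice.

```rust
/// Splits "https://example.com/docs/a.html" into
/// ("https", "example.com", "/docs/a.html"). Hypothetical helper.
fn split_url(url: &str) -> Option<(&str, &str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    match rest.find('/') {
        Some(i) => Some((scheme, &rest[..i], &rest[i..])),
        None => Some((scheme, rest, "/")),
    }
}

/// Resolves `href` against the absolute URL `base` (simplified sketch).
fn resolve(base: &str, href: &str) -> Option<String> {
    if href.contains("://") {
        return Some(href.to_string()); // already absolute
    }
    let (scheme, host, path) = split_url(base)?;
    if let Some(rest) = href.strip_prefix("//") {
        // Scheme-relative: keep the base scheme, take host and path from href.
        return Some(format!("{scheme}://{rest}"));
    }
    // Root-relative links replace the whole path; path-relative links
    // replace only the last segment of the base path.
    let joined = if href.starts_with('/') {
        href.to_string()
    } else {
        let dir = &path[..=path.rfind('/')?];
        format!("{dir}{href}")
    };
    // Collapse "." and ".." segments.
    let mut segments: Vec<&str> = Vec::new();
    for seg in joined.split('/').skip(1) {
        match seg {
            "." => {}
            ".." => {
                segments.pop();
            }
            s => segments.push(s),
        }
    }
    Some(format!("{scheme}://{host}/{}", segments.join("/")))
}

fn main() {
    let base = "https://example.com/docs/guide/intro.html";
    for href in ["//cdn.example.net/lib.js", "setup.html", "../api/index.html", "/about"] {
        println!("{href} -> {}", resolve(base, href).unwrap());
    }
}
```

For example, `../api/index.html` found on `https://example.com/docs/guide/intro.html` resolves to `https://example.com/docs/api/index.html`.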