Anemone

Anemone is a web spider framework that can spider a domain and collect useful information about the pages it visits. It is versatile, allowing you to write your own specialized spider tasks quickly and easily.

See anemone.rubyforge.org for more information.

Features

Multi-threaded design for high performance
Tracks 301 HTTP redirects to understand a page’s aliases
Built-in BFS algorithm for determining page depth
Allows exclusion of URLs based on regular expressions
Choose the links to follow on each page with focus_crawl()
HTTPS support
Records response time for each page
CLI program can list all pages in a domain, calculate page depths, and more
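
To make the page-depth feature concrete, here is a minimal pure-Ruby sketch of breadth-first search over a small in-memory link graph. It is only an illustration of the general technique, not Anemone's own implementation; the page_depths method and the SITE data are hypothetical names made up for this example.

```ruby
# Breadth-first search over an in-memory link graph.
# `links` maps each page to the pages it links to (hypothetical data).
def page_depths(links, start)
  depths = { start => 0 }
  queue = [start]
  until queue.empty?
    page = queue.shift
    links.fetch(page, []).each do |target|
      # In BFS the first visit to a page is along a shortest path,
      # so pages that already have a depth are skipped.
      next if depths.key?(target)
      depths[target] = depths[page] + 1
      queue << target
    end
  end
  depths
end

# A made-up four-page site.
SITE = {
  "/"            => ["/about", "/blog"],
  "/about"       => ["/"],
  "/blog"        => ["/blog/post-1", "/about"],
  "/blog/post-1" => ["/"]
}

DEPTHS = page_depths(SITE, "/")
DEPTHS.each { |page, depth| puts "#{depth} #{page}" }
```

A page's depth here is the smallest number of links that must be followed from the start page to reach it, so "/blog/post-1" ends up at depth 2 even though it links straight back to "/".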

Examples

See the scripts under the lib/anemone/cli directory for examples of several useful Anemone tasks.
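
As a rough starting point, a small spider task might look like the sketch below. It assumes the installed Anemone version exposes Anemone.crawl, skip_links_like, focus_crawl and on_every_page, and that pages respond to url, links and response_time; check these against your version. SKIP_PATTERN, MAX_LINKS_PER_PAGE, pick_links and crawl_site are hypothetical names introduced only for this example.

```ruby
# Hypothetical filters for illustration; adjust them to the site being crawled.
SKIP_PATTERN = /\.(pdf|zip|jpe?g|png)\z/i
MAX_LINKS_PER_PAGE = 10

# Keep at most `limit` links from a page, in the order they were found.
def pick_links(links, limit = MAX_LINKS_PER_PAGE)
  links.first(limit)
end

def crawl_site(start_url)
  # Required here so the helpers above can be loaded without the gem installed.
  require 'anemone'

  Anemone.crawl(start_url) do |anemone|
    # Exclude URLs that match the regular expression.
    anemone.skip_links_like(SKIP_PATTERN)
    # Choose which of each page's links to follow.
    anemone.focus_crawl { |page| pick_links(page.links) }
    # Report every page the spider visits.
    anemone.on_every_page do |page|
      puts "#{page.response_time}\t#{page.url}"
    end
  end
end

# Example call (performs real HTTP requests):
# crawl_site("http://www.example.com/")
```

Keeping the link-selection logic in a plain method such as pick_links makes it easy to try out without touching the network.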

Requirements

nokogiri

Optional

fizx-robots (required if obey_robots_txt is set to true)
