/crawler-tripadvisor

A focused crawler for extracting review data from Tripadvisor.it


Crawler TripAdvisor

A focused crawler in Java for review extraction from TripAdvisor

Final project for the Web Information Management course, June 2012.

Detailed project information and evaluation can be found in the docs/ folder, in the PDF presentation eng_crawler_tripadvisor.pdf.

Running the crawler

Compile and run it/thecrawlers/crawler/CrawlHandler with the following arguments, in this order:

  • numberOfCrawlers
  • rootFolder (folder that will contain intermediate crawl data), for example "data/crawl/"
  • timeDelay (time delay between requests in milliseconds)
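
The three arguments above can be checked before a crawl starts. The following is only a minimal sketch of such a check: the class CrawlArgs, its field names, and the validation rules (at least one crawler, a non-negative delay, a non-empty folder) are assumptions made for illustration and are not taken from the project's CrawlHandler.

```java
// Hypothetical helper, NOT part of the project: parses and validates the
// three command-line arguments listed above, in the same order.
final class CrawlArgs {
    final int numberOfCrawlers;   // number of concurrent crawlers
    final String rootFolder;      // folder for intermediate crawl data
    final long timeDelayMillis;   // delay between requests, in milliseconds

    private CrawlArgs(int numberOfCrawlers, String rootFolder, long timeDelayMillis) {
        this.numberOfCrawlers = numberOfCrawlers;
        this.rootFolder = rootFolder;
        this.timeDelayMillis = timeDelayMillis;
    }

    static CrawlArgs parse(String[] args) {
        if (args == null || args.length != 3) {
            throw new IllegalArgumentException(
                "expected: <numberOfCrawlers> <rootFolder> <timeDelay>");
        }
        int crawlers;
        long delay;
        try {
            crawlers = Integer.parseInt(args[0].trim());
            delay = Long.parseLong(args[2].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "numberOfCrawlers and timeDelay must be integers", e);
        }
        // Assumed constraints; the real crawler may accept other values.
        if (crawlers < 1) {
            throw new IllegalArgumentException("numberOfCrawlers must be >= 1");
        }
        if (delay < 0) {
            throw new IllegalArgumentException("timeDelay must be >= 0");
        }
        String root = args[1].trim();
        if (root.isEmpty()) {
            throw new IllegalArgumentException("rootFolder must not be empty");
        }
        return new CrawlArgs(crawlers, root, delay);
    }

    public static void main(String[] args) {
        // Illustrative values only: 4 crawlers, 1000 ms between requests.
        CrawlArgs parsed = parse(new String[] {"4", "data/crawl/", "1000"});
        System.out.println(parsed.numberOfCrawlers + " crawlers, root="
            + parsed.rootFolder + ", delay=" + parsed.timeDelayMillis + " ms");
    }
}
```

Assuming the compiled classes are on the classpath, an invocation of the crawler itself would then look something like `java it.thecrawlers.crawler.CrawlHandler 4 data/crawl/ 1000`, where 4 and 1000 are example values.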

Warning

This version supports crawling TripAdvisor as it was in June 2012. Because the crawler is focused and depends on the page structure of TripAdvisor, which changes over time, it will eventually produce parsing errors and will need to be updated.