A scraper for Tripadvisor (TA) reviews, parameterized by date and language. The script can scrape:
- URLs of TA points of interest (POIs) matching a string query
- POI metadata
- POI reviews back to a given minimum date and in a specified language
Follow these steps to use the scraper:
- Download ChromeDriver.
- Install the Python packages from the requirements file, using pip, conda, or virtualenv:
`conda create --name scraping python=3.6 --file requirements.txt`
Note: Python >= 3.6 is required.
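Before running the scraper, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of the project: the helper name `check_environment` is hypothetical, and it assumes the ChromeDriver executable is named `chromedriver` and is expected on the `PATH` (the script itself may locate the driver differently).

```python
import shutil
import sys


def check_environment(driver_name="chromedriver"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The README states that Python >= 3.6 is required.
    if sys.version_info < (3, 6):
        problems.append("Python >= 3.6 is required")
    # Assumption: the driver executable should be discoverable on PATH.
    if shutil.which(driver_name) is None:
        problems.append("{} not found on PATH".format(driver_name))
    return problems


for problem in check_environment():
    print("Setup problem:", problem)
```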
The scraper has 5 parameters:
- `--i`: input file, containing a list of Tripadvisor URLs that point to the first page of reviews.
- `--lang`: language code to filter reviews. Note: only the "select all languages" click is implemented.
- `--N`: number of reviews to scrape.
- `--q`: string query used to scrape place URLs.
- `--place`: boolean value to scrape place metadata instead of reviews.
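As a rough illustration of how these five flags might be parsed and mapped to the three scraping modes, here is a sketch using `argparse`. Only the flag names come from this README. The default values, the helper names `build_parser` and `select_mode`, and the precedence between `--q` and `--place` are assumptions and may differ from the actual `scraper.py`.

```python
import argparse


def build_parser():
    """Build a parser for the five documented flags (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Tripadvisor scraper (illustrative sketch)")
    parser.add_argument("--i", type=str, default="urls.txt",
                        help="input file listing Tripadvisor URLs (first page of reviews)")
    parser.add_argument("--lang", type=str, default="en",
                        help="language code used to filter reviews")
    parser.add_argument("--N", type=int, default=100,
                        help="number of reviews to scrape")
    parser.add_argument("--q", type=str, default="",
                        help="string query used to scrape place URLs")
    parser.add_argument("--place", type=int, default=0,
                        help="1 to scrape place metadata instead of reviews")
    return parser


def select_mode(args):
    """Pick a scraping mode; the precedence order here is an assumption."""
    if args.q:
        return "urls"
    if args.place:
        return "metadata"
    return "reviews"


# Parsing an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--q", "amsterdam"])
print(select_mode(args))  # prints "urls"
```

Passing `--place` as an integer mirrors the `--place 1` usage in this README. A `store_true` flag would be a more common design, but it would change the command-line syntax.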
Some examples:
- `python scraper.py --q amsterdam`: generates the urls.txt file with the top-30 POIs of Amsterdam.
- `python scraper.py --place 1`: generates a csv file containing metadata of the places listed in urls.txt.
- `python scraper.py`: generates a csv file containing reviews of the places listed in urls.txt.
The config.json file sets the directory where the output csv files are stored, as well as their filenames.
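The exact keys of config.json are not listed here, so the following is only a sketch of how such a file might be read. The key names `output_dir`, `reviews_file`, and `places_file`, their default values, and the helper names are all hypothetical. Check the actual config.json for the real ones.

```python
import json
import os

# Hypothetical keys and defaults; the real config.json may use different names.
DEFAULTS = {
    "output_dir": "output",
    "reviews_file": "reviews.csv",
    "places_file": "places.csv",
}


def load_config(path="config.json"):
    """Load settings from a JSON file, falling back to DEFAULTS for missing keys."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            config.update(json.load(fh))
    return config


def output_path(config, key):
    """Join the output directory with the filename stored under `key`."""
    return os.path.join(config["output_dir"], config[key])


config = load_config()
print(output_path(config, "reviews_file"))
```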
GNU GPLv3