Scraping restaurant data from openrice.com (Hong Kong) using Scrapy

Primary language: Python. License: MIT.

openrice_hk_crawler

Scrapes restaurant data from OpenRice HK using Scrapy, for use in a later F&B analysis project.

Installation of libraries

$ pip install -r requirements.txt

Example usage

$ scrapy crawl openrice_spider

Output

Data from a full-site scrape is written as JSON to a file named in the following format:

restaurant_data_{spider date}_{spider start time}
e.g. restaurant_data_20210417_1942
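
The naming pattern above can be reproduced with a short sketch like the one below. It assumes the date and start time are formatted as `YYYYMMDD` and `HHMM`, which matches the example name; `output_filename` is a hypothetical helper for illustration and is not taken from the spider's code, which may build the name differently or append a file extension.

```python
from datetime import datetime


def output_filename(started_at):
    """Build an output name from the spider's start time (hypothetical helper)."""
    # %Y%m%d gives the date part (e.g. 20210417), %H%M the start time (e.g. 1942).
    return "restaurant_data_" + started_at.strftime("%Y%m%d_%H%M")


# A crawl started on 17 April 2021 at 19:42 reproduces the example name above.
print(output_filename(datetime(2021, 4, 17, 19, 42)))
```

Because the timestamp is zero-padded and ordered from year down to minute, sorting the output names alphabetically also sorts them chronologically.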

Proxy configuration

To enable proxy rotation, uncomment the following lines in settings.py and provide a plain-text proxy list file named proxy_list.txt:

# # Proxy List
# DOWNLOADER_MIDDLEWARES = {
#     'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
#     'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
# }
# ROTATING_PROXY_LIST_PATH = 'proxy_list.txt'
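
Before enabling the settings above, it may help to sanity-check the proxy list file. The sketch below assumes the file holds one proxy per line in `host:port` form, optionally prefixed with a scheme or credentials. `load_proxies` and `check_proxy_list` are hypothetical helpers written for this example; they are not part of Scrapy or of the rotating-proxies package.

```python
from pathlib import Path


def load_proxies(path):
    """Return the non-empty lines of a proxy list file, one proxy per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def check_proxy_list(path):
    """Return the proxies in the file, raising ValueError on a malformed entry."""
    proxies = load_proxies(path)
    for proxy in proxies:
        # Drop an optional scheme such as "http://".
        hostport = proxy.split("://", 1)[-1]
        # Drop optional "user:password@" credentials.
        hostport = hostport.rsplit("@", 1)[-1]
        host, sep, port = hostport.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"not a host:port proxy entry: {proxy!r}")
    return proxies
```

Calling `check_proxy_list("proxy_list.txt")` from the project directory returns the parsed entries, or raises on the first line that does not look like a proxy address.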