Scrapes restaurant data from OpenRice HK using Scrapy, for use in a later F&B analysis project.
$ pip install -r requirements.txt
$ scrapy crawl openrice_spider
Data from the full-site scrape is output as JSON, in a file named as follows:
restaurant_data_{spider date}_{spider start time}
e.g. restaurant_data_20210417_1942
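As a minimal sketch of how a name in this pattern can be built, assuming the date part is `YYYYMMDD` and the time part is 24-hour `HHMM` (inferred from the example above, not confirmed from the spider code). The helper name `output_name` is hypothetical and not part of this project.

```python
from datetime import datetime


def output_name(started_at):
    """Build an output name like restaurant_data_20210417_1942.

    Assumes a YYYYMMDD date and a 24-hour HHMM time, as the example suggests.
    """
    return f"restaurant_data_{started_at:%Y%m%d_%H%M}"


# A spider started on 17 April 2021 at 19:42
print(output_name(datetime(2021, 4, 17, 19, 42)))  # restaurant_data_20210417_1942
```

Putting the start timestamp in the name keeps the output of separate runs from overwriting each other.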
To enable proxy rotation, uncomment the following lines in settings.py and provide a proxy list file in txt format (proxy_list.txt):
# # Proxy List
# DOWNLOADER_MIDDLEWARES = {
# 'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
# 'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
# }
# ROTATING_PROXY_LIST_PATH = 'proxy_list.txt'
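The proxy list file is expected to hold one proxy per line (for example `proxy1.example.com:8000`). The sketch below is an optional pre-flight check that reads such a file and returns its non-empty lines, so an empty or missing list is noticed before starting a crawl. The function `read_proxy_list` is a hypothetical helper, not part of this project or of the rotating_proxies package, and the addresses are placeholders.

```python
from pathlib import Path


def read_proxy_list(path):
    """Return the non-empty lines of a proxy list file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    import tempfile

    # Write a sample list to a temporary directory so no real file is overwritten.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "proxy_list.txt"
        # Placeholder addresses; replace with real proxies in your own file.
        sample.write_text(
            "proxy1.example.com:8000\nproxy2.example.com:8031\n",
            encoding="utf-8",
        )
        print(read_proxy_list(sample))
```

To check your own list, call `read_proxy_list("proxy_list.txt")` from the project directory and confirm it returns at least one entry.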