This personal project does the following:
- scrape https://www.recetasgratis.net/
- prepare the data with Python
- make some visualizations in Python
- predict the cuisine from the ingredients used (next step)
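The cuisine-prediction step is not implemented yet. One simple baseline is a multinomial naive Bayes classifier over ingredient names. The sketch below assumes each scraped recipe has been reduced to a pair `(ingredient list, cuisine label)`. That format, the functions `train` and `predict`, and the toy recipes are all hypothetical and are not taken from the project.

```python
import math
from collections import Counter, defaultdict


def train(recipes):
    """recipes: list of (ingredients, cuisine) pairs (hypothetical format)."""
    cuisine_counts = Counter()               # number of recipes per cuisine
    ingredient_counts = defaultdict(Counter)  # ingredient frequencies per cuisine
    vocab = set()
    for ingredients, cuisine in recipes:
        cuisine_counts[cuisine] += 1
        for ing in ingredients:
            ingredient_counts[cuisine][ing] += 1
            vocab.add(ing)
    return cuisine_counts, ingredient_counts, vocab


def predict(model, ingredients):
    cuisine_counts, ingredient_counts, vocab = model
    total = sum(cuisine_counts.values())
    best, best_score = None, -math.inf
    for cuisine, n in cuisine_counts.items():
        # Log prior plus Laplace-smoothed log likelihood of each ingredient
        score = math.log(n / total)
        denom = sum(ingredient_counts[cuisine].values()) + len(vocab)
        for ing in ingredients:
            score += math.log((ingredient_counts[cuisine][ing] + 1) / denom)
        if score > best_score:
            best, best_score = cuisine, score
    return best


# Toy, made-up training data
recipes = [
    (["arroz", "gambas", "pimiento"], "spanish"),
    (["patata", "huevo", "cebolla"], "spanish"),
    (["maiz", "aguacate", "chile"], "mexican"),
    (["frijoles", "chile", "cilantro"], "mexican"),
]
model = train(recipes)
print(predict(model, ["chile", "aguacate"]))  # mexican
```

With real data, a library classifier over a bag-of-ingredients encoding would be the natural next step; naive Bayes is used here only because it fits in a few lines.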
If you want to run the spider, go to the `scraperecetas` folder and run `scrapy crawl recetaspider`. The scraped data is saved to `data/raw/recetas.json`.
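For the data-preparation step, the output file can be loaded with the standard library. Scrapy's JSON feed exporter writes a single JSON array of item dicts. The function name `load_recipes` and the `ingredients` field are assumptions about the item layout, so adjust them (and the path) to match the spider.

```python
import json


def load_recipes(path="data/raw/recetas.json", required=("ingredients",)):
    """Load scraped items and keep those whose required fields are non-empty.

    The field names in `required` are hypothetical; match them to the spider's items.
    """
    with open(path, encoding="utf-8") as f:
        items = json.load(f)  # a list of dicts, one per scraped recipe
    return [item for item in items if all(item.get(key) for key in required)]
```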
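To avoid being blocked, use a list of paid IP proxies with scrapy-rotating-proxies. Install it with `pip install scrapy-rotating-proxies` and add the following to the project's `settings.py`: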
```python
DOWNLOADER_MIDDLEWARES = {
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 300,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 301,
    # ...
}
ROTATING_PROXY_LIST_PATH = 'proxies.txt'  # file with one proxy per line
```
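To also send a random user agent with each request, install scrapy-fake-useragent with `pip install scrapy-fake-useragent` and enable its middlewares: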
```python
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's built-in user-agent and retry middlewares...
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # ...and replace them with versions that use a random user agent
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
```
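Both snippets assign `DOWNLOADER_MIDDLEWARES`. If they are pasted one after the other into `settings.py`, the second assignment silently replaces the first. To use rotating proxies and random user agents together, the two dicts can be merged into one. This is a sketch that assumes both packages are installed and no other custom middlewares are registered.

```python
# settings.py: proxies and random user agents enabled together
DOWNLOADER_MIDDLEWARES = {
    # Built-in middlewares replaced by scrapy-fake-useragent
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    # scrapy-rotating-proxies
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 300,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 301,
    # scrapy-fake-useragent
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}
ROTATING_PROXY_LIST_PATH = 'proxies.txt'
```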
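Alternatively, comment out `DOWNLOADER_MIDDLEWARES` and set a random user agent in the spider's `custom_settings`:

```python
from fake_useragent import UserAgent

agent = UserAgent()
custom_settings = {'USER_AGENT': agent.random}  # inside the spider class; picked once per crawl
```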