ikeascraper

Selenium-powered Ikea scraper

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Code style: black (https://img.shields.io/badge/code%20style-black-000000.svg)

Setup

$ mkvirtualenv -a $(pwd) ikeascraper
$ pip install -r requirements.txt
$ python manage.py makemigrations
$ python manage.py migrate

The webdriver uses headless Chrome. It looks for chromedriver at /usr/local/bin.
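The following is a minimal sketch of how a headless Chrome driver could be started with the Selenium 4 API. It assumes the binary is at /usr/local/bin/chromedriver. The helper names chromedriver_path and make_headless_driver are hypothetical and do not come from the project's source. Selenium is imported inside the factory, so the path check works even when Selenium is not installed.

```python
import os
from typing import Optional

# Assumed location, taken from the setup note; change it if your binary lives elsewhere.
CHROMEDRIVER_DIR = "/usr/local/bin"


def chromedriver_path(directory: str = CHROMEDRIVER_DIR) -> Optional[str]:
    """Return the path to an executable chromedriver in `directory`, or None."""
    path = os.path.join(directory, "chromedriver")
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None


def make_headless_driver(directory: str = CHROMEDRIVER_DIR):
    """Start a headless Chrome session (hypothetical helper, Selenium 4 API)."""
    # Imported lazily so chromedriver_path() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    path = chromedriver_path(directory)
    if path is None:
        raise FileNotFoundError(f"chromedriver not found in {directory}")

    options = Options()
    options.add_argument("--headless=new")  # Chrome 109+; older Chrome uses "--headless"
    return webdriver.Chrome(service=Service(executable_path=path), options=options)
```

To use it, call driver = make_headless_driver(), then driver.get(...) with a page URL. Call driver.quit() when done so that no Chrome processes are left running.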

Usage

The project provides a custom Django management command, crawl, that crawls the IKEA site with Selenium.

In addition to crawling, the command does three things:

  1. Prints the scraped data to stdout
  2. Saves the scraped data to the database
  3. Saves the scraped data to a JSON file
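The three output steps above can be sketched as a single function. This is an illustration under assumptions, not the project's code. export_items is a hypothetical name. save_item is a placeholder callable for the database step; in a Django project it might wrap a model's objects.create(...), but the model and its fields are not known here. pprint is used for the stdout dump.

```python
import json
import pprint
from typing import Callable, Dict, List, Optional

Item = Dict[str, object]


def export_items(
    items: List[Item],
    filename: str = "items.json",
    save_item: Optional[Callable[[Item], None]] = None,
) -> int:
    """Print, persist and dump scraped items; return how many were exported."""
    # 1. Pretty-print the items to stdout.
    pprint.pprint(items)
    # 2. Persist each item; save_item is a hypothetical stand-in for the ORM code.
    if save_item is not None:
        for item in items:
            save_item(item)
    # 3. Dump to a JSON file; ensure_ascii=False keeps non-ASCII names readable.
    with open(filename, "w", encoding="utf-8") as fp:
        json.dump(items, fp, ensure_ascii=False, indent=2)
    print(f"Dumped {len(items)} items to {filename}")
    return len(items)
```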

You can pass an output filename as an argument when you run the command. If you omit it, the data is written to items.json.
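An optional positional filename with a default is normally declared through argparse. Django hands a BaseCommand subclass an argparse parser in its add_arguments(self, parser) method, and handle(self, *args, **options) can then read options["filename"]. The sketch below shows only that argument definition as a standalone function, so it runs without Django. It is an assumption about how the command could be written, not its actual source.

```python
import argparse


def add_arguments(parser: argparse.ArgumentParser) -> None:
    """Register the optional output filename (would be BaseCommand.add_arguments in Django)."""
    # nargs="?" makes the positional argument optional; the default applies when it is omitted.
    parser.add_argument(
        "filename",
        nargs="?",
        default="items.json",
        help="JSON file to write the scraped items to",
    )


parser = argparse.ArgumentParser(prog="crawl")
add_arguments(parser)
print(parser.parse_args([]).filename)               # items.json
print(parser.parse_args(["sofas.json"]).filename)   # sofas.json
```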

$ workon ikeascraper
$ python manage.py crawl [filename]
$ python manage.py crawl sofas.json

Cleaning database...
Scraping items...
 11%|███████████████████▌                                                                                                                                                            | 1/9 [00:14<01:54,  0.07it/s]

[
 ...
 {'colors': [],
  'imageUrl': 'https://www.ikea.com/es/es/images/products/kivik-chaise-longue-hillared-anthracite__0479950_PE619104_S5.JPG?f=xs',
  'name': 'KIVIK',
  'type': 'Chaiselongue'}]
Dumped 428 items to sofas.json
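The next sketch loads the dump back and tallies items by type. It assumes the JSON file holds a list of objects with the same keys as the printed sample (colors, imageUrl, name, type). load_items and count_by_type are hypothetical helpers and are not part of the project.

```python
import json
from collections import Counter
from typing import Dict, List


def load_items(filename: str = "items.json") -> List[Dict[str, object]]:
    """Read back the list of item dicts written by the crawl command."""
    with open(filename, encoding="utf-8") as fp:
        return json.load(fp)


def count_by_type(items: List[Dict[str, object]]) -> Counter:
    """Tally items by their 'type' field; items without one count as 'unknown'."""
    return Counter(item.get("type", "unknown") for item in items)
```

For example, print(count_by_type(load_items("sofas.json")).most_common(5)) lists the five most frequent product types in the dump.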

The source code of the crawl command is here.