Setup

$ mkvirtualenv -a $(pwd) ikeascraper
$ pip install -r requirements.txt
$ python manage.py makemigrations
$ python manage.py migrate

The webdriver uses headless Chrome and expects the chromedriver binary in /usr/local/bin.
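A minimal sketch of how such a driver might be configured with the Selenium 4 API, assuming the binary is at /usr/local/bin/chromedriver. The helper names `chromedriver_is_usable` and `make_headless_chrome` are hypothetical and not taken from the project's source. Selenium is imported inside the function so the path check can run without it installed.

```python
import os

# Assumption: chromedriver lives in /usr/local/bin, as stated in this README.
CHROMEDRIVER_PATH = "/usr/local/bin/chromedriver"


def chromedriver_is_usable(path: str = CHROMEDRIVER_PATH) -> bool:
    """Return True if `path` is an existing file with execute permission."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def make_headless_chrome(path: str = CHROMEDRIVER_PATH):
    """Build a headless Chrome WebDriver (hypothetical helper, Selenium 4 API)."""
    # Lazy imports: only needed when a browser is actually started.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if not chromedriver_is_usable(path):
        raise FileNotFoundError(f"chromedriver not found or not executable at {path}")
    options = webdriver.ChromeOptions()
    # Recent Chrome versions accept "--headless=new"; older ones use "--headless".
    options.add_argument("--headless=new")
    return webdriver.Chrome(service=Service(executable_path=path), options=options)
```

Checking the path up front turns a missing driver into a clear error message instead of a less obvious failure when the browser is launched.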

Usage

This project provides a custom Django management command, crawl, that crawls the IKEA site with Selenium.

In addition to crawling, the command:

Prints the scraped data to stdout
Saves the scraped data to the database
Saves the scraped data to a JSON file
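The three output steps listed above might look roughly like the following sketch. It assumes each scraped item is a plain dict with the keys seen in the sample output below (colors, imageUrl, name, type). The function `report_items` and its `save` callback are hypothetical: `save` stands in for the database write (for example, creating a Django model row), which this sketch does not implement.

```python
import json
from pprint import pprint
from typing import Callable, Iterable, Optional


def report_items(
    items: Iterable[dict],
    filename: str = "items.json",
    save: Optional[Callable[[dict], None]] = None,
) -> str:
    """Print, persist, and dump scraped items; return a summary line."""
    items = list(items)
    # 1. Print to stdout; pprint sorts dict keys, matching the sample output.
    pprint(items)
    # 2. Persist each item through a hypothetical callback (e.g. an ORM create).
    if save is not None:
        for item in items:
            save(item)
    # 3. Write the whole list to a JSON file, keeping non-ASCII text readable.
    with open(filename, "w", encoding="utf-8") as fh:
        json.dump(items, fh, ensure_ascii=False, indent=2)
    return f"Dumped {len(items)} items to {filename}"
```

Passing the database step as a callback keeps the sketch independent of Django, so the printing and JSON logic can be exercised on their own.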

You can pass an output filename when you run the command. If you omit it, the data is saved to items.json.

$ workon ikeascraper
$ python manage.py crawl [filename]

$ python manage.py crawl sofas.json

Cleaning database...
Scraping items...
 11%|███████████████████▌                                                                                                                                                            | 1/9 [00:14<01:54,  0.07it/s]

[
 ...
 {'colors': [],
  'imageUrl': 'https://www.ikea.com/es/es/images/products/kivik-chaise-longue-hillared-anthracite__0479950_PE619104_S5.JPG?f=xs',
  'name': 'KIVIK',
  'type': 'Chaiselongue'}]
Dumped 428 items to sofas.json
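The optional filename argument with an items.json fallback can be expressed with an optional positional argument. This is a sketch using plain argparse (Django's command parser is an argparse subclass). The `build_parser` helper is hypothetical, not the project's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a crawl-style command with an optional output filename."""
    parser = argparse.ArgumentParser(prog="manage.py crawl")
    # nargs="?" makes the positional optional; `default` applies when it is omitted.
    parser.add_argument(
        "filename",
        nargs="?",
        default="items.json",
        help="JSON file to write scraped items to",
    )
    return parser
```

In a Django management command, the same `add_argument` call would go inside `Command.add_arguments(self, parser)`, and the value would arrive in `handle(self, *args, **options)` as `options["filename"]`.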

The source code of the crawl command is in the WorkShoft/ikeascraper repository.
