virgodarth/image-spider

Python, GPL-3.0

Description

Extract and download images from websites based on input keywords.

Available websites

unsplash_com

Requirements

python >= 3.7
Ubuntu >= 18.04
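The Python requirement above can be checked before going further. The following is a minimal sketch; `meets_minimum` and `MIN_VERSION` are illustrative names and are not part of this repository.

```python
import sys

# Minimum version taken from the Requirements list (python >= 3.7).
MIN_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    if meets_minimum():
        print("Python " + current + " is new enough")
    else:
        print("Python " + current + " is too old, 3.7 or later is required")
```

Run it with the interpreter you plan to use for the virtualenv, for example python3.8.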

How to set up the system

Install python

Start by updating the packages list and installing the prerequisites

sudo apt update
sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libsqlite3-dev libreadline-dev libffi-dev wget libbz2-dev
sudo apt install software-properties-common

Add the deadsnakes PPA to your sources list (used to install python3.8 or later)

sudo add-apt-repository ppa:deadsnakes/ppa
When prompted with "Press [ENTER] to continue or Ctrl-c to cancel adding it.", press Enter to continue.

Once the repository is enabled, install python 3.8

sudo apt install python3.8

Set up environment

Install python virtualenv

sudo apt install python3.8-dev python3.8-venv

Create new virtualenv

python3.8 -m venv your_folder_name

Activate the environment

source your_folder_name/bin/activate

Deactivate the environment (if needed)

deactivate
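To confirm that the environment is active before installing packages, the interpreter itself can be asked. This is a small sketch that assumes the environment was created with the standard venv module, as in the steps above; `in_virtualenv` is an illustrative name.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment folder while
    sys.base_prefix still points at the system installation.
    The parameters exist only so the check can be exercised with fake paths.
    """
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    state = "active" if in_virtualenv() else "not active"
    print("Virtual environment is " + state)
```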

Install the necessary Python packages

pip install -U pip wheel setuptools
pip install -r requirements/dev.txt

Run Code

Move to workdir

cd ./spider_app

Set up settings.py for Scrapy

cp

Show available spiders

scrapy list

Choose and run a spider

scrapy crawl -a tags=your_keywords_separated_by_commas your_selected_spider
Ex: scrapy crawl -a tags=flower,friend,baby unsplash_spider
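Scrapy hands each -a name=value pair to the spider as a string keyword argument, so the comma-separated tags value has to be split inside the spider. How this project's spiders do that is not shown here. The sketch below is one plausible approach, and `parse_tags` is a hypothetical helper, not a function from this repository.

```python
def parse_tags(raw):
    """Split a comma-separated tags string into a clean list of keywords.

    Surrounding whitespace is stripped and empty entries are dropped,
    so "flower, friend,,baby" becomes ["flower", "friend", "baby"].
    """
    if not raw:
        return []
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


if __name__ == "__main__":
    # Mirrors the example command: -a tags=flower,friend,baby
    print(parse_tags("flower,friend,baby"))
```

A spider could call such a helper on the tags argument it receives in its constructor and then build one search request per keyword.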

Default Config

Watch log file: tail -f -n 100 ./spider_app/logs
Download folder: ls ./spider_app/spider_app/download/
Total downloaded images: find ./spider_app/download -type f | wc -l
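The same file count can be obtained from Python, which may be convenient on systems without find and wc. This is a rough equivalent of the command above, assuming the download folder path shown there; `count_files` is an illustrative name.

```python
from pathlib import Path


def count_files(root):
    """Count regular files under `root`, recursively.

    Symbolic links are skipped to match `find -type f`.
    A missing folder counts as zero files.
    """
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(
        1 for path in root.rglob("*")
        if path.is_file() and not path.is_symlink()
    )


if __name__ == "__main__":
    # Path taken from the find command above.
    print(count_files("./spider_app/download"))
```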