IU Python Scraper!

Introduction

A script written in Python using Scrapy that downloads all images from http://iustudio.net/.

Why Scrapy?

Scrapy is a crawler: it follows links and visits many pages on its own. Beautiful Soup is pretty good for standard parsing, but it only handles the contents of the URL you provide, which isn't as robust as Scrapy when you want every image on a site.
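To make the difference concrete, here is a minimal sketch of what "crawling" adds on top of parsing a single page. It uses only the Python standard library, not Scrapy, and it is not the code in this repo. The `crawl_images` function, the `fetch` callable, and the `PAGES` site are all hypothetical names made up for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collects <a href> links and <img src> URLs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def crawl_images(start_url, fetch):
    """Follow links breadth-first from start_url and return every image URL found.

    `fetch` is a hypothetical callable mapping a URL to its HTML text (or None).
    """
    seen, queue, images = {start_url}, [start_url], []
    while queue:
        url = queue.pop(0)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndImageParser(url)
        parser.feed(html)
        for img in parser.images:
            if img not in images:
                images.append(img)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return images


# Hypothetical two-page site held in memory, so no network access is needed.
PAGES = {
    "http://example.test/": '<a href="/gallery">Gallery</a><img src="/a.jpg">',
    "http://example.test/gallery": '<img src="/b.jpg"><img src="/c.jpg">',
}

if __name__ == "__main__":
    for image_url in crawl_images("http://example.test/", PAGES.get):
        print(image_url)
```

Parsing only the start page would find a single image. Following the link to the gallery page is what finds the other two, and Scrapy handles that link-following (plus scheduling, retries, and so on) for you.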

Getting Started

Follow the installation guide below if you want to mess around with the code. Otherwise, you can just grab the image_downloads folder if you only want the IU photos.

Installation

For dependencies, I recommend installing with conda; see the Scrapy installation guide at https://docs.scrapy.org/en/latest/intro/install.html. Alternatively, the following commands create a virtual environment and install the dependencies needed to run this spider:

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

Clone the repo

git clone https://github.com/kingsotn/IU-Scrapy.git

Run inside the directory

scrapy crawl iu

Demo

Tips

Running the spider multiple times does not create duplicates; this is accounted for in the Scrapy pipelines.
I suggest not modifying (adding to or deleting from) the image_downloads folder directly.
If you want to modify your image stash, I suggest copying the image_downloads folder and modifying the copy somewhere else on your computer.
Currently everything is stored as JPG files; ping me if you want support for other file types.
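As a rough illustration of how duplicate-free downloads can work, here is a minimal standard-library sketch. It is an assumption about the general technique, not a copy of this repo's pipeline: each image URL is hashed to a stable file name (Scrapy's built-in ImagesPipeline similarly names files after a SHA-1 hash of the URL), and a file that already exists is skipped. The helper names `image_path_for` and `save_if_new` are hypothetical.

```python
import hashlib
from pathlib import Path


def image_path_for(url, folder="image_downloads"):
    """Map an image URL to a stable .jpg path derived from the URL's SHA-1 hash."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return Path(folder) / f"{digest}.jpg"


def save_if_new(url, data, folder="image_downloads"):
    """Write `data` for `url` unless the file already exists; return True if written."""
    path = image_path_for(url, folder)
    if path.exists():
        # Same URL seen on an earlier run: keep the existing file untouched.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return True
```

Because the file name depends only on the URL, a second run maps the same image to the same path and skips it. This is also one reason to leave the folder alone: a deleted file would simply be downloaded again on the next run.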

For more examples, please refer to the Scrapy documentation.

License

Distributed under the MIT License. See LICENSE.txt for more information.