IU-Scrapy

Scraping IU pics


IU Python Scraper!

Introduction

A script written in Python using Scrapy that downloads all images from http://iustudio.net/.

Why Scrapy?

Scrapy is a crawler: it follows links from page to page on its own. Beautiful Soup is pretty good for standard parsing, but it only handles the contents of the URL you provide, which makes it less robust than Scrapy for grabbing a whole site.

Getting Started

Follow the installation guide below if you want to mess around with the code. Otherwise, if you only want the IU photos, you can just grab the image_downloads folder.

Installation

  1. Clone the repo
git clone https://github.com/kingsotn/IU-Scrapy.git
  2. Install the dependencies from inside the cloned directory. I recommend installing Scrapy with conda (see the documentation at https://docs.scrapy.org/en/latest/intro/install.html). Alternatively, the following commands create a virtual environment and install the dependencies needed to run this spider
cd IU-Scrapy
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
  3. Run the spider inside the directory
scrapy crawl iu

Demo

Tips

  • There are no duplicates, even when running the spider multiple times; this is accounted for in the Scrapy pipelines.
  • I suggest not modifying (deleting from or adding to) the image_downloads folder.
  • If you want to modify your image stash, I suggest copying the image_downloads folder and modifying the copy somewhere else on your computer.
  • Currently everything is stored as jpg files; ping me if you want support for other file types.

For more examples, please refer to the Scrapy documentation.

License

Distributed under the MIT License. See LICENSE.txt for more information.