Webreg Scrapy

Webreg Scrapy is a web scraper, built with Scrapy, that retrieves UCI course information from the UCI Registrar. I built it to supply course data for the UCI Course API.

Table of Contents

Usage
  Process
Requirements
Development
  Installing Dependencies
  Running the Scraper
  Handling UCI Data Changes
Roadmap
Contributing

Usage

Use this scraper to collect UCI course information and import it into a PostgreSQL database.

Process

The scraper is hosted on Heroku. Each run:

1. Executes the department spider to fetch an up-to-date list of departments.
2. Executes a course spider for each department in that list.
3. Uploads all of the scraped information to the AWS RDS PostgreSQL database.
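The run order above can be sketched in plain Python. This is a minimal illustration, not the project's code: `run_pipeline`, `fetch_departments`, `fetch_courses`, and `upload` are hypothetical names, and in the real project the stages are Scrapy spiders plus a database upload rather than plain callables. It only shows the ordering: departments are refreshed first, courses are crawled once per department, and everything is uploaded at the end.

```python
from typing import Callable, Dict, Iterable, List


def run_pipeline(
    fetch_departments: Callable[[], List[str]],
    fetch_courses: Callable[[str], Iterable[dict]],
    upload: Callable[[List[dict]], int],
) -> int:
    """Run the three stages in order; return whatever `upload` reports."""
    # 1. Refresh the department list (the department spider's job).
    departments = fetch_departments()

    # 2. Crawl courses once per department (one course spider per department).
    courses: List[dict] = []
    for dept in departments:
        courses.extend(fetch_courses(dept))

    # 3. Upload everything that was collected (the database step).
    return upload(courses)


if __name__ == "__main__":
    # Fake data standing in for live registrar responses.
    fake_catalog: Dict[str, List[dict]] = {
        "COMPSCI": [{"dept": "COMPSCI", "number": "161"}],
        "MATH": [{"dept": "MATH", "number": "2A"}, {"dept": "MATH", "number": "2B"}],
    }
    uploaded = run_pipeline(
        fetch_departments=lambda: list(fake_catalog),
        fetch_courses=lambda dept: fake_catalog[dept],
        upload=len,  # "upload" by counting, for demonstration only
    )
    print(uploaded)  # prints 3
```

Passing the stages in as callables keeps the ordering logic testable without network or database access.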

Requirements

PostgreSQL

Development

Installing Dependencies

From within the root directory:

pip install -r requirements.txt

Running the Scraper

Start a PostgreSQL server with the required relations (tables) already set up, then run one of the following from the root directory:

# Crawl courses into the database
scrapy crawl course_scrapy
# Crawl courses into the database and also export them to courses.json
scrapy crawl course_scrapy -o courses.json
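After a run with `-o courses.json`, a quick sanity check on the export can catch a broken parse before the data is trusted. This is a minimal sketch, assuming the JSON feed format (a single top-level array of items) and a hypothetical `dept` field; substitute whatever field names items.py actually defines. `summarize_feed` is a hypothetical helper, not part of the project.

```python
import json
from collections import Counter


def summarize_feed(feed_text: str, key: str = "dept") -> Counter:
    """Count exported items per value of `key`.

    `key` defaults to a hypothetical field name; items lacking it are
    counted under "<missing>" so parsing gaps stand out.
    """
    items = json.loads(feed_text)  # a JSON feed is one top-level array of items
    return Counter(item.get(key, "<missing>") for item in items)
```

For example, call `summarize_feed(Path("courses.json").read_text(encoding="utf-8"))` (with `from pathlib import Path`) and look for departments with suspiciously few courses. Note that on recent Scrapy versions `-o` appends to an existing file, so a second run against the same courses.json leaves it as invalid JSON; `-O` overwrites the file instead.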

Handling UCI Data Changes

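If UCI changes the structure of its course data, update the following files:

Update items.py to add, remove, or rename item fields.
Update the parsing logic in course_spider.py.
Update models.py to reflect the database schema.
Update pipelines.py to manage the insertion of the new data.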
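These four files form one chain: the item defines the fields, the spider fills them, the model defines the table, and the pipeline writes each item. The sketch below walks one hypothetical new field (`units`) through that chain using only the standard library. `CourseItem`, `parse_row`, `CoursePipeline`, and all field names are illustrative assumptions; in the real project the item would be a `scrapy.Item` and the database PostgreSQL, for which an in-memory SQLite database stands in here. `process_item(item, spider)` mirrors the shape of Scrapy's item pipeline hook.

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import List, Optional


# Hypothetical item (items.py would define a scrapy.Item instead).
@dataclass
class CourseItem:
    dept: str
    number: str
    title: str
    units: Optional[str] = None  # the hypothetical newly added field


# Hypothetical parser (course_spider.py): one table row, already split into cells.
def parse_row(cells: List[str]) -> CourseItem:
    dept, number, title, *rest = [cell.strip() for cell in cells]
    # Rows without a units cell still parse; units stays None.
    return CourseItem(dept, number, title, rest[0] if rest else None)


class CoursePipeline:
    """Shaped like a Scrapy item pipeline; SQLite stands in for PostgreSQL."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        # Hypothetical schema (models.py): needs a column for every item field.
        self.conn.execute(
            "CREATE TABLE courses ("
            " dept TEXT, number TEXT, title TEXT, units TEXT,"
            " PRIMARY KEY (dept, number))"
        )

    def process_item(self, item: CourseItem, spider=None) -> CourseItem:
        # Upsert keyed on (dept, number) so re-runs refresh rows instead of duplicating them.
        self.conn.execute(
            "INSERT OR REPLACE INTO courses (dept, number, title, units)"
            " VALUES (:dept, :number, :title, :units)",
            asdict(item),
        )
        self.conn.commit()
        return item
```

`INSERT OR REPLACE` is SQLite syntax; against PostgreSQL the equivalent upsert would be `INSERT ... ON CONFLICT (dept, number) DO UPDATE SET ...`. Forgetting any one link (for example, adding the field to the item but not to the schema) makes the insert fail, which is why all four files change together.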

Roadmap

View the project roadmap here

Contributing

See CONTRIBUTING.md for contribution guidelines.
