This is a web scraper that retrieves UCI course information from the UCI University Registrar. I built it for the UCI Course API.
Use this scraper to grab course information and import it into a PostgreSQL database.
- The scraper is hosted on Heroku
- Executes the department spider to grab an updated list of departments
- Executes a course spider for each department in the department list
- Uploads all the information to the AWS RDS PostgreSQL database
- Requires PostgreSQL
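
The run sequence in the list above (department spider, then one course spider per department, then a database upload) can be sketched as a small driver. This is only an illustration under stated assumptions, not the project's actual code: `run_scrape`, `fetch_departments`, `fetch_courses`, and `upload` are hypothetical names standing in for the two spiders and the upload step, and the sample data is made up.

```python
from typing import Callable, Dict, Iterable, List

Course = Dict[str, str]


def run_scrape(
    fetch_departments: Callable[[], Iterable[str]],
    fetch_courses: Callable[[str], Iterable[Course]],
    upload: Callable[[List[Course]], None],
) -> int:
    """Run one full scrape and return the number of courses uploaded.

    All three callables are hypothetical stand-ins for the real spiders
    and the database layer.
    """
    # Step 1: refresh the department list (the department spider's job).
    departments = list(fetch_departments())

    # Step 2: run the course scrape once per department.
    courses: List[Course] = []
    for dept in departments:
        courses.extend(fetch_courses(dept))

    # Step 3: hand everything to the database layer in one call.
    upload(courses)
    return len(courses)


if __name__ == "__main__":
    # Made-up placeholder data, only to show the call shape.
    fake_courses = {
        "DEPT_A": [{"dept": "DEPT_A", "number": "101"}],
        "DEPT_B": [
            {"dept": "DEPT_B", "number": "201"},
            {"dept": "DEPT_B", "number": "202"},
        ],
    }
    stored: List[Course] = []
    count = run_scrape(
        fetch_departments=lambda: fake_courses.keys(),
        fetch_courses=lambda dept: fake_courses[dept],
        upload=stored.extend,
    )
    print(count, "courses uploaded")
```

Passing the steps in as callables keeps the ordering visible and makes it easy to try the flow without a network connection or a database.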
From within the root directory:
pip install -r requirements.txt
Start a PostgreSQL server with the required relations (tables) already set up.
# To crawl courses into the database
scrapy crawl course_scrapy
# To crawl courses into the database and also export them to courses.json
scrapy crawl course_scrapy -o courses.json
- Change items.py to define the fields you want to scrape
- Change how course_spider.py parses pages
- Change models.py to reflect the database schema
- Change pipelines.py to manage the insertion of new data
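
To show how an item definition and a pipeline fit together, here is a minimal sketch. It is not the project's code: `CourseItem`, its fields, `CoursePipeline`, and the `courses` table are hypothetical, and the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a server. The method names `open_spider`, `process_item`, and `close_spider` follow Scrapy's item pipeline interface.

```python
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class CourseItem:
    # Hypothetical fields; the real ones are defined in items.py.
    department: str
    number: str
    title: str


class CoursePipeline:
    """Hypothetical pipeline that upserts each scraped course into a table."""

    def __init__(self, connect=lambda: sqlite3.connect(":memory:")):
        # Assumption: an in-memory SQLite database replaces PostgreSQL here.
        self.connect = connect
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection and
        # make sure the (hypothetical) table exists.
        self.conn = self.connect()
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS courses ("
            "department TEXT, number TEXT, title TEXT, "
            "PRIMARY KEY (department, number))"
        )

    def process_item(self, item, spider):
        # Called for every scraped item. INSERT OR REPLACE is SQLite syntax;
        # PostgreSQL would use INSERT ... ON CONFLICT instead.
        self.conn.execute(
            "INSERT OR REPLACE INTO courses (department, number, title) "
            "VALUES (:department, :number, :title)",
            asdict(item),
        )
        self.conn.commit()
        return item  # Returning the item passes it on to later pipelines.

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()


if __name__ == "__main__":
    pipeline = CoursePipeline()
    pipeline.open_spider(spider=None)
    # Placeholder values; the second item updates the first one's row.
    pipeline.process_item(CourseItem("DEPT_A", "101", "Sample title"), spider=None)
    pipeline.process_item(CourseItem("DEPT_A", "101", "Updated title"), spider=None)
    print(pipeline.conn.execute("SELECT * FROM courses").fetchall())
    pipeline.close_spider(spider=None)
```

Adding a field in this sketch means touching the item class, the table definition, and the insert statement together, which mirrors why the list above names items.py, models.py, and pipelines.py as the files to change.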
View the project roadmap here
See CONTRIBUTING.md for contribution guidelines.