This repository processes graduate class data from Summer 2013 to Fall 2022 for AntCatalog. Classes from certain departments, such as Law, are omitted because of the effort needed to parse their data.
- Python v3.10
- openpyxl
- beautifulsoup4
- lxml
- SQLite

To set up the environment:

- `cd` into `src` and follow https://www.ics.uci.edu/~thornton/ics32/Notes/ThirdPartyLibraries/ to set up a virtual environment
- Activate the virtual environment
  - Linux: `cd Scripts`, then `source activate`
  - Windows: `cd Scripts`, then `activate`
- Install the dependencies with `pip install -r requirements.txt`

To process the data:

- Create a `temp` folder in the project root directory
- Take the data in the `original_data` folder, separate it by academic year, and put it into the `temp` folder
  - Check the `processed_data` folder to see how the spreadsheets in the `temp` folder should look (or see the table below)
- Optionally, run the parsing script (`python clean_data.py` or `python3 clean_data.py`)
  - List the files you want to process in the `SPREADSHEET_FILES` line in `clean_data.py`
  - The result will appear in the `processed_data` folder
    - Courses that cannot be processed will have "F" in the `Processed` column
    - The reason is recorded in `src/log.txt`
  - Remember to update the file name in `clean_data.py`
- Optionally, run the database script (`python create_db.py` or `python3 create_db.py`)

AcadYr | AcadTerm | DepartmentNameByCourseCode | CourseNumber | CourseCode | CourseTitle | Instructors | GradeACount | GradeBCount | GradeCCount | GradeDCount | GradeFCount | GradePCount | GradeNPCount | GPAAvg | Processed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
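
As a rough illustration of how the `Processed` column can be used, the sketch below picks out the rows marked "F" from a sequence of row tuples. It assumes the first tuple is the header row, which is the shape you would get from something like openpyxl's `iter_rows(values_only=True)`. The helper name `unprocessed_rows`, the sample values, and the "T" success marker are hypothetical and not taken from this repository.

```python
def unprocessed_rows(rows):
    """Return one dict per data row whose Processed value is "F".

    `rows` is an iterable of tuples; the first tuple is the header row.
    """
    rows = iter(rows)
    header = next(rows)
    flag = header.index("Processed")
    return [dict(zip(header, row)) for row in rows if row[flag] == "F"]


# Made-up sample using a subset of the columns, for illustration only.
sample = [
    ("CourseCode", "CourseTitle", "Processed"),
    ("00001", "Example Course A", "T"),  # "T" is an assumed success marker
    ("00002", "Example Course B", "F"),
]
failed = unprocessed_rows(sample)
print([row["CourseCode"] for row in failed])
```

The course codes returned this way could then be looked up in `src/log.txt` to find the recorded reason.
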

- `original_data`
  - Spreadsheets from PRO (Public Records Office) in UC Irvine
- `processed_data`
  - Parsed version based on data from WebSOC services
- `src`
  - A program to fetch the data from WebSOC services to clean the data from PRO (`clean_data.py`)
  - A program to create a SQLite database from the parsed data (`create_db.py`)
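
To give a feel for what loading the parsed data into SQLite might involve, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the contents of `create_db.py`: the table name `courses`, the column types, and the sample row are all assumptions made for illustration, and only the column names come from the spreadsheet layout.

```python
import sqlite3

# Column names follow the spreadsheet layout; the SQL types are assumed.
COLUMNS = {
    "AcadYr": "TEXT",
    "AcadTerm": "TEXT",
    "DepartmentNameByCourseCode": "TEXT",
    "CourseNumber": "TEXT",
    "CourseCode": "TEXT",
    "CourseTitle": "TEXT",
    "Instructors": "TEXT",
    "GradeACount": "INTEGER",
    "GradeBCount": "INTEGER",
    "GradeCCount": "INTEGER",
    "GradeDCount": "INTEGER",
    "GradeFCount": "INTEGER",
    "GradePCount": "INTEGER",
    "GradeNPCount": "INTEGER",
    "GPAAvg": "REAL",
    "Processed": "TEXT",
}


def create_table(conn, table="courses"):
    """Create the (hypothetical) courses table if it does not exist yet."""
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in COLUMNS.items())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")


def insert_rows(conn, rows, table="courses"):
    """Insert row tuples whose values are in the same order as COLUMNS."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
create_table(conn)
# Made-up placeholder row, not real course data.
sample = ("2021-22", "Fall", "EXAMPLE DEPT", "200", "12345", "Example Course",
          "DOE, J.", 10, 5, 1, 0, 0, 2, 0, 3.5, "T")
insert_rows(conn, [sample])
count = conn.execute("SELECT COUNT(*) FROM courses").fetchone()[0]
print(count)
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` would write the database to disk.
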

This project is made possible by UC Irvine's Public Records Office. The inspiration comes from the ZotCurve project.