Final project for SI 564 Winter 2020
I created a database and loaded it with (some of) the 2014-15 US Department of Education College Scorecard data. This data was sourced from the US Department of Education on March 7, 2020. I used the data in the database to answer questions using SQL queries.
In addition to requirements.txt and .gitignore, this repo includes:
- run.py: program to run to create and load the database
- create_db.py: functions to create the database and tables
- load_db.py: functions to get the data, clean it, and load it into the database
- vars.py: global variables to be used in the above programs
- QUESTIONS.md: the questions I posed, SQL queries used to answer them, and the results
- ERD.png: entity relationship diagram of the database and tables
- college_scorecard.sql: a sample of the database created and loaded using run.py
- Data Dictionary.pdf: data dictionary for each table (see name of table in upper left of page)
1. Clone this repo to your computer: use the command line to navigate to the directory/folder where you want it, then enter git clone https://github.com/mfldavidson/college_scorecard_db.git
2. Create a virtual environment (python3 -m venv whateveryouwanttonameit) wherever you keep your virtual environments.
3. Activate the virtual environment (source whateveryounamedthevirtualenv/bin/activate if you are on a Mac, or source whateveryounamedthevirtualenv/Scripts/activate if you are on a PC).
4. Install all necessary libraries by navigating to the repo and then running the command pip install -r requirements.txt
5. Ensure your environment variables are set with username and password corresponding to the MySQL Server in which you want to create the database (the account must have read-write access).
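Before running anything, it can help to confirm that the two environment variables described above are actually visible to Python. The following is a minimal sketch, not part of this repo: it assumes only that the variables are named username and password as stated above, and the helper name missing_env_vars is hypothetical. How the repo's own code reads these variables may differ.

```python
import os

# Variable names taken from the setup instructions above.
REQUIRED_VARS = ("username", "password")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without changing your shell.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Both environment variables are set.")
```

Run it from the same terminal session in which you plan to run the loader, since environment variables set in one session are not automatically visible in another.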

1. Ensure your virtual environment is activated; if not, see step 3 above.
2. Ensure you are in the college_scorecard_db directory in your command line.
3. Enter python run.py in your command line.
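The first two checks above can also be made from Python itself. This is a rough sketch under stated assumptions, not something the repo provides: the function names in_virtualenv and has_run_script are hypothetical, and the virtual environment check relies on the standard behavior of environments created with python3 -m venv, where sys.prefix differs from sys.base_prefix.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def has_run_script(directory="."):
    """Return True when `directory` contains a file named run.py."""
    return (Path(directory) / "run.py").is_file()


if __name__ == "__main__":
    if not in_virtualenv():
        print("No virtual environment appears to be active.")
    if not has_run_script():
        print("run.py was not found in the current directory.")
```

Both functions only inspect the interpreter and the file system; neither one connects to MySQL or modifies anything.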