This repo can be used to produce a U.S. college recommendation web app. The data preprocessing steps are written in R and the app is written in Python 3. The app is currently running at http://ec2-52-91-59-235.compute-1.amazonaws.com:5000/home.
- Developer: Junxiong Liu
- Product Owner: Zili Li
- QA: Chris Rozolis
Create a web app to help high school students and parents make well-informed decisions in the college application process based on their preferences (e.g. location, school size) and background (e.g. SAT scores).
- Vision: Help high school students and parents make well-informed decisions in the college application process.
- Mission: Create an interactive web app based on college information data to help applicants and their families better decide which colleges to apply to and which to attend.
- Success Criteria: Track new user engagement and interaction with the web app over time.
The raw data is from Kaggle. I used R to do some EDA and clean the raw data (code in develop/data_cleaning/data_cleaning.Rmd). Alternatively, you can download the cleaned data from my Google Drive.
Things you need to get it started:
- conda: Either Anaconda or Miniconda is fine for this project.
- git: You will most likely need version control.
Below is a brief tutorial to set up the app on an AWS EC2 instance or another Linux machine. For other systems, the general steps should be the same, but small changes might be needed.
- Update the system packages, then install git and conda if you have not done so.

      sudo yum update
      sudo yum install git
      wget https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-x86_64.sh
      bash Anaconda3-5.1.0-Linux-x86_64.sh

- Clone this GitHub repository to your local machine. Go into the directory and use the collegeapp.yml file to create a conda environment with all required packages and dependencies.

      conda env create -f collegeapp.yml

  Then activate the conda environment by entering

      source activate collegeapp

- In the same directory as collegeapp.yml, create a file called config and paste the following information into the file to configure AWS RDS access.

      SECRET_KEY = 'development_key'
      SQLALCHEMY_DATABASE_URI = 'postgresql://collegeconnect:collegeahead@msiawebapp.cg96n7rbldvk.us-east-1.rds.amazonaws.com:5432/msiawebappdb'
      SQLALCHEMY_TRACK_MODIFICATIONS = True

- app/__init__.py should already include the following line of code:

      application.config.from_envvar('APP_SETTINGS', silent=True)

  This tells the application to look at the environment variable APP_SETTINGS for the path to your config file, so you simply need to set this environment variable by entering:

      export APP_SETTINGS="path/to/where/your/config/file/is.config"

- The database has already been initialized, so you may skip this step. If it has not, create a folder called data in develop and store the cleaned data (Google Drive) in this new folder. Then enter

      python create_collegedb.py

  to initialize the database.

- Now enter

      python application.py

  The app should be running at your EC2 public DNS followed by :5000/home. Have fun!
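
To make the APP_SETTINGS step above more concrete, here is a minimal sketch of roughly what happens when a config file is loaded through an environment variable. It does not use Flask itself: load_config_from_envvar is a hypothetical, simplified stand-in for config.from_envvar, and the temporary config file holds placeholder values rather than the real RDS settings.

```python
import os
import tempfile


def load_config_from_envvar(variable_name, silent=False):
    """Hypothetical stand-in that approximates Flask's config.from_envvar."""
    path = os.environ.get(variable_name)
    if not path:
        if silent:
            # With silent=True a missing variable is ignored instead of raising.
            return {}
        raise RuntimeError(f"{variable_name} is not set")
    namespace = {}
    with open(path) as config_file:
        # The config file is plain Python, so it is executed and its names collected.
        exec(compile(config_file.read(), path, "exec"), namespace)
    # Only upper-case names are treated as settings.
    return {key: value for key, value in namespace.items() if key.isupper()}


# Write a throwaway config file with placeholder values (not real credentials).
with tempfile.NamedTemporaryFile("w", suffix=".config", delete=False) as handle:
    handle.write("SECRET_KEY = 'development_key'\n")
    handle.write("SQLALCHEMY_TRACK_MODIFICATIONS = True\n")
    config_path = handle.name

# Equivalent of: export APP_SETTINGS="path/to/where/your/config/file/is.config"
os.environ["APP_SETTINGS"] = config_path
settings = load_config_from_envvar("APP_SETTINGS", silent=True)
print(settings["SECRET_KEY"])
os.remove(config_path)
```

Because silent=True is passed, an unset APP_SETTINGS does not stop the app from starting; it simply leaves the settings empty, which is worth remembering if the database connection fails unexpectedly.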
There are two sets of logs:
- application.log stores the logs of any user interaction with the EC2 application.
- createdb.log stores the logs of database initialization.
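
As an illustration of how two separate log files like these can be kept apart, the sketch below sets up two named loggers with Python's standard logging module. The helper build_file_logger, the log format, and the example messages are assumptions made for this sketch; the repository's actual logging setup may differ. The files are written to a temporary directory so the example can be run anywhere.

```python
import logging
import os
import tempfile


def build_file_logger(name, log_path):
    """Return a logger that appends timestamped records to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    # Keep records out of the root logger so each file only holds its own entries.
    logger.propagate = False
    return logger


log_dir = tempfile.mkdtemp()
app_log_path = os.path.join(log_dir, "application.log")
db_log_path = os.path.join(log_dir, "createdb.log")

app_logger = build_file_logger("application", app_log_path)
db_logger = build_file_logger("createdb", db_log_path)

# Example messages only; the real app decides what it records.
app_logger.info("user submitted preferences")
db_logger.info("database tables created")
```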
We performed unit testing for the develop/modeling/model.py file. The functions we tested are:
- filter()
- modeling()
- major_pref_transformation()
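
The signatures of the functions in model.py are not shown here, so the sketch below only illustrates the general shape such a unit test could take. It tests filter_colleges, a hypothetical stand-in written for this example (not the repository's filter()), against a few made-up rows using the standard unittest module.

```python
import unittest


def filter_colleges(colleges, max_size=None, state=None):
    """Hypothetical stand-in: keep colleges that match the given preferences."""
    kept = []
    for college in colleges:
        if max_size is not None and college["size"] > max_size:
            continue
        if state is not None and college["state"] != state:
            continue
        kept.append(college)
    return kept


class FilterCollegesTest(unittest.TestCase):
    def setUp(self):
        # Made-up rows for illustration only.
        self.colleges = [
            {"name": "College A", "state": "IL", "size": 8000},
            {"name": "College B", "state": "IL", "size": 30000},
            {"name": "College C", "state": "CA", "size": 5000},
        ]

    def test_filters_by_state_and_size(self):
        result = filter_colleges(self.colleges, max_size=10000, state="IL")
        self.assertEqual([c["name"] for c in result], ["College A"])

    def test_no_preferences_keeps_everything(self):
        self.assertEqual(filter_colleges(self.colleges), self.colleges)


# Run the test case explicitly so the script does not exit when it finishes.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FilterCollegesTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```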