This project explores factors that contribute to the competitiveness, measured by graduation rate and diversity, of schools with similar demographics and in similar population centers. Our clients are school administrators who want suggestions on how to make their schools more competitive; the results of this project give concrete suggestions about which aspects a school should improve in order to achieve that goal. Our data source is College Scorecard, and the analysis focuses on schools within California.
The contents are separated into five main sections, each corresponding to a directory:
- Code -- regression scripts, utility functions, and unit tests are located here.
- Data -- raw data, cleaned data, and data created by the regressions are saved in this directory.
- Images -- images created by the exploratory analysis and the regressions are saved in this directory.
- Report -- contains a sections sub-directory that holds each section of the final report in a separate file; the final report is dynamically generated and saved here.
- Slides -- slides are dynamically generated and saved here.
The file structure of this project is as follows:
Stat159-Final-Project/
    .gitignore
    README.md
    Makefile
    LICENSE
    report/
        sections/
            00-abstract.Rnw
            01-introduction.Rnw
            02-data.Rnw
            03-methods.Rnw
            04-analysis.Rnw
            05-results.Rnw
            06-conclusions.Rnw
        report.Rnw
        report.pdf
    images/
        ... (dynamically generated images)
    data/
        cleaned-data/
            ... (clean data)
        ... (dynamically generated data)
    code/
        functions/
            ... (utility functions)
        scripts/
            ... (regression scripts)
        tests/
            ... (unit tests against utility functions)
    shiny/
        ... (shiny app)
    slides/
        ...
    session-info.txt (system info)
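As a quick sanity check after cloning, the five main directories described above can be verified with a short shell function. This is only a sketch: `check_layout` is a hypothetical helper that is not part of the project, and it assumes the lowercase directory names shown in the file structure.

```shell
# check_layout is a hypothetical helper, not part of the project.
# It reports which of the five main directories are missing under the
# given project root (default: current directory) and returns non-zero
# if any of them is absent.
check_layout() {
  root="${1:-.}"
  missing=0
  for dir in code data images report slides; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir/"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_layout Stat159-Final-Project` from the parent directory prints one line per missing directory. Directories that hold only generated files may be absent until the project has been built.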
This project can be reproduced by following the instructions below.
- Download/clone this project from GitHub (unzip it if the downloaded file is in zip format)
- Open a terminal or any shell program that supports standard Linux commands
- `cd Stat159-Final-Project`
- `make clean` to remove old compiled artifacts
- `make` to generate new artifacts
- Open `report.pdf` with the PDF viewer of your choice
- Open `slides.html` with the browser of your choice
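The build steps above can be wrapped in one shell function. This is a minimal sketch, not something the project ships: `rebuild` is a hypothetical name, and the function assumes the repository has already been downloaded to the given path and that `make` is installed.

```shell
# rebuild is a hypothetical wrapper around the steps listed above.
# It assumes the repository already exists at the given path
# (default: ./Stat159-Final-Project) and that make is installed.
rebuild() {
  project_dir="${1:-Stat159-Final-Project}"
  if [ ! -f "$project_dir/Makefile" ]; then
    echo "no Makefile found in $project_dir" >&2
    return 1
  fi
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$project_dir" && make clean && make)
}
```

Running `rebuild` from the directory that contains the clone removes the old artifacts and regenerates them; the report and slides can then be opened as described in the last two steps.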
This section describes the make commands that you can use to reproduce the corresponding parts of this project:
- `make all` -- reproduce the entire project: download data, run regression analysis, assemble report, etc.
- `make data` -- download data from the internet, run the data cleaning script, and split the data into a train set and a test set
- `make eda` -- run the exploratory data analysis script
- `make regressions` -- run all regression model scripts together
- `make regressions-[diversity/grad-rate]` -- run the regression models against diversity/grad-rate
- `make ols-[diversity/grad-rate]` -- run the OLS regression script against diversity/grad-rate and save the result
- `make ridge-[diversity/grad-rate]` -- run the Ridge regression script against diversity/grad-rate and save the result
- `make lasso-[diversity/grad-rate]` -- run the Lasso regression script against diversity/grad-rate and save the result
- `make pcr-[diversity/grad-rate]` -- run the PCR regression script against diversity/grad-rate and save the result
- `make plsr-[diversity/grad-rate]` -- run the PLSR regression script against diversity/grad-rate and save the result
- `make session` -- run the session info script and store system and package information in session-info.txt
- `make report` -- assemble the report from the section files in report/sections and transform it to PDF format
- `make slides` -- create the slides from Rmd files
- `make clean` -- remove old artifacts
- `make tests` -- run the unit tests in the tests directory
- `make shiny` -- run the shiny app
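The bracket shorthand in the per-model commands can be expanded into concrete target names with a small loop. This sketch assumes the shorthand stands for two separate targets per model (for example `ols-diversity` and `ols-grad-rate`); the Makefile itself is the authority on the exact names, and `list_targets` is a hypothetical helper that is not part of the project.

```shell
# list_targets is a hypothetical helper, not part of the project.
# It prints one target name per line for every model/response pair,
# assuming "[diversity/grad-rate]" means two separate targets.
list_targets() {
  for model in ols ridge lasso pcr plsr; do
    for response in diversity grad-rate; do
      echo "${model}-${response}"
    done
  done
}
```

Piping the output into `xargs -n 1 make -n` asks make for a dry run of each target, which prints the commands it would run without executing them.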
Junyu Wang
Nichole Ann Rethmeier
Jie Sun
Mingtao Fang
All media content is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
All code is licensed under the MIT license.