/nyc-homeless-students

Final project for Messy Data and Machine Learning at NYU Steinhardt, Spring 2020. Project partners: Hope Muller and Heidi Choi.

The purpose of the project was to build prediction models for high school graduation rates among homeless students in NYC public schools. Using longitudinal data, random forest models were fit to identify the key variables (features) that best predict whether an individual student is at risk of not graduating from high school on time. A multi-level LASSO model was then fit to school-level data to cross-validate the features identified by the random forest models.
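The two-step idea described above (rank features with a random forest, then check them against a LASSO fit) can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: the data are simulated, the column names (`attendance_rate`, `credits_grade9`, `school_moves`, `noise`) are hypothetical, and a plain single-level LASSO from `glmnet` stands in for the multi-level model fit to school-level data.

```r
# Illustrative sketch only: simulated data, NOT the Research Alliance data.
library(randomForest)  # random forest with permutation importance
library(glmnet)        # LASSO via cross-validated glmnet

set.seed(2020)
n <- 500

# Hypothetical student-level predictors (names are assumptions).
students <- data.frame(
  attendance_rate = runif(n, 0.5, 1),
  credits_grade9  = rnorm(n, mean = 10, sd = 3),
  school_moves    = rpois(n, lambda = 1),
  noise           = rnorm(n)  # deliberately unrelated to the outcome
)

# Simulated outcome: 1 = did not graduate on time.
log_odds <- 6 - 8 * students$attendance_rate -
  0.15 * students$credits_grade9 + 0.5 * students$school_moves
not_on_time <- rbinom(n, 1, plogis(log_odds))

# Step 1: random forest, then rank features by mean decrease in accuracy.
rf_fit <- randomForest(x = students, y = factor(not_on_time),
                       ntree = 300, importance = TRUE)
rf_rank <- sort(importance(rf_fit, type = 1)[, 1], decreasing = TRUE)

# Step 2: LASSO logistic regression (alpha = 1), with the penalty chosen
# by 5-fold cross-validation. Single-level stand-in for the multi-level model.
lasso_fit <- cv.glmnet(x = as.matrix(students), y = not_on_time,
                       family = "binomial", alpha = 1, nfolds = 5)
lasso_coef <- as.matrix(coef(lasso_fit, s = "lambda.1se"))[, 1]
lasso_kept <- setdiff(names(lasso_coef)[lasso_coef != 0], "(Intercept)")

# Features that both approaches flag: top three by forest importance
# that also keep a nonzero LASSO coefficient.
agreed <- intersect(names(rf_rank)[1:3], lasso_kept)
print(rf_rank)
print(lasso_kept)
print(agreed)
```

Features that rank highly in the forest and also retain a nonzero LASSO coefficient are the ones the two methods agree on. With real data, the LASSO step would additionally need to account for students being nested within schools.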

Student-level data used for this project were provided by the Research Alliance for NYC Schools (RA). The data were housed on an RA server, and the scripts were run in conjunction with SSHFS drive mapping to ensure the data never left the server, as per RA's agreement with the NYC Department of Education.