This project evaluates bike availability trends over a 24-hour period and seeks to predict bike availability for the NYC Citibike bike share system. The main sections of this project include a clustering technique that sorts maximum bike availability across hours, and a comparison of four machine learning models (Logistic Regression, Random Forest Classifier, Linear SVC, and KNeighbors Classifier) to determine which model best fits the dataset. Comparative evaluation of the models indicates that the Random Forest and Linear SVC models outperformed the Logistic Regression and KNeighbors models.
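
As a rough illustration of the clustering step, the sketch below groups stations by their 24-hour availability profile and reads off the hour at which each cluster peaks. It makes several assumptions that are not taken from this project: the data is synthetic, the `profile` helper and the `availability` array are hypothetical, and KMeans with two clusters is used only as an example, since the clustering technique is not named here.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
hours = np.arange(24)


def profile(peak_hour, n_stations):
    """Hypothetical helper: noisy hourly availability that peaks near peak_hour."""
    base = 10 + 8 * np.exp(-0.5 * ((hours - peak_hour) / 3.0) ** 2)
    return base + rng.normal(0, 0.5, size=(n_stations, 24))


# Synthetic stand-in for the real data: 40 stations x 24 hourly averages.
availability = np.vstack([profile(4, 20), profile(14, 20)])

# Cluster stations by the shape of their daily availability curve.
# KMeans and n_clusters=2 are assumptions made for this sketch.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(availability)

# For each cluster center, find the hour with maximum availability.
peak_hours = sorted(int(center.argmax()) for center in kmeans.cluster_centers_)
print("Peak availability hour per cluster:", peak_hours)
```

With real data, each row of `availability` would hold one station's average bike count per hour, and the number of clusters would need to be chosen from the data.
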
This project is created with:
- Python language for scripting.
- SQLAlchemy
- PostgreSQL database for storage in the backend.
- AWS to read / write intermediate data coming in from the API calls.
- Python libraries used: scikit-learn, pandas, NumPy, Matplotlib, and Seaborn.
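
The four-model comparison described above could be set up along the following lines with scikit-learn. This is a minimal sketch, not the project's actual pipeline: the data comes from `make_classification` as a stand-in for the real features and labels, and the hyperparameters, the scaling step, and the train/test split are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption): the real features and labels differ.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The four classifiers named above; all hyperparameters are illustrative.
# Scaling is added for the models that are sensitive to feature scale.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Linear SVC": make_pipeline(
        StandardScaler(), LinearSVC(random_state=0, max_iter=10000)
    ),
    "KNeighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

# Print the models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Accuracy on synthetic data says nothing about the ranking reported for the Citibike dataset; the sketch only shows how the models can be evaluated on the same split.
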