- General Info
- Technologies Used
- Conclusions
- Project general information: This project uses linear regression to solve a real-world problem faced by BoomBikes, a US-based bike-sharing firm.
- Background: BoomBikes saw a significant dip in daily bike bookings due to the COVID-19 pandemic. Demand is expected to rebound once the pandemic ends, and BoomBikes wants to be ready for that growth by understanding which factors are the major drivers of its business.
- Business problem: The project aims to identify the independent variables recorded by the company that are linked to the count of daily rides booked on the bike-sharing platform, and to model how they relate to that count. By understanding these relationships, BoomBikes aims to achieve rapid growth when the pandemic ends.
- Dataset: The dataset comes from BoomBikes and contains various variables related to the daily booking count.
After data analysis, preparation, modeling, and evaluation, we found that:
- The model fits the data well, with R-squared values of 82.71% on the training set and 81.13% on the test set.
- The mean squared error is close to zero on both sets.
- The final variables were selected using p-values, VIF (variance inflation factor), and RFE (recursive feature elimination).
- Bike demand depends on temperature, workingday, season, month, day of the week, and holiday.
- Winter sees more rentals than summer and spring.
- September and October are high-demand months.
- Wednesday, Thursday, and Saturday are the busiest days.
- We recommend more marketing in summer and spring, and more incentives on cloudy days.
- Rentals grew from 2018 to 2019, so retaining repeat customers is a priority.
Technologies Used.
- pandas
- seaborn
- Matplotlib
- scikit-learn
- statsmodels
Credits.
Created by [@abhishek-92a] - feel free to contact me!