Insights and Conclusions
-
There appear to be no NULL/NA values in the dataset.
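One way to run this check is a per-column count of missing values with pandas. The sketch assumes the data is loaded into a DataFrame named `df`. The tiny frame and its column names are made up, and one NaN is planted on purpose so the output shows what a detected gap looks like.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the ridership data (hypothetical values and columns);
# one NaN is planted on purpose to show a detected gap.
df = pd.DataFrame({
    "temp": [9.84, 14.76, np.nan, 22.14],
    "humidity": [81, 80, 75, 64],
    "count": [16, 40, 32, 13],
})

# Missing (NULL/NaN) entries per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# True only if no cell anywhere in the frame is missing.
has_no_missing = not df.isna().any().any()
print("No missing values:", has_no_missing)
```

On the real data, every entry of `missing_per_column` would be 0 and `has_no_missing` would be `True`.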
-
Ridership has been increasing year over year.
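Year-over-year growth can be checked by summing the counts per calendar year. This is a minimal sketch assuming a `datetime` column (an assumed name, not given in the text) and toy values rather than the real data.

```python
import pandas as pd

# Toy records (hypothetical); the real data is assumed to have a "datetime" column.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 00:00", "2011-06-01 08:00",
        "2012-01-01 00:00", "2012-06-01 08:00",
    ]),
    "count": [16, 120, 30, 250],
})

# Total ridership per calendar year, then percentage change from the prior year.
yearly = df.groupby(df["datetime"].dt.year)["count"].sum()
growth_pct = yearly.pct_change() * 100
print(yearly)
print(growth_pct.round(1))
```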
-
Most temperature values fall between 8 and 30 degrees Celsius.
-
Humidity values range from 20 to 100.
-
For both casual and registered users, ridership has been 0 on the majority of days.
-
Most of the records correspond to clear-weather days.
-
The data is evenly distributed across the seasons.
-
The percentage of ridership outliers, computed by season and by weather condition, stays below 5% of the total data.
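The text does not say how outliers were defined. A common choice, assumed here, is the 1.5 × IQR box-plot rule applied separately within each group. The sketch computes the outlier percentage per season on made-up data; grouping by a weather column instead would give the per-weather figure.

```python
import pandas as pd

# Toy data (hypothetical season labels and counts).
df = pd.DataFrame({
    "season": ["spring"] * 6 + ["fall"] * 6,
    "count": [10, 12, 11, 13, 12, 90, 30, 32, 31, 33, 29, 34],
})

def outlier_pct(s: pd.Series) -> float:
    """Percentage of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return ((s < lower) | (s > upper)).mean() * 100

# Apply the rule within each season rather than to the pooled data.
pct_by_season = df.groupby("season")["count"].apply(outlier_pct)
print(pct_by_season)
```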
-
Ridership on holidays and on working days also has very few outliers.
-
The correlation heatmap shows a correlation of 0.98 between the "temp" and "atemp" columns, so we drop "atemp" as a redundant feature.
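A sketch of the correlation check and the drop, assuming a pandas DataFrame. The values are invented, and the 0.95 cut-off is an arbitrary illustrative threshold rather than one stated above.

```python
import pandas as pd

# Toy values (hypothetical) in which "temp" and "atemp" move almost in lockstep.
df = pd.DataFrame({
    "temp":     [9.84, 13.12, 18.04, 22.96, 27.88, 32.80],
    "atemp":    [14.39, 16.67, 21.97, 26.52, 31.82, 37.12],
    "humidity": [81, 70, 77, 55, 48, 40],
    "count":    [16, 40, 60, 110, 150, 170],
})

# Pairwise Pearson correlations; this matrix is what a heatmap visualizes.
corr = df.corr()
print(corr.round(2))

# Drop "atemp" if it is nearly collinear with "temp" (threshold is illustrative).
THRESHOLD = 0.95
if abs(corr.loc["temp", "atemp"]) > THRESHOLD:
    df = df.drop(columns=["atemp"])

print(list(df.columns))
```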
-
Although "count" is simply the sum of "registered" and "casual" users, we keep all three columns, since casual and registered riders are analyzed separately.
-
The count of casual users on non-working days is significantly higher than on working days.
-
The count of registered users on working days is significantly higher than on non-working days.
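The text does not name the test behind "significantly higher". Given the non-normal counts, a one-sided Mann-Whitney U test is one reasonable option, and it is what this sketch assumes. The `workingday` column (1 = working day, 0 = non-working day) and the toy values are also assumptions.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

# Toy daily totals (hypothetical): workingday = 1 for working days, 0 otherwise.
df = pd.DataFrame({
    "workingday": [0] * 6 + [1] * 6,
    "casual":     [120, 150, 180, 200, 170, 160, 30, 40, 35, 50, 45, 38],
    "registered": [60, 80, 70, 90, 75, 65, 200, 240, 220, 260, 230, 210],
})

def one_sided_mwu(col: str, higher_group: int):
    """Test H1: `col` tends to be larger when workingday == higher_group."""
    x = df.loc[df["workingday"] == higher_group, col]
    y = df.loc[df["workingday"] != higher_group, col]
    return mannwhitneyu(x, y, alternative="greater")

casual_res = one_sided_mwu("casual", higher_group=0)          # non-working > working?
registered_res = one_sided_mwu("registered", higher_group=1)  # working > non-working?
print(f"casual: U={casual_res.statistic}, p={casual_res.pvalue:.4f}")
print(f"registered: U={registered_res.statistic}, p={registered_res.pvalue:.4f}")
```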
-
Under every weather condition, days with 0 ridership are the most frequent.
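The share of zero-ridership records per group can be computed directly. This assumes a `weather` column with readable labels; the labels and counts below are illustrative only.

```python
import pandas as pd

# Toy records (hypothetical weather labels and counts).
df = pd.DataFrame({
    "weather": ["clear", "clear", "clear", "clear",
                "mist", "mist", "light rain", "light rain"],
    "count":   [0, 0, 25, 40, 0, 12, 0, 0],
})

# Percentage of records with zero riders within each weather condition.
zero_share = (df["count"] == 0).groupby(df["weather"]).mean() * 100
print(zero_share.sort_values(ascending=False))
```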
-
The "count" data is not normally distributed, as shown by the QQ plot, skewness, kurtosis and the Shapiro-Wilk test; Levene's test was used to check whether the group variances are equal. Because the ANOVA assumptions do not hold, we use the Kolmogorov-Smirnov (KS) test instead.
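The checks named above map onto `scipy.stats` functions. This sketch runs them on synthetic right-skewed samples standing in for ridership per weather condition; the group names, the exponential distribution and the seed are all assumptions, and the QQ plot is left out because it needs a plotting library. Since the two-sample KS test compares two groups at a time, it is applied pairwise. The same code applies to the season grouping.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Synthetic right-skewed "ridership" samples per weather group (not real data).
rng = np.random.default_rng(0)
groups = {
    "clear": rng.exponential(scale=200, size=300),
    "mist": rng.exponential(scale=170, size=300),
    "light rain": rng.exponential(scale=110, size=300),
}

# Normality checks: skewness, excess kurtosis and Shapiro-Wilk (H0: data is normal).
shapiro_p = {}
for name, x in groups.items():
    shapiro_p[name] = stats.shapiro(x).pvalue
    print(f"{name}: skew={stats.skew(x):.2f}, "
          f"kurtosis={stats.kurtosis(x):.2f}, Shapiro p={shapiro_p[name]:.3g}")

# Levene's test (H0: all groups have equal variances), another ANOVA assumption.
lev_stat, lev_p = stats.levene(*groups.values())
print(f"Levene p={lev_p:.3g}")

# Two-sample KS test for every pair (H0: both samples share one distribution).
ks_p = {}
for a, b in combinations(groups, 2):
    res = stats.ks_2samp(groups[a], groups[b])
    ks_p[(a, b)] = res.pvalue
    print(f"KS {a} vs {b}: D={res.statistic:.3f}, p={res.pvalue:.3g}")
```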
-
The test shows a significant difference in ridership count across weather conditions: clear weather has the highest ridership and light rain the lowest.
-
In every season, days with 0 ridership are the most frequent.
-
Ridership count per season is not normally distributed either, as shown by the QQ plot, skewness, kurtosis and the Shapiro-Wilk test; Levene's test was used to check whether the group variances are equal. Because the ANOVA assumptions do not hold, we again use the KS test.
-
The test shows a significant difference in ridership count across seasons: fall has the highest ridership and spring the lowest.
-
The weather conditions and seasons are significantly associated.
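The text does not state which test established the association. A chi-square test of independence on the season × weather contingency table is a standard choice and is assumed in this sketch, with made-up labels and frequencies.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy records (hypothetical): spring skews rainy, fall skews clear.
df = pd.DataFrame({
    "season": ["spring"] * 40 + ["fall"] * 40,
    "weather": ["clear"] * 15 + ["rain"] * 25 + ["clear"] * 32 + ["rain"] * 8,
})

# Contingency table of observed frequencies (rows: season, columns: weather).
table = pd.crosstab(df["season"], df["weather"])

# H0: season and weather are independent.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3g}")
```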
-
There is a positive correlation between temperature and count of riders.
-
There is a negative correlation between the count of riders and humidity.
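The sign and significance of each correlation can be read from `scipy.stats.pearsonr`. The values below are invented, and Pearson (linear) correlation is an assumption; the text does not say which coefficient was used.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy values (hypothetical).
df = pd.DataFrame({
    "temp":     [8, 12, 16, 20, 24, 28, 32],
    "humidity": [90, 85, 70, 65, 55, 45, 40],
    "count":    [20, 45, 70, 95, 130, 150, 175],
})

# Pearson r and two-sided p-value for each feature against ridership count.
results = {}
for col in ["temp", "humidity"]:
    r, p = pearsonr(df[col], df["count"])
    results[col] = (r, p)
    print(f"{col} vs count: r={r:+.2f}, p={p:.3g}")
```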
Recommendations
-
The company should analyze why there are many days with 0 riders.
-
Casual ridership tends to rise on weekends, while registered ridership rises on working days. The company should focus on targeting casual riders on weekends and converting them into registered riders.
-
The company can retain registered riders by offering discounted weekly or monthly passes, which could help reduce the number of days with 0 ridership.
-
The company can also offer coupons to casual users on weekends to increase sales.
-
Overall, there are many days with 0 ridership as well as days with very high ridership. The company should analyze what drives sudden spikes in ridership, and do the same for low-ridership days.
-
Since ridership rises with temperature, the company can use weather forecasts to plan its marketing around expected temperatures.
-
Ridership remains steady up to a certain humidity level and then declines. The company can target locations where humidity is moderate and the average temperature is around 30 degrees Celsius, which could help boost sales.
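To see where ridership starts to fall off with humidity, one option is to bin humidity and average the count per bin. The bin edges and data below are hypothetical.

```python
import pandas as pd

# Toy values (hypothetical): ridership flat at low humidity, falling at high humidity.
df = pd.DataFrame({
    "humidity": [22, 35, 41, 48, 55, 63, 72, 78, 86, 95],
    "count":    [180, 190, 185, 188, 182, 160, 120, 100, 70, 40],
})

# Illustrative bin edges; include_lowest keeps a value of exactly 20 in the first bin.
bins = [20, 40, 60, 80, 100]
df["humidity_bin"] = pd.cut(df["humidity"], bins=bins, include_lowest=True)

# Mean ridership per humidity band.
mean_by_bin = df.groupby("humidity_bin", observed=False)["count"].mean()
print(mean_by_bin)
```

A flat run of bin means followed by a steady drop is the "steady, then declining" pattern described above.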
-
Ridership is highest in clear weather and lowest in light rain, so the company can use weather forecasts to plan its marketing.
-
The same holds for seasons: people prefer cycling in fall and are least inclined to ride in spring.
-
Because weather and season are associated, the company can concentrate its discounts and marketing on the fall season and on clear, sunny days.
-
Overall, clear, sunny fall days with an average temperature above 25 degrees Celsius are the best days for the company to generate revenue.