/Cyclistic-Case-Study

Case study under capstone project of Google Data Analytics Certificate

Primary LanguageSQL

Table of Contents

  1. Introduction
  2. Business Task
  3. Data
  4. Processing and Cleaning
  5. Analysis and Viz
  6. Conclusion and Recommendations

Introduction

The project is a part of Google Data Analytics Certification course capstone. The scenario involves analysis of the trip data of Cyclistic bike share company.

The company has two models for availing service: individual passes which are called "casual" riders and annual subscriptions called "member" riders. The company operates in Chicago with around 6000 bicycles at 700 stations.

Maximizing the number of annual members will be key to future growth as it ensures financial sustainability and customer retention. The insights can help devise effective marketing strategies aimed to convert more casual riders into annual members.

Business Task

How do annual members and casual riders use Cyclistic bikes differently ?

Objective : To clean, analyze and visualize the data to observe how casual riders use the bike rentals differently from annual member riders.

Data

  • Data source : Public data from Motivate International Inc. (Divvy Bicycle Sharing Service from Chicago) under this license.
  • Cyclistic’s historical trip data (2013 onwards) available in .csv format.
  • Our date range : May 2020 to April 2021 (608 MB data)
  • The dataset has individual ride records consisting of ride start-end date & time, station information, bike type, rider type (casual/member).
  • Data uploaded to Google Cloud Storage(GCS) in order to import the large files.

Processing and Cleaning

  • Data imported from GCS into BigQuery for manipulation and analysis using SQL.
  • Visualizations to be developed in Google Data Studio.
  • Datatypes made consistent and then consolidated into one view using this query.
  • To assist in analysis, 4 new columns were added (start point location, end point location, ride start day name and ride duration in seconds).
  • 3,742,624 rows were returned but required cleaning.
  • Cleaning process :
    • Missing start and end station names found using this query.
    • Other columns checked using this query.
    • Negative and zero ride duration values found using this query.

Following the cleaning and consolidating data in one table, 3,476,354 rows were returned for proceeding to analysis. All of this was achieved using this single master query. JOIN, WITH, UNION ALL, WHERE, subqueries and many other SQL functions were used here.

Analysis and Viz

The final dataset containing trip data of roughly 3.4 million ride records was analyzed. Visualizations were developed in Google Data Studio to observe differential trends between the usage by casual riders and annual members.

Total ride share

piechart

Insights

  • 58.6% of total rides (3.4M) were taken by annual members.
  • 41.4% of total rides were taken by casual riders.
  • Annual members form the majority of business for the company and maximizing on this number should be the focus in the long run.

Weekly distribution of number of rides

line_chart_1

Insights

  • Clearly, the rides taken by casual riders peak throughout the weekend as compared to that of annual members which remains relatively flat.
  • About 50% less casual riders use the rentals during weekdays as compared to weekends.
  • This indicates that casual riders use the bike rentals for leisure purposes and not for commuting.

Weekly distribution of average ride duration

line_chart_2

Insights

  • The average ride duration of casual members is about 3 times that of annual members.
  • The average ride duration both type of riders increase on weekends.
  • Again, this indicates that casual riders use the bike rentals for leisure purposes.

Ride duration vs Ride distance

bars

Insights

  • The plots clearly show the contrast between average ride duration and average ride distance for both user types.
  • While both user types ride a similar average distance, casual riders ride for 3x longer duration as compared to annual members.

Hourly distribution of number of rides

bar_hour

Insights

  • The proportion of casual riders increases in the non-commuting hours i.e. in forenoon hours and after 8pm from 18% of total riders to 50% of total riders.
  • Annual members take the major chunk of the rides during peak-travel hours in the morning and evening to upto 82% of total riders.
  • Again, this indicates that casual riders use the bike rentals for leisure purposes while annual members use it for commuting.

Monthly distribution of number of rides - Seasonality

bar_season

Insights

  • The proportion of casual riders falls drastically during winter months(Dec-Feb) to only about 20% of total riders.
  • The proportion of casual riders goes maximum in the months of June, July, August and September to upto 40% of total riders.

Conclusion and Recommendations

  • A common observation is that casual riders are using the bike rentals for leisure and tourism purposes while annual members use it predominantly for commuting purposes.
  • Targetted on-ground marketing strategies should be devised at places of leisure like parks, theatres, restaurants and cafes.
  • Discounting campaigns for casual riders on weekdays can motivate them to use the service for commuting.
  • Tools like push notifications can be used to attract casual riders during the lean periods of the day.
  • A campaign for the winter months, maybe clubbed with holidays or Christmas can help pick up the numbers during those months.