You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
To complete my analysis, I will be following the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
Analyze Cyclistic trip data to understand how casual riders and annual members use Cyclistic bikes differently. Provide these insights to the marketing team to help develop a strategy to convert more casual riders into annual members.
Public data provided by Divvy Bikes and is available here.
- Downloaded 12 months of data ranging from 2022/03 - 2023/02.
- Converted the .csv files to .xlxs to begin data cleanup in Microsoft Excel.
- Filtred all columns to search for and remove rows with missing or corrupt data.
- Created a new column labelled
ride_length
that would calculate the time each ride took by subtracting thestarted_at
column from theended_at
column and formatted the time as HH:MM:SS. - Created another column labelled
day_of_week
and used the WEEKDAY function to extract what day of the week each ride started on from thestarted_at
column.
Uploaded my data to BigQuery to perform analysis using SQL.
Ran the query below to combine all of my individual monthly tables into one and saved that resulting table as combined_tripdata
.
/* Created one large table that contains all of the data from 12 individual tables using FULL JOIN.
Each table shared the same columns and data types */
SELECT *
FROM
`tripdata-project.tripdata.2022_03_tripdata`
FULL JOIN
`tripdata-project.tripdata.2022_04_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_05_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
tart_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_06_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_07_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_08_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_09_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_10_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_11_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2022_12_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2023_01_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
FULL JOIN
`tripdata-project.tripdata.2023_02_tripdata`
USING
(ride_id, started_at, rideable_type, ended_at, start_station_name,
start_station_id, member_casual, ride_length, day_of_week)
ORDER BY
started_at ASC
Then I ran my next query on the combined_tripdata
table to find the average minutes that each type of user rode each type of bike. Saved the query results as a new table named avg_minutes
.
/* ride_length is typed as a time expression so I had to extract each part of the HH:MM:SS into seperate numeric
variables, find the average of each part, round that average, and then concatonate each part back into
the HH:MM:SS format */
SELECT
rideable_type,
member_casual AS user_type,
CONCAT(ROUND(AVG(EXTRACT(HOUR FROM ride_length))), ":",
ROUND(AVG(EXTRACT(MINUTE FROM ride_length))), ":",
ROUND(AVG(EXTRACT(SECOND FROM ride_length)))) AS avg_ride_length,
--created the avg_minutes column to get minutes as a FLOAT datatype
ROUND(AVG(EXTRACT(MINUTE FROM ride_length))) AS avg_minutes
FROM
`tripdata-project.tripdata.combined_tripdata`
-- grouped by both rideable_type and member_casual to find the average time for each category
GROUP BY
rideable_type,
member_casual
avg_minutes
Table
rideable_type | user_type | avg_ride_length | avg_minutes |
---|---|---|---|
electric_bike | casual | 0:13:29 | 13.0 |
electric_bike | member | 0:10:29 | 10.0 |
classic_bike | member | 0:12:29 | 12.0 |
classic_bike | casual | 0:17:29 | 17.0 |
docked_bike | casual | 0:24:29 | 24.0 |
Proceeded to run another query on the combined_tripdata
table to find the average amount of minutes users rode bikes on each day of the week. Saved the results as a new table named avg_minutes_per_day
.
/* ride_length is typed as a time expression so I had to extract each part of the HH:MM:SS into seperate numeric
variables, find the average of each part, round that average, and then concatonate each part back into
the HH:MM:SS format */
SELECT
CONCAT(ROUND(AVG(EXTRACT(HOUR FROM ride_length))), ":",
ROUND(AVG(EXTRACT(MINUTE FROM ride_length))), ":",
ROUND(AVG(EXTRACT(SECOND FROM ride_length)))) AS avg_ride_length,
--created the avg_minutes column to get minutes as a FLOAT datatype
ROUND(AVG(EXTRACT(MINUTE FROM ride_length))) AS avg_minutes,
day_of_week
FROM
`tripdata-project.tripdata.combined_tripdata`
-- grouped by day_of_week to find the avg_ride_length for each day of the week
GROUP BY
day_of_week
avg_minutes_per_day
Table
avg_ride_length | avg_minutes | day_of_week |
---|---|---|
0:12:29 | 12.0 | Tuesday |
0:12:29 | 12.0 | Wednesday |
0:12:29 | 12.0 | Thursday |
0:13:29 | 13.0 | Friday |
0:15:29 | 15.0 | Saturday |
0:15:29 | 15.0 | Sunday |
0:12:29 | 12.0 | Monday |
To share my results, I created two visualizations by exporting my avg_minutes
and avg_minutes_per_day
tables and uploading them to Tableau Public.
- The average ride time of both the classic and electrical bike types is higher for casual riders than annual members.
- Both casual riders and annual members have a higher ride time on classic bikes.
- The average ride time is much higher on the weekends than the weekdays.
- The average ride time is consistant thoughout both weekdays and weekends with the only outlier being Friday, which averages higher than the rest of the weekdays.
- Casual riders tend to ride for more minutes on average so Cyclistic could advertise how much these casual riders would be able to save if they joined the annual membership. After the casual rider finishes their trip, the app could compare their current rates with the rates of annual members and offer a retroactive discount if they sign up immediately.
- Riders use Cyclistic bikes for longer on the weekends. Cyclistic could offer a limited time deal to new annual members that offers free or greatly discounted weekends.
- Classic bikes have a higher average use time than electric bikes for casual riders. Cyclistic could advertise a discounted rate for annual members who ride classic bikes.