/Tableau_CitiBIke

Using Tableau, this project analyses CitiBike's July 2023 data to understand user behaviour (Python is employed for ETL processes). Objectives include user segmentation, popular stations, ride characteristics, and geographical concentration. Key findings are summarised in two dashboards for member and casual users.

Primary Language: Jupyter Notebook

Tableau - CitiBike July 2023 Analysis

Project Description & Background- Tableau-Dashboard

Background:

Since 2013, the Citi Bike program has implemented robust infrastructure for collecting data on the program's utilisation. Each month, bike data is collected, organised, and made public on the Citi Bike Data webpage.

However, while the data has been regularly updated, the team has yet to implement a dashboard or sophisticated reporting process. City officials have questions about the program, so your first task on the job is to build a set of data reports to provide the answers.

Project Description:

This project analyses July 2023, CitiBike's warmest month of the year, to provide insights and explore the following questions:

  1. Do members or casuals have higher usage?
  2. Which stations are most popular?
  3. What is the overall average distance travelled?
  4. What days of the week are most rides taken on?
  5. What type of bicycle is used most?
  6. On average, how long do users rent a bicycle?
  7. Which zip codes have the largest concentration of usage (approx.)?

To answer these questions, two dashboards were created, one for each customer segment: members and casual users. Members are users who subscribed to an annual membership (Citi Bike plan / Lyft Pink plan pricing); casual users are users who purchased a 24-hour pass or a 3-day pass.

Description of the data:

The July CitiBike dataset has 13 columns and 3,767,347 records. A 14th column, distance, is added during the transformation step.

Columns:
  1. ride_id
  2. rideable_type
  3. started_at
  4. ended_at
  5. start_station_name
  6. start_station_id
  7. end_station_name
  8. end_station_id
  9. start_lat
  10. start_lng
  11. end_lat (together with end_lng, the latitude and longitude of the endpoint of a given ride)
  12. end_lng
  13. member_casual (segmentation column identifying members and casual users)
  14. distance (derived column: the distance between the start and end geolocations of a ride, calculated on a sphere with the Haversine formula)

Assumption & Note:

Assumption:

  • CitiBike's July data includes multiple geolocations per station. The assumption here is that each station has a "static" geolocation as well as "dynamic" geolocations for each bicycle docked in the station (each bicycle has a "tablet" that is docked to the handlebars).

Note:

  • What is the distance metric? The dataset does not include intermediate geolocations that would indicate the route of a given ride. The Haversine formula allows us to calculate the great-circle (straight-line) distance between the start and end points of a ride, so the distance is an approximation rather than the distance actually cycled. This data is presented in the dashboards as the average distance for members and casual users.

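
The distance metric described above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the function name haversine_km, the constant EARTH_RADIUS_KM, and the choice of kilometres as the unit are assumptions, since the README does not state them. It uses only the math functions listed under the Python libraries (radians, sin, cos, sqrt, atan2).

```python
from math import radians, sin, cos, sqrt, atan2

# Mean Earth radius in kilometres. Assumed value and unit; the README
# does not say which radius or unit the notebook uses.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    a = min(1.0, a)  # guard against rounding pushing a slightly above 1
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_KM * c
```

For a ride, the four arguments would be start_lat, start_lng, end_lat, and end_lng. A ride that starts and ends at the same point gets a distance of zero, which is one reason this metric understates round trips.
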
Members Dashboard

members_dashboard

Casual Members Dashboard

casual_members_dashboard

Tableau Story

Main

ETL

Extract:

  • Data Extraction: Downloading the zip file from CitiBike's data source (202307-citibike-tripdata.csv.zip).
  • Reading the CSV extracted from the zip file into a Jupyter Notebook using Pandas.
  • Exporting the cleaned data to a compressed zip archive using the zipfile Python library with compression level 9.
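
The zip handling in the steps above might look roughly like the sketch below. The helper names extract_first_csv and compress_csv are hypothetical, and the layout of the downloaded archive (one or more CSV members, possibly alongside macOS metadata entries) is an assumption, not something the README confirms.

```python
import os
import zipfile


def extract_first_csv(zip_path, out_dir):
    """Extract the first .csv member of a zip archive and return its path on disk."""
    with zipfile.ZipFile(zip_path) as archive:
        csv_members = [
            name for name in archive.namelist()
            # Skip macOS metadata entries, in case the archive contains any.
            if name.lower().endswith(".csv") and not name.startswith("__MACOSX")
        ]
        if not csv_members:
            raise ValueError(f"no CSV file found in {zip_path}")
        return archive.extract(csv_members[0], path=out_dir)


def compress_csv(csv_path, zip_path):
    """Write csv_path into a new zip archive using DEFLATE at compression level 9."""
    with zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as archive:
        archive.write(csv_path, arcname=os.path.basename(csv_path))
    return zip_path
```

The path returned by extract_first_csv can be passed straight to pandas.read_csv. Level 9 is the slowest and smallest DEFLATE setting, which is a reasonable trade-off for a file that is written once and downloaded many times.
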

Transform:

  • Removing null values.
  • Memory optimisation -> Transforming the data types and reducing the byte size of each dtype.
  • Manipulation -> Using the Haversine formula to calculate the distance between two geolocations (adding the distance column to the dataset).

Download the clean data here -> Cleaned Data - Drop Box
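
As a rough illustration of the null-removal and memory-optimisation steps, the sketch below drops incomplete rows and converts columns to smaller dtypes. The function name clean_and_shrink and the specific dtype choices (category, datetime64, float32) are assumptions; the README does not say which conversions the notebook applies.

```python
import pandas as pd


def clean_and_shrink(df):
    """Drop rows containing nulls, then convert columns to smaller dtypes."""
    out = df.dropna().copy()

    # Low-cardinality text columns use far less memory as categoricals.
    for col in ("rideable_type", "member_casual"):
        if col in out.columns:
            out[col] = out[col].astype("category")

    # Parse timestamps into datetime64 instead of keeping Python strings.
    for col in ("started_at", "ended_at"):
        if col in out.columns:
            out[col] = pd.to_datetime(out[col])

    # Downcast coordinates to float32 (assumed choice): half the size of
    # float64, at the cost of roughly metre-level precision at NYC latitudes.
    for col in ("start_lat", "start_lng", "end_lat", "end_lng"):
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], downcast="float")

    return out
```

Comparing df.memory_usage(deep=True).sum() before and after the call shows how much each conversion saves. If coordinate precision matters for the distance column, the downcast can be applied after the distance has been calculated.
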

Load:

  • Load the data (CSV) into Tableau, analyse the data, and upload the visuals to the dashboards.

Python Libraries Used:

  1. Pandas.
  2. os.
  3. math --> radians, sin, cos, sqrt, atan2
  4. zipfile

Folder structure

.
├── Images
│   ├── CitiBike Logo.png
│   ├── CitiBike_Bike.png
│   ├── customers.png
│   ├── Distance.png
│   ├── docked_bike.png
│   └── Ranting.png
├── DataTransformation.ipynb
├── Dashboard_Images
│   ├── Casual_Dashboard.png
│   └── Members_Dashboard.png
├── README.md
└── .gitignore