GoBike Dataset Exploration

by Abdullah Elsayed

Dataset

This dataset contains information about trips made with the Ford GoBike service. All of the trips took place in February 2019.

The dataset can be downloaded here: GoBike Dataset

Dataset attributes

  • duration_sec: Trip Duration (seconds)
  • start_time: Start Time and Date
  • end_time: End Time and Date
  • start_station_id: Start Station ID
  • start_station_name: Start Station Name
  • start_station_latitude: Start Station Latitude
  • start_station_longitude: Start Station Longitude
  • end_station_id: End Station ID
  • end_station_name: End Station Name
  • end_station_latitude: End Station Latitude
  • end_station_longitude: End Station Longitude
  • bike_id: Bike ID
  • user_type: User Type (Subscriber or Customer)
  • member_birth_year: Member Year of Birth
  • member_gender: Member Gender
  • bike_share_for_all_trip: Boolean

Exploration goals

In our exploration we focus on finding the factors that affect the number and duration of rides. The following list states the features used in our investigation.

  • duration_sec
  • start_time
  • end_time
  • start_station_name
  • end_station_name
  • user_type
  • member_birth_year
  • member_gender

Data cleaning

We performed the operations listed below in order to clean our data:

  • Drop unwanted columns
  • Drop missing values
  • Delete any duplicated rows
  • Convert data types
  • Investigate member_birth_year and drop invalid data values
  • Change duration_sec from seconds to minutes
  • Change member_birth_year to member_age
  • Add new column for age groups
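
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the exact code used in the exploration: it assumes pandas, a tiny made-up sample in place of the real file, a hypothetical helper name (clean_trips), a hypothetical output column name (duration_min), and assumed thresholds for what counts as an invalid birth year and for the age-group boundaries.

```python
import pandas as pd

# Made-up sample rows that reuse the dataset's column names (values are illustrative).
raw = pd.DataFrame({
    "duration_sec": [600, 600, 1800, 900, 1200],
    "start_time": ["2019-02-01 08:00:00", "2019-02-01 08:00:00",
                   "2019-02-02 17:30:00", "2019-02-03 09:15:00",
                   "2019-02-04 18:00:00"],
    "end_time": ["2019-02-01 08:10:00", "2019-02-01 08:10:00",
                 "2019-02-02 18:00:00", "2019-02-03 09:30:00",
                 "2019-02-04 18:20:00"],
    "start_station_name": ["A", "A", "B", "C", "A"],
    "end_station_name": ["B", "B", "C", "A", "C"],
    "bike_id": [1, 1, 2, 3, 4],
    "user_type": ["Subscriber", "Subscriber", "Customer", "Subscriber", "Subscriber"],
    "member_birth_year": [1985, 1985, None, 1878, 1995],
    "member_gender": ["Male", "Male", "Female", "Male", "Female"],
})

# The features listed under "Exploration goals".
KEEP = ["duration_sec", "start_time", "end_time", "start_station_name",
        "end_station_name", "user_type", "member_birth_year", "member_gender"]


def clean_trips(raw, data_year=2019, max_age=100):
    """Hypothetical helper; data_year and max_age are assumptions."""
    df = raw[KEEP].copy()                      # drop unwanted columns
    df = df.dropna()                           # drop missing values
    df = df.drop_duplicates()                  # delete duplicated rows

    # Convert data types.
    df["start_time"] = pd.to_datetime(df["start_time"])
    df["end_time"] = pd.to_datetime(df["end_time"])
    df["member_birth_year"] = df["member_birth_year"].astype(int)
    df["user_type"] = df["user_type"].astype("category")
    df["member_gender"] = df["member_gender"].astype("category")

    # Drop implausible birth years (assumed rule: age between 0 and max_age).
    df = df[df["member_birth_year"].between(data_year - max_age, data_year)]

    # Seconds to minutes, birth year to age.
    df["duration_min"] = df["duration_sec"] / 60
    df["member_age"] = data_year - df["member_birth_year"]
    df = df.drop(columns=["duration_sec", "member_birth_year"])

    # Age groups as left-closed bins, e.g. 20 <= age < 30 is "20-29".
    bins = [0, 20, 30, 40, 50, 60, 70, max_age + 1]
    labels = ["<20", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]
    df["age_group"] = pd.cut(df["member_age"], bins=bins, labels=labels, right=False)
    return df.reset_index(drop=True)


clean = clean_trips(raw)
```

In this sample the duplicated row, the row with a missing birth year, and the row with a birth year of 1878 are all removed, which leaves two trips.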

Summary of Findings

Most of the users are subscribers

We found that almost 90% of the users are subscribers.
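
As a hedged illustration of how such a share can be computed, the sketch below uses pandas on a small made-up user_type column. The values are invented for the example and do not reproduce the figure reported above.

```python
import pandas as pd

# Made-up values; the real column comes from the cleaned dataset.
user_type = pd.Series(
    ["Subscriber", "Subscriber", "Subscriber", "Customer"], name="user_type"
)

# normalize=True returns fractions instead of counts; multiply by 100 for percent.
user_type_share = user_type.value_counts(normalize=True) * 100
```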

Big gap in genders

From our investigation of gender, we can clearly see that males account for the largest share of trips. This might indicate that marketing campaigns do not target females, or that females did not find the service suitable for them. We could address this through surveys to find out what we can offer to make the service better for them.

GoBike is most likely used by workers

People tend to make more trips on working days, and most trips are made in the morning and in the evening. This indicates that people use the service to go to work and to return home. Furthermore, our investigation of age shows that the service is popular among people aged 20 to 50, which is the typical working age. Taken together, these findings suggest that the service is very popular among working people, and that most people might use it only to commute between home and work. We could address this by targeting other groups in marketing campaigns.
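
The hour-of-day and day-of-week patterns described above rely on features derived from start_time. A minimal sketch of that derivation, assuming pandas and a few made-up timestamps, is shown below; the column names start_hour, start_day, and is_weekday are hypothetical and chosen only for this example.

```python
import pandas as pd

# Made-up trips from February 2019.
trips = pd.DataFrame({"start_time": pd.to_datetime([
    "2019-02-04 08:05:00",  # Monday morning
    "2019-02-04 17:40:00",  # Monday evening
    "2019-02-05 08:30:00",  # Tuesday morning
    "2019-02-09 13:00:00",  # Saturday afternoon
])})

# Derive the hour, the day name, and a weekday flag (Monday=0 ... Sunday=6).
trips["start_hour"] = trips["start_time"].dt.hour
trips["start_day"] = trips["start_time"].dt.day_name()
trips["is_weekday"] = trips["start_time"].dt.dayofweek < 5

# Trip counts by hour and by day, plus the fraction of trips on working days.
trips_per_hour = trips["start_hour"].value_counts().sort_index()
trips_per_day = trips["start_day"].value_counts()
weekday_share = trips["is_weekday"].mean()
```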

Age and trip duration

Our exploration of age did not reveal a clear relationship between age and trip duration. People below 20 and people aged 60 to 70 take longer trips than the other age groups, while all other groups take trips of almost the same duration.
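
One way to compare trip duration across age groups is a grouped summary. The sketch below assumes pandas, made-up values, and the hypothetical column names age_group and duration_min. It uses the median because trip durations tend to be skewed by a few very long rides, which is a design choice for this example rather than a statement about the original analysis.

```python
import pandas as pd

# Made-up cleaned trips: one row per trip.
trips = pd.DataFrame({
    "age_group": ["<20", "<20", "20-29", "20-29", "20-29"],
    "duration_min": [20.0, 30.0, 8.0, 10.0, 12.0],
})

# Median duration and number of trips per age group.
duration_by_age = trips.groupby("age_group")["duration_min"].agg(["median", "count"])
```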

Key Insights for Presentation

In the presentation we are going to create a faceted graph for the percentage of rides made by the different genders and a box plot for the relationship between gender and trip duration. In our exploration these graphs appear in separate sections. We are also going to make a faceted graph that combines the frequency by hour, frequency by day, and frequency by age group graphs, which are likewise separate in our exploration.