BigData-Cycle-Share-DataSet-Analysis

Analysis of the Cycle Share dataset using big data tools and techniques: the Hadoop MapReduce framework, Hive, Pig, and MapReduce design patterns.

Dataset source

The dataset was downloaded from Kaggle - https://www.kaggle.com/pronto/cycle-share-dataset.

Analyses performed

  • Analysis 1 - Number of trips by month-year
  • Analysis 2 - Min, Max and Average duration of trips from each station
  • Analysis 3 - Total number of trips per station by year (MapReduce)
  • Analysis 4 - Top 5 busiest stations by month
  • Analysis 5 - Most active age groups
  • Analysis 6 - Number of trips in a day from each station and the corresponding weather on that day
  • Analysis 7 - Custom MapReduce algorithm to find the top 10 busiest routes
  • Analysis 8 - Count of memberships by gender (MapReduce)
  • Analysis 9 - Count of all the trips by station
  • Analysis 10 - Top 5 busiest hours of the day
  • Analysis 11 - Total number of trips that lasted more than 30 minutes (1,800 seconds) at each station
  • Analysis 12 - Number of trips in a day from each station and the corresponding weather on that day, using join patterns
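
Example sketch

As a rough illustration of the kind of job behind Analysis 2 (min, max and average trip duration per station), the map, shuffle and reduce steps can be simulated in plain Python. This is only a minimal sketch and not the code used in this project, which runs on Hadoop. The function names (map_trip, combine, run_job), the record layout and the sample values are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample records: (station id, trip duration in seconds).
# These values are invented for illustration; they are not from the dataset.
SAMPLE_TRIPS = [
    ("A", 600.0),
    ("A", 1800.0),
    ("B", 300.0),
    ("A", 1200.0),
    ("B", 900.0),
]


def map_trip(record):
    """Map step: emit (station, (min, max, sum, count)) for one trip."""
    station, duration = record
    yield station, (duration, duration, duration, 1)


def combine(a, b):
    """Merge two partial (min, max, sum, count) tuples into one."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def run_job(records):
    """Simulate map, shuffle (group by key) and reduce in memory."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_trip(record):
            grouped[key].append(value)

    results = {}
    for key, values in grouped.items():
        acc = values[0]
        for value in values[1:]:
            acc = combine(acc, value)
        lowest, highest, total, count = acc
        # Reduce step: turn the running sum and count into an average.
        results[key] = (lowest, highest, total / count)
    return results


if __name__ == "__main__":
    for station, stats in sorted(run_job(SAMPLE_TRIPS).items()):
        print(station, stats)
```

Carrying the sum and the count separately, instead of a running average, keeps the merge step associative, so the same combine function could in principle also serve as a combiner before the reduce phase.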