Uber Data Analysis using R

  • With over 118 million users, 5 million drivers, and 6.3 billion trips (17.4 million completed per day), Uber is the company behind the data for moving people and making deliveries hassle-free.

  • How are drivers assigned to riders cost-efficiently, and how is dynamic pricing leveraged to balance supply and demand?

  • The answer lies in the large volumes of data Uber collects and in the team that analyzes that data using machine learning tools and frameworks.

  • If you're curious about how data analysis is done at Uber to ensure positive experiences for riders while keeping rides profitable for the company, get your hands dirty with the Uber dataset to gain in-depth insights.

  • Data storytelling is an important component of machine learning; it helps companies understand the background of their various operations.

  • With the help of visualization, companies can make sense of complex data and gain insights that inform their decisions.

  • This is primarily a data visualization project. It guides you through using the ggplot2 library to understand the data and to develop an intuition about the customers who take the trips.

  • In this data analysis, we analyze Uber data from 1st April 2014 to 30th September 2014.

  • The goal of this project is to learn visualizations in R.

  • Dataset: Kaggle

We will import the essential packages used in this Uber data analysis project:

ggplot2

This is the backbone of this project. ggplot2 is one of the most popular data visualization libraries in R and is widely used for creating aesthetically pleasing plots.
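
As a minimal sketch of the kind of plot ggplot2 produces, the following draws a bar chart of trips per hour. The trips_by_hour data frame and its values are made up for illustration and are not taken from the Uber dataset.

```r
library(ggplot2)

# Hypothetical toy data: trip counts for a few hours of the day
trips_by_hour <- data.frame(
  hour  = factor(c(0, 6, 12, 18)),
  total = c(120, 340, 560, 910)
)

# Bar chart: one bar per hour, bar height given by the trip count
p <- ggplot(trips_by_hour, aes(x = hour, y = total)) +
  geom_col(fill = "steelblue") +
  labs(title = "Trips by hour (toy data)", x = "Hour", y = "Trips")

print(p)
```

The same pattern (data, aesthetic mapping, then layers added with +) should carry over to plots by day, month, or any other grouping.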

ggthemes

This is an add-on to the main ggplot2 library. It provides extra themes and scales for use with the mainstream ggplot2 package.

lubridate

Our dataset involves various time frames. In order to understand our data in separate time categories, we will make use of the lubridate package.
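
A short sketch of how lubridate can split timestamps into separate time categories. The sample strings and their month/day/year layout are assumptions made for illustration; check the actual format of the date column in your copy of the dataset.

```r
library(lubridate)

# Hypothetical timestamps; the month/day/year layout is an assumption
raw_times <- c("4/1/2014 0:11:00", "9/30/2014 22:58:00")

# Parse the strings into date-time values (UTC by default)
pickup <- mdy_hms(raw_times)

# Break each timestamp into separate time categories
parts <- data.frame(
  month   = month(pickup),
  day     = day(pickup),
  hour    = hour(pickup),
  weekday = wday(pickup, label = TRUE)
)

print(parts)
```

Columns like these can then serve as grouping variables when counting trips by hour, day, or month.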

dplyr

This package is the lingua franca of data manipulation in R.
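
To illustrate the kind of manipulation dplyr is used for, here is a minimal sketch that counts trips per hour. The trips data frame, its columns, and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy trips: one row per trip with the hour it started
trips <- data.frame(
  hour = c(0, 0, 6, 6, 6, 18),
  base = c("A", "B", "A", "A", "B", "B")
)

# Count trips per hour, then sort from busiest to quietest
hourly <- trips %>%
  group_by(hour) %>%
  summarise(total = n(), .groups = "drop") %>%
  arrange(desc(total))

print(hourly)
```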

tidyr

This package will help you tidy your data. The basic principle of tidyr is that each variable forms a column, each observation forms a row, and each value occupies a cell.
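
The tidy-data principle can be sketched with a small reshape. The wide table below is invented for illustration: it stores one column of trip counts per month, and pivot_longer turns it into one row per observation.

```r
library(tidyr)

# Hypothetical wide table: one column of trip counts per month
wide <- data.frame(
  base = c("A", "B"),
  Apr  = c(10, 20),
  May  = c(15, 25)
)

# Reshape so each row is one observation (base, month, trips)
long <- pivot_longer(wide, cols = c(Apr, May),
                     names_to = "month", values_to = "trips")

print(long)
```

The long form is usually the more convenient input for ggplot2, since month can be mapped directly to an axis or a fill colour.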

DT

With the help of this package, we can interface with the JavaScript library DataTables.
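
A minimal sketch of building such a table from a data frame. The trips data frame is hypothetical, and pageLength is just one of the DataTables options that can be passed through.

```r
library(DT)

# Hypothetical toy data frame
trips <- data.frame(hour = c(0, 6, 18), total = c(120, 340, 910))

# Build an interactive HTML table widget (sortable and searchable)
tbl <- datatable(trips, options = list(pageLength = 5))
```

Printing tbl at the console or in an R Markdown document should render the interactive table.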

scales

The scales package helps map data values to graphical scales automatically, producing well-placed axis breaks and readable labels on axes and legends.
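
The packages described above can be loaded at the top of the script. The sketch below does that and then shows one common use of scales, formatting large numbers on an axis, together with a ggthemes theme. The monthly totals are made-up values, and the choice of theme_economist is only an example.

```r
# Load every package described above (each must already be installed)
library(ggplot2)
library(ggthemes)
library(lubridate)
library(dplyr)
library(tidyr)
library(DT)
library(scales)

# Hypothetical monthly totals, used only to demonstrate label formatting
monthly <- data.frame(month = c("Apr", "May"), total = c(500000, 650000))

# scales::comma formats the y-axis labels with thousands separators
p <- ggplot(monthly, aes(x = month, y = total)) +
  geom_col() +
  scale_y_continuous(labels = comma) +
  theme_economist()  # one of the extra themes provided by ggthemes

print(p)
```

If a package is missing, install.packages() with the package name installs it from CRAN.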