Mapreduce-Analysis-on-NYPD-Motor-Vehicle-Collisions

Data Analysis on Motor Vehicle Collisions using Map Reduce

This repository contains code that analyzes the NYPD Motor Vehicle Collisions dataset, provided by NYC Open Data, using the MapReduce paradigm. The dataset contains all reports of vehicular incidents in New York City. More information about the dataset is available here. The dataset is updated regularly and has many attributes. The aim of the project is to perform exploratory data analysis on the dataset and to learn more about the MapReduce paradigm. I used the Hadoop Streaming API and wrote the mapper and reducer in Python.
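For readers new to Hadoop Streaming, here is a minimal sketch of what a mapper and reducer for this kind of count could look like. It is an illustration under stated assumptions, not necessarily the code in this repository: the column positions in `VEHICLE_COLUMNS`, the `HEADER_PREFIX` check, and the function names are all hypothetical and need to be adjusted to the CSV export in use.

```python
import csv
import sys
from itertools import groupby

# Assumption: positions of the vehicle-type fields in each CSV row (hypothetical).
VEHICLE_COLUMNS = (24, 25, 26, 27, 28)
# Assumption: header cells for those fields start with this text, so they can be skipped.
HEADER_PREFIX = "VEHICLE TYPE CODE"


def map_records(lines, vehicle_columns=VEHICLE_COLUMNS):
    """Yield (vehicle_type, 1) for every non-empty vehicle-type field."""
    for row in csv.reader(lines):
        for index in vehicle_columns:
            if index < len(row):
                vehicle = row[index].strip()
                if vehicle and not vehicle.startswith(HEADER_PREFIX):
                    yield vehicle, 1


def reduce_pairs(lines):
    """Sum counts per key; expects tab-separated lines already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key, sum(int(count) for _, count in group)


def run(stage, stdin=sys.stdin, stdout=sys.stdout):
    """Read from stdin and write tab-separated key/value lines to stdout."""
    records = map_records(stdin) if stage == "map" else reduce_pairs(stdin)
    for key, value in records:
        stdout.write(f"{key}\t{value}\n")


# Only runs when invoked as "python script.py map" or "python script.py reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    run(sys.argv[1])
```

With Hadoop Streaming, the two stages would be passed as the `-mapper` and `-reducer` commands. The framework sorts the mapper output by key before the reducer sees it, which is why `reduce_pairs` can rely on `groupby`.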

Note that the mapper and reducer read their data from standard input, so they are not tied to any particular way of supplying the data.
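One practical consequence of reading from standard input is that the map, sort, and reduce steps can be tried locally without a cluster. The sketch below mimics that flow in-process; `simulate_streaming`, `toy_mapper`, and `toy_reducer` are hypothetical names used only for illustration, and the toy mapper simply counts the last comma-separated field of each line.

```python
import io


def simulate_streaming(mapper, reducer, input_text):
    """Run mapper -> sort -> reducer in-process, mimicking the streaming shuffle."""
    mapped = io.StringIO()
    mapper(io.StringIO(input_text), mapped)
    # Hadoop sorts mapper output by key before the reduce phase; sort lines here.
    shuffled = "".join(sorted(mapped.getvalue().splitlines(keepends=True)))
    reduced = io.StringIO()
    reducer(io.StringIO(shuffled), reduced)
    return reduced.getvalue()


def toy_mapper(stdin, stdout):
    """Emit '<last field>\t1' for each input line with a non-empty last field."""
    for line in stdin:
        fields = line.rstrip("\n").split(",")
        if fields[-1]:
            stdout.write(f"{fields[-1]}\t1\n")


def toy_reducer(stdin, stdout):
    """Sum the counts of consecutive lines that share the same key."""
    current, total = None, 0
    for line in stdin:
        key, count = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                stdout.write(f"{current}\t{total}\n")
            current, total = key, 0
        total += int(count)
    if current is not None:
        stdout.write(f"{current}\t{total}\n")


print(simulate_streaming(toy_mapper, toy_reducer, "1,TAXI\n2,BUS\n3,TAXI\n"), end="")
```

Because both stages only see text streams, the same functions could be fed from a file, a pipe, or a test string.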

Results

I counted how many times each type of vehicle was involved in an accident over the period covered by the data.

AMBULANCE 3713

BICYCLE 24153

BUS 25871

FIRE TRUCK 1333

LARGE COM VEH(6 OR MORE TIRES) 27981

LIVERY VEHICLE 17775

MOTORCYCLE 10029

OTHER 51360

PASSENGER VEHICLE 1005160

PEDICAB 123

PICK-UP TRUCK 26281

SCOOTER 534

SMALL COM VEH(4 TIRES) 30048

SPORT UTILITY / STATION WAGON 363209

TAXI 63892

UNKNOWN 105481

VAN 51666
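The counts above can be loaded back into Python for further exploration, for example to rank vehicle types or compute each type's share. This is a small sketch that assumes the output format shown above (a name followed by a count on each line); `parse_counts` is a hypothetical helper. Note that the total is the number of vehicle involvements, not the number of collisions, since one collision can involve several vehicles.

```python
RESULTS = """\
AMBULANCE 3713
BICYCLE 24153
BUS 25871
FIRE TRUCK 1333
LARGE COM VEH(6 OR MORE TIRES) 27981
LIVERY VEHICLE 17775
MOTORCYCLE 10029
OTHER 51360
PASSENGER VEHICLE 1005160
PEDICAB 123
PICK-UP TRUCK 26281
SCOOTER 534
SMALL COM VEH(4 TIRES) 30048
SPORT UTILITY / STATION WAGON 363209
TAXI 63892
UNKNOWN 105481
VAN 51666
"""


def parse_counts(text):
    """Parse 'NAME COUNT' lines; names may contain spaces, so split from the right."""
    counts = {}
    for line in text.splitlines():
        if line.strip():
            name, count = line.rsplit(None, 1)
            counts[name] = int(count)
    return counts


counts = parse_counts(RESULTS)
total = sum(counts.values())

# Print the three most frequent vehicle types with their share of all involvements.
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
for name, count in ranked[:3]:
    print(f"{name}: {count} ({count / total:.1%})")
```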

If there is anything you want to talk about, please feel free to reach out on LinkedIn. If you find any problems, feel free to open an issue in this repository.