The goal of this project is to extract information about car accidents in New York City from the NYPD Motor Vehicle Collisions dataset, processing it with Hadoop.
The input dataset can be downloaded from here.
Execute `init.sh`: it downloads the CSV file and creates a `run.sh` script, explained in the next section.
Then build the project with `gradle build`.
# To run the queries with run.sh
Execute `run.sh` passing `runAll` as the argument to run all the queries, or select only some of them by passing their names as separate arguments. The available queries are:
- ContributingFactors
- LethalPerWeek
- WeekBorough

For example, `./run.sh LethalPerWeek WeekBorough` runs only LethalPerWeek and WeekBorough.
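The queries run as Hadoop jobs and their code is not shown in this README. As an illustration only, the plain-Java sketch below mimics the map and reduce steps a query like LethalPerWeek might perform on a few CSV rows, without Hadoop. It rests on assumptions this README does not confirm: a lethal accident is a row whose persons-killed count is greater than zero, dates use the `MM/dd/yyyy` format, and weeks are ISO weeks. The class `LethalPerWeekSketch` and its methods are hypothetical names, and the column indices are parameters because the real positions should be read from the CSV header.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch; not the project's actual query code.
final class LethalPerWeekSketch {
    // ASSUMPTION: dates look like 01/31/2017; check the real CSV.
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Splits one CSV line, keeping commas that appear inside double-quoted fields.
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" is an escaped quote inside a quoted field
                    i++;
                } else {
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Map-side key: ISO week-based year and week number, e.g. "2017-W01".
    static String weekKey(String date) {
        LocalDate d = LocalDate.parse(date, DATE_FORMAT);
        int year = d.get(IsoFields.WEEK_BASED_YEAR);
        int week = d.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return String.format(Locale.ROOT, "%d-W%02d", year, week);
    }

    // Map + reduce in one pass: number of lethal accidents (rows, not deaths) per week.
    static Map<String, Integer> lethalPerWeek(List<String> rows, int dateCol, int killedCol) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by week key
        for (String row : rows) {
            List<String> f = splitCsv(row);
            if (f.size() <= Math.max(dateCol, killedCol)) {
                continue; // malformed or truncated row
            }
            String killed = f.get(killedCol).trim();
            // Skips blank values, the header row and accidents with no deaths.
            if (!killed.matches("\\d+") || Integer.parseInt(killed) == 0) {
                continue;
            }
            counts.merge(weekKey(f.get(dateCol).trim()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Synthetic rows: date, street name, persons killed.
        List<String> rows = List.of(
                "01/02/2017,FLATBUSH AVENUE,1",
                "01/03/2017,QUEENS BOULEVARD,0",
                "01/05/2017,\"BROADWAY, MANHATTAN\",2",
                "01/09/2017,GRAND CONCOURSE,1");
        System.out.println(lethalPerWeek(rows, 0, 2));
    }
}
```

Running `main` prints `{2017-W01=2, 2017-W02=1}`: 2 and 5 January 2017 fall in ISO week 1, the 3 January row is skipped because nobody died, and 9 January opens week 2. In an actual MapReduce job, `weekKey` would belong in the mapper, which emits a (week, 1) pair for each lethal row, while the summing done here by `Map.merge` would happen in the reducer.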
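# To run the queries on the cluster
Each query class takes two arguments: the path of the input CSV file and the output directory. Replace `/path/to/` with the actual locations: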
hadoop jar build/libs/hadoop-accidents-1.0-SNAPSHOT.jar mw.hadoop.queries.ContributingFactors file:///path/to/NYPD_Motor_Vehicle_Collisions.csv file:///path/to/output1
hadoop jar build/libs/hadoop-accidents-1.0-SNAPSHOT.jar mw.hadoop.queries.LethalPerWeek file:///path/to/NYPD_Motor_Vehicle_Collisions.csv file:///path/to/output2
hadoop jar build/libs/hadoop-accidents-1.0-SNAPSHOT.jar mw.hadoop.queries.WeekBorough file:///path/to/NYPD_Motor_Vehicle_Collisions.csv file:///path/to/output3