The data we will use is open and available from NYC Open Data:
https://data.cityofnewyork.us/City-Government/Open-Parking-and-CameraViolations/nc67-uf89
We use SQL, Spark, and Hadoop to analyze the NYC Open Data and gain insight into parking violations and open (unpaid) violations in New York City. The analysis covers the following tasks:
Find all parking violations that have been paid, i.e., that do not occur in openviolations.csv
Find the frequencies of the violation types in parking_violations.csv, i.e., for each violation code, the number of violations with that code
Find the total and average amounts due in open violations for each license type
Compute the total number of violations for vehicles registered in the state of NY and for all other vehicles.
Find the vehicle that has had the greatest number of violations
Find the top-20 vehicles in terms of total violations
For each violation code, list the average number of violations with that code issued per day on weekdays and weekend days. You may hardcode “8” as the number of weekend days and “23” as the number of weekdays in March 2016. In March 2016, the 5th, 6th, 12th, 13th, 19th, 20th, 26th, and 27th were weekend days (i.e., Sat. and Sun.).
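The logic behind the tasks above can be sketched in plain Python before it is ported to SQL or Spark. This is a minimal illustration only, not the project's actual implementation: the field names (summons_number, plate_id, registration_state, violation_code, issue_date, license_type, amount_due) and the sample rows are assumptions made for the example, and the real CSV columns may differ. It also assumes that a vehicle is identified by its plate and registration state, and that every record falls in March 2016.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample rows; the real CSV column names may differ.
parking = [
    {"summons_number": "1", "plate_id": "ABC123", "registration_state": "NY",
     "violation_code": "21", "issue_date": date(2016, 3, 5)},
    {"summons_number": "2", "plate_id": "ABC123", "registration_state": "NY",
     "violation_code": "21", "issue_date": date(2016, 3, 7)},
    {"summons_number": "3", "plate_id": "XYZ789", "registration_state": "NJ",
     "violation_code": "46", "issue_date": date(2016, 3, 7)},
    {"summons_number": "4", "plate_id": "ABC123", "registration_state": "NY",
     "violation_code": "46", "issue_date": date(2016, 3, 12)},
]
open_violations = [
    {"summons_number": "2", "license_type": "PAS", "amount_due": 50.0},
    {"summons_number": "3", "license_type": "PAS", "amount_due": 100.0},
]

# Hardcoded for March 2016, as the task statement allows.
WEEKEND_DATES = {5, 6, 12, 13, 19, 20, 26, 27}
NUM_WEEKEND_DAYS = 8
NUM_WEEKDAYS = 23

def paid_violations(parking, open_violations):
    """Parking violations whose summons number is absent from the open set."""
    open_ids = {row["summons_number"] for row in open_violations}
    return [row for row in parking if row["summons_number"] not in open_ids]

def violation_code_counts(parking):
    """Number of violations per violation code."""
    return Counter(row["violation_code"] for row in parking)

def amount_due_by_license_type(open_violations):
    """Map each license type to (total amount due, average amount due)."""
    sums = defaultdict(float)
    counts = Counter()
    for row in open_violations:
        sums[row["license_type"]] += row["amount_due"]
        counts[row["license_type"]] += 1
    return {lt: (sums[lt], sums[lt] / counts[lt]) for lt in sums}

def ny_vs_other(parking):
    """Violation totals for NY-registered vehicles and for all others."""
    ny = sum(1 for row in parking if row["registration_state"] == "NY")
    return {"NY": ny, "Other": len(parking) - ny}

def top_vehicles(parking, n=20):
    """The n vehicles with the most violations; n=1 gives the single worst."""
    counts = Counter((row["plate_id"], row["registration_state"])
                     for row in parking)
    return counts.most_common(n)

def daily_averages(parking):
    """Map each code to (average per weekend day, average per weekday)."""
    weekend, weekday = Counter(), Counter()
    for row in parking:
        if row["issue_date"].day in WEEKEND_DATES:
            weekend[row["violation_code"]] += 1
        else:
            weekday[row["violation_code"]] += 1
    codes = set(weekend) | set(weekday)
    return {code: (weekend[code] / NUM_WEEKEND_DAYS,
                   weekday[code] / NUM_WEEKDAYS) for code in codes}
```

Each function corresponds to one task: the first is an anti-join on the summons number, the next four are group-and-aggregate queries, and the last splits the counts by day type before dividing by the hardcoded day counts. In Spark or SQL, the same shapes would typically appear as an anti-join, GROUP BY aggregations, and an ORDER BY with a LIMIT.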