This is a group project on COVID-19 data analysis using Spark DataFrames and RDDs. Each member worked on two queries. I selected the 10 best and 10 worst US states based on the ratio of COVID-19 deaths to population. I also computed the rate of COVID-19 deaths against the total number of deaths per state. The output data was exported as CSV files for use in visualization.
- Apache Spark & Spark SQL
- HDFS and YARN
- SBT
- Scala 2.12.10
- Time_series_covid_19_deaths_US.csv
- usDeath2020.csv
- deathBYcovid.csv
- bestStates.csv
- worstStates.csv
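
The two ratios and the best/worst ranking described above can be sketched roughly as follows. This is a minimal illustration using plain Scala collections rather than Spark, so that it runs without a cluster; it is not the project's actual query code. The `StateStats` case class, its field names, the `CovidRanking` object, and the sample numbers are all hypothetical placeholders, not values from the datasets listed above.

```scala
// Hypothetical record type: field names are assumptions, not the CSV column names.
final case class StateStats(
  state: String,
  population: Long,
  covidDeaths: Long,
  totalDeaths: Long
)

object CovidRanking {
  // COVID deaths per 100,000 residents (deaths-to-population ratio, scaled).
  def deathsPer100k(s: StateStats): Double =
    s.covidDeaths.toDouble / s.population * 100000

  // Percentage of all deaths in the state that were attributed to COVID.
  def covidShareOfDeaths(s: StateStats): Double =
    s.covidDeaths.toDouble / s.totalDeaths * 100

  // The n states with the lowest death rate ("best").
  def best(stats: Seq[StateStats], n: Int): Seq[StateStats] =
    stats.sortBy(s => deathsPer100k(s)).take(n)

  // The n states with the highest death rate ("worst").
  def worst(stats: Seq[StateStats], n: Int): Seq[StateStats] =
    stats.sortBy(s => -deathsPer100k(s)).take(n)

  def main(args: Array[String]): Unit = {
    // Placeholder rows for illustration only; not real figures.
    val sample = Seq(
      StateStats("StateA", 1000000L, 500L, 10000L),
      StateStats("StateB", 2000000L, 4000L, 20000L),
      StateStats("StateC", 500000L, 100L, 4000L)
    )

    println("Best:")
    best(sample, 2).foreach { s =>
      println(f"${s.state}%s ${deathsPer100k(s)}%.1f per 100k")
    }

    println("Worst:")
    worst(sample, 2).foreach { s =>
      println(f"${s.state}%s ${deathsPer100k(s)}%.1f per 100k")
    }

    println("COVID share of all deaths:")
    sample.foreach { s =>
      println(f"${s.state}%s ${covidShareOfDeaths(s)}%.1f%%")
    }
  }
}
```

In a Spark version, the same idea would presumably map to a derived ratio column followed by an ordered, limited selection, with the result written out as CSV; the exact column names depend on the input files.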