Project-Two-Amplifire-Github: A Scala repository from Jackesims

Project 2

Proposal:

Create a Spark application that processes data from Kaggle at this link:
https://www.kaggle.com/cdc/mortality

Presentations

We will be analyzing the following questions:

Average age of death from lightning
Manner of death % for each year
Life expectancy by marital status throughout the years
How many people died at work while doing non-work activities
Deaths on the job compared to education level
Deaths on the job compared to age
Cumulative % of autopsied vs. non-autopsied deaths
How many people died in the month of May while engaged in sports activities
What are the main ways people die while doing vital activities?
What are the main causes of death while resting, sitting, or eating?
What's the impact of education on the life expectancy of an individual?
Deadliest day of the week and deadliest month of the year?
Deadliest day of the year?
What are the main causes of death for infants? (top 5-10ish)
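
As an illustration of the kind of aggregation these questions call for, here is a minimal sketch of "Manner of death % for each year" in plain Scala collections. The `Death` record and its two fields are hypothetical simplifications, not the dataset's actual schema, and the project itself may compute this differently (for example with Spark SQL).

```scala
object MannerOfDeathShare {
  // Hypothetical, simplified record; the real dataset has many more columns.
  final case class Death(year: Int, manner: String)

  /** For each year, the percentage of that year's deaths attributed to each manner. */
  def percentByYear(deaths: Seq[Death]): Map[Int, Map[String, Double]] =
    deaths.groupBy(_.year).map { case (year, inYear) =>
      val total = inYear.size.toDouble
      val shares = inYear.groupBy(_.manner).map { case (manner, rows) =>
        manner -> rows.size * 100.0 / total
      }
      year -> shares
    }
}
```

In Spark, the same shape would likely be a group-by on year and manner, with each count divided by the per-year total.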

Technologies

Apache Spark
Spark SQL
YARN
HDFS and/or S3
Scala
Git + GitHub

Due Date

Presentations will take place on Friday, 19-Nov-2021.

Instructions for Using the Tool

As previously stated, data is collected in the form of CSV files from https://www.kaggle.com/cdc/mortality.

Download the individual CSV files by year, and upload each CSV file to /home/maria_dev. Then, add each file name to a separate CSV file called "csvFileList.csv", followed by a newline ("\n"). The newline serves as the delimiter between file names. Upload this list file to /home/maria_dev as well.
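
To make the list-file format concrete, here is a minimal sketch of how the contents of "csvFileList.csv" could be turned into full file paths. The object name, the `parse` helper, and the example file names in the comments are assumptions for illustration, not the project's actual code.

```scala
object CsvFileList {
  // Upload directory taken from the instructions above.
  val baseDir = "/home/maria_dev"

  /** Turns the raw text of csvFileList.csv into full paths, one per non-empty line. */
  def parse(contents: String, dir: String = baseDir): List[String] =
    contents
      .split("\n")          // "\n" is the delimiter between file names
      .iterator
      .map(_.trim)          // also drops a stray "\r" from Windows line endings
      .filter(_.nonEmpty)   // skips blank lines
      .map(name => s"$dir/$name")
      .toList
}
```

Given a `SparkSession` named `spark` (an assumption here), the resulting paths could then be loaded together with something like `spark.read.option("header", "true").csv(paths: _*)`.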
