This repo analyses data from Kaggle's Death in the United States dataset.
- The data is a log of deaths in the United States from 2005 through 2015. It has a long list of attributes that can be analysed and related to each other, such as race, year of death, gender, cause of death, education level, and so on.
- Each year has its own CSV file of around 500 MB, plus a schema of the legal attributes for that year. The total size of the dataset after decompression is 4 GB.
- Some fields may be empty for a given record.
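
Because each yearly file is large and fields can be empty, it may help to stream a file row by row and tally how often each column is blank before relating attributes to each other. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and treating whitespace-only values as empty is an assumption rather than something taken from the dataset's documentation.

```python
import csv
from collections import Counter


def count_empty_fields(rows):
    """Return (total_rows, Counter of empty values per column)."""
    empty = Counter()
    total = 0
    for row in rows:
        total += 1
        for column, value in row.items():
            if column is None:
                continue  # extra unnamed fields in a malformed row
            # Assumption: a missing or whitespace-only value counts as empty.
            if value is None or value.strip() == "":
                empty[column] += 1
    return total, empty


def count_empty_fields_in_file(path):
    """Stream one yearly CSV without loading it fully into memory."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_empty_fields(csv.DictReader(handle))
```

Calling `count_empty_fields_in_file` on one yearly CSV returns the row count and the per-column tally, which gives a quick view of which attributes are complete enough to analyse.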
In this section we explore the relations between attributes of the dataset. The list of analyses is as follows:
- The relation between education and cause of death. (Mousa)
  - The relation between work and death. (don't know if it will work)
  - Sports and cause of death.
  - How does education affect lifespan?
  - How does education affect cause of death?
- Business report. (Each did his part)
- The most frequent causes of death generally. (Moustafa)
  - The most frequent causes of death for each race. (done)
  - The most frequent causes of death for each gender. (done)
  - The day and month in which most people died. (done)
- Time series analysis. (Khalid)
  - The causes of death for each year. (done)
  - The most dangerous causes of death for each season. (done)
  - Trend fitting (machine learning). (done)
- Violence and death. (Ahmed)
  - Gun vs. vehicle deaths. (done)
  - Guns and race. (done)
  - Homicide vs. other causes of death.
  - Correlation of suicide with age and education. (done)
- (Interesting but not required)
  - Comparative analysis between men and women across attributes.
  - Race - age record - place of death.
  - Cardiovascular disease for men and women: it's considered a medical fact that men die of cardiovascular disease more often than women, so let's test it.
  - Heart disease analysis.
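
Several of the analyses listed above (most frequent causes of death per race, per gender, or per year) reduce to a grouped frequency count. The sketch below shows one way to do that over streamed rows with the standard library. It is illustrative only: the function name and the column names used in the usage note (`sex`, `cause`) are hypothetical and should be replaced with the attribute names from the yearly schema.

```python
from collections import Counter, defaultdict


def top_causes_by_group(rows, group_column, cause_column, n=3):
    """Map each group value to its n most common causes with counts."""
    counts = defaultdict(Counter)
    for row in rows:
        group = row.get(group_column, "")
        cause = row.get(cause_column, "")
        # Skip records where either field is empty.
        if not group or not cause:
            continue
        counts[group][cause] += 1
    return {group: counter.most_common(n) for group, counter in counts.items()}
```

The `rows` argument can be any iterable of dictionaries, for example a `csv.DictReader` over one yearly file, so a call such as `top_causes_by_group(reader, "sex", "cause")` would give the top three causes for each gender without loading the whole file.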