/int3-data-analysis

Exploratory data analysis on world suicide rate dataset


Exploratory data analysis of world suicide rates from 1985 to 2016.

Overview

I chose this project because I like to study topics that affect a wide variety of people, in the hope of improving the lives of others. The data contains reported suicide numbers from countries around the world, along with each country's GDP and HDI (Human Development Index), demographic information such as sex and generation, and the year each figure was reported.

I was curious whether any of the features in this data were correlated, and whether we could find trends or patterns that give insight into this issue. I also wanted to practice the exploratory data analysis methods I learned in my intro to data analysis class, specifically data cleaning, data formatting, statistical analysis, categorical data analysis, and data visualization.
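A correlation check of the kind described above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made-up toy values, and the column names (`suicides_no`, `population`, `gdp_per_capita`) and helper functions are assumptions chosen for the example.

```python
import pandas as pd

# Toy rows with made-up values; column names are assumptions for illustration.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "sex": ["male", "female", "male", "female"],
    "suicides_no": [120, 40, 130, 35],
    "population": [1_000_000, 1_050_000, 1_010_000, 1_060_000],
    "gdp_per_capita": [20_000, 20_000, 21_000, 21_000],
})


def add_rate_per_100k(df):
    """Return a copy with a suicides-per-100k-population column (hypothetical helper)."""
    out = df.copy()
    out["rate_per_100k"] = out["suicides_no"] / out["population"] * 100_000
    return out


def correlations_with_rate(df):
    """Pearson correlation of every numeric column with the rate column."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr()["rate_per_100k"].drop("rate_per_100k")


rates = add_rate_per_100k(toy)
corr = correlations_with_rate(rates)
print(corr)
```

Working with a per-100k rate rather than raw counts keeps large and small populations comparable before looking at correlations.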

Dataset

Click here to find the dataset I used for my analysis.

Technologies Used

  • Jupyter Notebook
  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • Numpy
  • Scipy

Resources

Summary of Analysis

While I didn't necessarily find any correlation between the different features in my data and the suicide rate, I did find some interesting information.

I found that reported suicides among males were higher than among females. I also noticed that the suicide rate for males spiked after 2015.
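One way to make a comparison like this is to sum suicides and population per group and convert to a rate per 100k, so that group size does not distort the result. The sketch below uses made-up toy numbers and assumed column names (`sex`, `year`, `suicides_no`, `population`); `rate_by` is a hypothetical helper, not code from the notebook.

```python
import pandas as pd

# Toy rows with made-up values; column names are assumptions for illustration.
toy = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015],
    "sex": ["male", "female", "male", "female"],
    "suicides_no": [300, 100, 330, 90],
    "population": [1_000_000, 1_000_000, 1_000_000, 1_000_000],
})


def rate_by(df, keys):
    """Suicides per 100k population for each combination of the grouping keys."""
    totals = df.groupby(keys)[["suicides_no", "population"]].sum()
    return totals["suicides_no"] / totals["population"] * 100_000


# Overall rate per sex, then a year-by-sex table for spotting changes over time.
by_sex = rate_by(toy, ["sex"])
by_year_sex = rate_by(toy, ["year", "sex"]).unstack("sex")
print(by_sex)
print(by_year_sex)
```

The same helper could be applied to other categorical columns, such as age group or generation, by changing the grouping keys.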

Generation Z had the lowest number of suicides. This could be because they're still young or potentially due to other factors not visible in this particular dataset. Boomers and the Silent Generation had the highest suicide rate amongst all the different generations.

The highest numbers of suicides came from the 35-54 year old and 55-74 year old age groups. The 5-15 year old age group had the lowest number of suicides.

There were certain countries with abnormally high suicide rates that skewed the data. However, I think this information is worth noting. The United States, Russian Federation, Ukraine, Japan, Germany, and France (to name a few) reported much higher numbers than other countries.
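Countries whose totals stand far above the rest can be flagged with a simple rule of thumb. The sketch below uses the common 1.5 × IQR cutoff, which is an assumption of this example and not necessarily the method used in the notebook; the country labels and numbers are made up, and `flag_high_totals` is a hypothetical helper.

```python
import pandas as pd

# Toy rows with made-up values; "A".."E" stand in for real country names.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "D", "E"],
    "suicides_no": [10, 12, 11, 9, 25, 18, 500],
})


def flag_high_totals(df, k=1.5):
    """Return per-country totals above Q3 + k * IQR, largest first."""
    totals = df.groupby("country")["suicides_no"].sum().sort_values(ascending=False)
    q1, q3 = totals.quantile([0.25, 0.75])
    upper = q3 + k * (q3 - q1)
    return totals[totals > upper]


flagged = flag_high_totals(toy)
print(flagged)
```

Because raw totals depend heavily on population size, the same check could be run on a per-100k rate instead to see whether a country remains an outlier.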

I plan to continue working with this dataset as I deepen my understanding of data analysis and machine learning.