I chose this project because I like to study topics that affect a wide variety of people, in the hope of improving the lives of others. This dataset contains reported suicide numbers from countries around the world, along with each country's GDP, HDI (Human Development Index), and demographic information such as sex and generation, as well as the year each figure was reported.
I was curious whether any of the features in this data correlate with one another, and whether any trends or patterns could give us insight into this issue. I also wanted to practice the exploratory data analysis methods I learned in my intro to data analysis class, specifically data cleaning, data formatting, statistical analysis, categorical data analysis, and data visualization.
Click here to find the dataset I used for my analysis.
- Jupyter Notebook
- Python
- Pandas
- Matplotlib
- Seaborn
- NumPy
- SciPy
- Cleaning and Prepping Data with Python for Data Science
- Detailed Exploratory Data Analysis with Python
- Ways to Detect and Remove the Outliers
- Exploratory Data Analysis (EDA) and Data Visualization
- Binning Data
- Handling Categorical Data in Python
- Data Visualization Using Seaborn
- Visualizing Economic Data Using Plotly
- Pandas Tutorial 2: Aggregation and Grouping
While I didn't necessarily find any correlation between the different features in my data and the suicide rate, I did find some interesting information.
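A correlation check like the one described above can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the column names (`suicides_per_100k`, `gdp_per_capita`, `hdi`) and the helper `correlation_with_rate` are assumptions, and the rows are made-up toy values rather than figures from the dataset.

```python
import pandas as pd

# Toy rows with made-up values; the column names are assumptions,
# not necessarily the dataset's real schema.
df = pd.DataFrame({
    "suicides_per_100k": [5.0, 12.0, 7.5, 20.0, 3.2, 9.1],
    "gdp_per_capita": [12000, 30000, 8000, 25000, 40000, 15000],
    "hdi": [0.70, 0.85, 0.65, 0.80, 0.90, 0.72],
})


def correlation_with_rate(frame, target="suicides_per_100k"):
    """Return the Pearson correlation of each numeric column with `target`."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target]
    # Drop the target's correlation with itself (always 1.0).
    return corr.drop(target).sort_values(ascending=False)


print(correlation_with_rate(df))
```

Values near 0 suggest little linear relationship, while values near 1 or -1 suggest a strong one. Pearson correlation only captures linear relationships, so a weak coefficient does not rule out other kinds of patterns.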
I found that suicides among males were reported at a higher rate than among females. I also noticed that the suicide rate for males spiked after 2015.
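A comparison by sex over time can be sketched as a grouped aggregation. The column names (`year`, `sex`, `suicides_no`, `population`) and the helper `rate_by_year_and_sex` are assumptions for illustration, and the numbers below are invented toy values, not results from the dataset.

```python
import pandas as pd

# Toy rows with made-up values; column names are assumed, not confirmed.
df = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015, 2016, 2016],
    "sex": ["male", "female", "male", "female", "male", "female"],
    "suicides_no": [30, 10, 33, 11, 40, 12],
    "population": [100_000] * 6,
})


def rate_by_year_and_sex(frame):
    """Return suicides per 100k people, one row per year and one column per sex."""
    totals = frame.groupby(["year", "sex"])[["suicides_no", "population"]].sum()
    totals["rate_per_100k"] = totals["suicides_no"] * 100_000 / totals["population"]
    # Pivot the 'sex' index level into columns for side-by-side comparison.
    return totals["rate_per_100k"].unstack("sex")


rates = rate_by_year_and_sex(df)
print(rates)
```

Summing counts and population before dividing avoids averaging rates across groups of different sizes. Calling `rates.plot()` on the result would draw one line per sex, which makes a change after a given year easier to spot.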
Generation Z had the lowest number of suicides. This could be because they're still young, or it could be due to other factors not visible in this dataset. Boomers and the Silent Generation had the highest suicide rates of all the generations.
The highest numbers of suicides came from the 35-54 and 55-74 year old age groups. The 5-15 year old age group had the lowest number of suicides.
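Totals per generation or age group can be sketched with a single `groupby`. The column names (`age`, `generation`, `suicides_no`), the category labels, and the helper `totals_by` are assumptions for illustration, and the counts are made-up toy values rather than the dataset's figures.

```python
import pandas as pd

# Toy rows with made-up counts; column names and labels are assumptions.
df = pd.DataFrame({
    "age": ["5-15", "35-54", "55-74", "35-54", "5-15", "55-74"],
    "generation": ["Generation Z", "Boomers", "Silent",
                   "Generation X", "Generation Z", "Boomers"],
    "suicides_no": [1, 20, 15, 25, 2, 18],
})


def totals_by(frame, column):
    """Sum reported suicides per category of `column`, largest first."""
    return frame.groupby(column)["suicides_no"].sum().sort_values(ascending=False)


print(totals_by(df, "age"))
print(totals_by(df, "generation"))
```

Raw counts favor larger groups, so dividing by each group's population (if that column is available) gives a fairer comparison between categories.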
There were certain countries with abnormally high suicide rates that skewed the data. However, I think this information is worth noting. The United States, Russian Federation, Ukraine, Japan, Germany, and France (to name a few) reported much higher numbers than other countries.
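One common way to flag countries with unusually high totals is the interquartile range (IQR) rule. The sketch below assumes that approach, which may differ from the method used in the notebook. The helper `iqr_outliers`, the placeholder country names, and the values are all hypothetical.

```python
import pandas as pd

# Made-up per-country totals with placeholder names, for illustration only.
country_totals = pd.Series(
    {"Country A": 100, "Country B": 120, "Country C": 90,
     "Country D": 110, "Country E": 105, "Country F": 5000},
    name="suicides_no",
)


def iqr_outliers(values, k=1.5):
    """Return entries lying more than k * IQR outside the middle 50% of values."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


print(iqr_outliers(country_totals))
```

Flagging outliers this way keeps them visible for discussion while making it easy to exclude them from plots where they would otherwise dominate the scale.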
I plan to continue working with this dataset as I deepen my understanding of data analysis and machine learning.