This project analyzes cancer death data across 195 countries and over 20 different types of cancer, using data from the year 2021. The analysis focuses on understanding global trends in cancer mortality, exploring correlations between population size and cancer deaths, and identifying the top countries and cancer types with the highest death tolls. Insights from this analysis aim to guide healthcare interventions and resource allocation, especially in regions requiring the most attention.
- Number of records: 10,000+
- Features:
Country
: The name of the country.Cancer_Type
: Type of cancer (e.g., lung cancer, Pancreatic cancer, etc.).Total_Deaths
: Total number of deaths caused by cancer in the country.Population_Size
: Population of the country.Year
: The year the data was collected (2021).
- Resolved 5% missing values by filling them with median values for both
Total_Deaths
andPopulation_Size
. - Ensured correct data types for each feature (e.g.,
Total_Deaths
andPopulation_Size
as float). - No significant outliers were found that could impact the analysis.
The analysis identifies the top 10 countries with the highest cancer deaths, showing that Spain, the USA, and Vietnam account for 40% of total global cancer deaths.
We explored which cancer types contributed the most to total deaths globally. We found out that Lung, Pancreatic and Cervical Cancer leads the Cancer death by type.
- Top Countries: Spain, the USA and Vietnam were the countries with the highest cancer mortality rates.
- Top Cancer Types: The most common cancer types were lung, Pancreatic, and Cervical cancer.
cancer_deaths_dataset.csv
: The dataset used for analysis.top_10_countries.csv
: CSV file containing the top 10 countries by cancer deaths.README_Analysis.txt
: Summary of findings from the data analysis.
- Clone this repository to your local machine.
- Install the required Python libraries using
pip
:pip install pandas matplotlib seaborn
- Open the Jupyter Notebook or Python script provided to explore the data and visualizations.
- Further investigation into the impact of healthcare infrastructure and cancer screening programs on mortality rates.
- Predictive modeling to forecast cancer deaths in coming years using machine learning techniques.
- Regional focus analysis (e.g., Sub-Saharan Africa and South Asia) for targeted healthcare interventions.
This README provides a clear and concise description of the project, including its objectives, key analyses, and instructions for running the code.