A socio-economic analysis of countries with an emphasis on Pakistan.
Countries are ranked by their economic and environmental well-being using the Gapminder dataset, which contains data on many indicators, such as life expectancy and GDP, from 1952 to 2012.
'TESPH', which stands for Technology, Economy and Environment, Social, Population, and Health, is a comprehensive rating, built from a large body of data, that shows a country's overall wellbeing over a given period of time.
Measuring a country's prosperity from just one standpoint would be biased, so this ranking was determined in two ways:
Red TESPH: Measure of a country's economic strength, reflected in its economic growth, diversity, and stability.
- For Red TESPH, each indicator's correlation with Total GDP US was computed, because GDP is the most comprehensive measure of a country's economic performance and growth.
Green TESPH: Measure of a country's standard of living in terms of sustainable development and health.
- For Green TESPH, the indicator Life Expectancy was chosen as the reference because it reflects the overall health status of a country across all ages.
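As a rough illustration of correlating indicators against a reference indicator, here is a minimal sketch using pandas. The helper name `correlation_with_target`, the column names, and the toy values are all assumptions made up for this example; they are not taken from the project's notebook or from the Gapminder data.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of every other numeric column with `target`.

    Hypothetical helper for illustration only.
    """
    numeric = df.select_dtypes("number")
    # corrwith uses the Pearson method by default.
    corr = numeric.drop(columns=target).corrwith(numeric[target])
    return corr.sort_values(ascending=False)


# Toy data invented for this example (not real Gapminder values).
toy = pd.DataFrame(
    {
        "total_gdp_us": [1.0, 2.0, 3.0, 4.0],
        "exports": [2.0, 4.0, 6.0, 8.0],          # rises with GDP
        "child_mortality": [8.0, 6.0, 4.0, 2.0],  # falls as GDP rises
    }
)

red_corr = correlation_with_target(toy, "total_gdp_us")
print(red_corr)
```

The same helper could be pointed at a life expectancy column for the Green variant; only the `target` argument would change.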
Data of 16 Countries from 7 Regions over a time period of 16 Years was distilled into 6 Pillars made up of 25 Indicators.
The Pandas and NumPy libraries were used for data analysis, and the Matplotlib and Plotly Express libraries were used for data visualization.
- Data Cleaning
- Data Normalization
- Indicator Selection
- Pearson Correlation Coefficient to find the pairwise correlation of all indicators
- Average Calculation
- Weight Distribution
- Calculation of weighted averages
- Calculation of Aggregate Averages of all pillars
- Determining Indicators' Impact based on Correlation
- Data Anomalies
- Calculating Final Rankings
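A few of the steps listed above (normalization, weighting, weighted averages, and ranking) can be sketched roughly as follows. This is only one plausible reading: min-max scaling is assumed as the normalization method, and the weights, indicator names, and country labels are hypothetical placeholders rather than the values used in the project.

```python
import pandas as pd


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column to the [0, 1] range (assumed normalization method).

    A constant column would produce NaN here, since max - min is zero.
    """
    return (df - df.min()) / (df.max() - df.min())


def weighted_score(normalized: pd.DataFrame, weights: dict) -> pd.Series:
    """Weighted average of the normalized indicators for each row."""
    w = pd.Series(weights, dtype=float)
    w = w / w.sum()  # make the weights sum to 1
    return normalized[w.index].mul(w, axis=1).sum(axis=1)


# Toy values invented for this example (not real data).
toy = pd.DataFrame(
    {"indicator_a": [10.0, 20.0, 30.0], "indicator_b": [3.0, 1.0, 2.0]},
    index=["Country X", "Country Y", "Country Z"],
)

# Hypothetical weights; in the project they could come from the correlations.
weights = {"indicator_a": 0.75, "indicator_b": 0.25}

normalized = min_max_normalize(toy)
scores = weighted_score(normalized, weights)

# Rank 1 is the highest score.
ranking = scores.rank(ascending=False, method="min").astype(int)
print(ranking.sort_values())
```

Indicators where a lower value is better (for example, a mortality rate) would need to be inverted before scoring, for instance by using `1 - normalized[column]`.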
For further information and key findings, please read the Project Report (PDF file) in the code section.
This was my first-semester term project. I'm still learning data analysis and data science, so any suggestions for improving this project will be appreciated.