Investigate a Dataset
This project was completed as part of the course requirements of Udacity's Data Analyst Nanodegree certification.
The default repo branch contains an updated version of the project developed after the assessment submission. The master branch of the repo contains the original project and code.
Overview
The project used data from Gapminder to examine patterns in female labor force participation and its relationship to a country's economic strength. A working paper from the International Labour Organization informed the work.
A summary of the report is available on my blog.
Key indicators were:
- Income (GDP) per person (fixed 2000 USD)
- Female employees age 15+ (% of population)
- Female agricultural workers (% of all female labor)
- Female industry workers (% of all female labor)
- Female service workers (% of all female labor)
- Mean years in school (women 25 and older)
The project involved identifying appropriate datasets for the research questions, justifying the data selection, joining multiple datasets, assessing and cleaning the data, performing exploratory data analysis (EDA), and drawing conclusions from the results.
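As a rough illustration of the joining step, the sketch below reshapes two wide tables (one row per country, one column per year, a layout commonly used for Gapminder indicator downloads) into long form and merges them on country and year. The table contents, column names, and the `to_long` helper are hypothetical and are not taken from the project notebook.

```python
import pandas as pd

# Hypothetical wide-format indicator tables: one row per country,
# one column per year. Values are made up for illustration only.
income_wide = pd.DataFrame({
    "country": ["A", "B", "C"],
    "2000": [500.0, 12000.0, 30000.0],
    "2005": [650.0, 13500.0, 32000.0],
})
employment_wide = pd.DataFrame({
    "country": ["A", "B", "D"],
    "2000": [70.0, 40.0, 55.0],
    "2005": [68.0, 42.0, 56.0],
})


def to_long(wide, value_name):
    """Reshape a wide indicator table to one row per (country, year)."""
    long = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    long["year"] = long["year"].astype(int)
    return long


# An inner join keeps only country-year pairs present in both indicators.
merged = to_long(income_wide, "income_per_person").merge(
    to_long(employment_wide, "female_employment_pct"),
    on=["country", "year"],
    how="inner",
)
print(merged)
```

An inner join silently drops countries missing from either table (here, C and D), so it is worth comparing row counts before and after the merge during data assessment.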
Statistical Analysis
- Examinations of central tendencies and spread
- Data visualization
- One-way ANOVA hypothesis testing
- Correlation and linear regression hypothesis testing
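The tests listed above can be sketched with SciPy as follows. This is a minimal example on synthetic data, not the project's actual analysis: the group names, sample sizes, and generated values are assumptions chosen only to show the function calls.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic female employment rates (%) for three hypothetical income groups.
low_income = rng.normal(60, 8, 40)
middle_income = rng.normal(45, 8, 40)
high_income = rng.normal(55, 8, 40)

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_anova = stats.f_oneway(low_income, middle_income, high_income)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4g}")

# Synthetic schooling (years) and log income for 100 hypothetical countries.
schooling = rng.uniform(1, 13, 100)
log_income = 6 + 0.3 * schooling + rng.normal(0, 0.5, 100)

# Pearson correlation and simple linear regression; for both, the null
# hypothesis is no linear relationship (slope of zero).
r, p_corr = stats.pearsonr(schooling, log_income)
fit = stats.linregress(schooling, log_income)
print(f"Pearson r = {r:.3f}, p = {p_corr:.4g}")
print(f"Regression slope = {fit.slope:.3f}, p = {fit.pvalue:.4g}")
```

A significant ANOVA result only indicates that at least one group mean differs; identifying which groups differ requires a follow-up comparison.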
Technologies Used
- Python, NumPy, pandas, Matplotlib, SciPy
- Jupyter Notebook
Key Findings
- The data supports a theory of a U-shaped relationship between a country's economic strength and female labor force participation
- Higher economic strength is associated with lower rates of female participation in the agricultural sector and greater participation in the service sector
- Increased female education is associated with increased economic strength
- The relationships between employment sector and economic strength are similar to those seen between employment sector and education
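One simple way to check for a U-shaped pattern like the one in the first finding is to fit a quadratic in log income and look at the sign of the squared term. The sketch below uses synthetic data and is not necessarily the method used in the project; the variable names and generated values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: participation is lowest near log income of 8 and rises
# on either side, plus noise. Values are made up for illustration only.
log_income = rng.uniform(5, 11, 200)
participation = 50 + 3 * (log_income - 8) ** 2 + rng.normal(0, 2, 200)

# Fit participation = a * x**2 + b * x + c; polyfit returns the
# highest-degree coefficient first.
a, b, c = np.polyfit(log_income, participation, deg=2)

# A positive squared-term coefficient means the fitted curve opens upward
# (U shape); its minimum lies at x = -b / (2a).
turning_point = -b / (2 * a)
print(f"a = {a:.2f}, turning point at log income = {turning_point:.2f}")
```

A positive coefficient alone is weak evidence: the turning point should also fall inside the observed income range, otherwise the fitted curve is monotonic over the data.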