/World_Health_Organization_Data--Women_Of_The_World

Women's maternal and reproductive health around the world (WHO data cleaning, data transformation, data normalization, data engineering, data modeling, ERD diagrams, exploratory analysis, data visualization)

Primary language: Jupyter Notebook

Project - Women’s maternal and reproductive health

Data Collection topics:

The data comes from four WHO indicator datasets, listed below.

  1. Married-or-in-union-women-of-reproductive-age-who-have-their-need-for-family-planning-satisfied-with-modern-methods
  2. Adolescent-birth-rate-(per-1000-women-aged-15-19-years)
  3. Antenatal-care-coverage-at-least-four-visits
  4. Births-attended-by-skilled-health-personnel

Background

Women's health encompasses a diverse array of physical, mental, and social well-being concerns unique to females. This includes maternal and reproductive health, which focuses on aspects such as pregnancy, childbirth, and reproductive choices. Maternal health emphasizes prenatal care, safe delivery practices, and postpartum support, aiming to reduce maternal and infant mortality rates and promote healthy pregnancies. Reproductive health further encompasses family planning and access to reproductive healthcare services, crucial for empowering women to make informed decisions about their reproductive futures. Addressing these multifaceted issues requires comprehensive healthcare strategies that recognize the intersection of biological factors, societal norms, and healthcare access, thereby promoting the overall well-being of women throughout their lives.

Main question: How does access to and satisfaction with modern family planning methods among married or in-union women of reproductive age impact reproductive health outcomes and maternal well-being?

Rates vary across regions because of factors such as cultural barriers and a lack of technical facilities. These disparities show the varying levels of support and the challenges that different regions face, as reflected in the collected datasets.

Specific focus areas:

  • Mental health (psychological well-being, such as depression, stress, and anxiety)
  • Skilled birth attendance (births supported by doctors, nurses, and midwives)
  • Postpartum support (health care for mothers within the first months after birth)

Sub-questions to guide the analysis:

  1. Access to Family Planning: How does access to modern family planning methods vary across different regions and socioeconomic groups?

  2. Reproductive Health Outcomes: What is the correlation between access to modern family planning methods and adolescent birth rates?

  3. Maternal Well-being: Are there correlations between family planning access and maternal healthcare-seeking behavior, such as antenatal care attendance and skilled birth attendance?

Data Engineering

Step 01 - Data Collection

  • Sources: WHO datasets (WHO Indicators).
  • Size and Shape: Large datasets spanning various indicators of maternal and reproductive health.
  • The project begins with data collection from the World Health Organization Relational Data Hub.
  • The collected datasets relate to women's maternal and reproductive health.
  • Ensuring the reliability and relevance of the data was paramount, as it forms the foundation for the depth and accuracy of the analysis.

Data Collection Challenges:

  • Data normalization across different countries.
  • Varying data collection periods requiring assumptions for standardization.

Step 02: Data Cleaning

  • This stage focused on data cleaning and data quality.
  • I addressed issues such as missing values, nulls, duplicates, outliers, and incorrect data types.
  • Standardization was done with Python, so that the prepared data aligns with the analysis goals.
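
The cleaning steps listed above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the column names (`location`, `period`, `value`) and the sample rows are hypothetical, and outlier handling is left out.

```python
import pandas as pd

# Hypothetical raw extract; column names and values are illustrative,
# not the actual WHO schema or data.
raw = pd.DataFrame(
    {
        "location": ["Kenya", "Kenya", "Peru", "Nepal", None],
        "period": ["2014", "2014", "2012", "2016", "2018"],
        "value": ["61.8", "61.8", "n/a", "53.5", "47.0"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Convert text columns to numeric types; unparseable entries become NaN.
    out["value"] = pd.to_numeric(out["value"], errors="coerce")
    out["period"] = pd.to_numeric(out["period"], errors="coerce").astype("Int64")
    # Drop rows missing a key field, then remove exact duplicates.
    out = out.dropna(subset=["location", "period", "value"])
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Coercing before dropping means a placeholder such as `n/a` is treated the same way as a true null, so one `dropna` call covers both cases.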

Step 03: Data Transformation

  • In this step I shaped the data to fit the needs of the analysis.
  • This includes normalization to ensure consistency and clarity in data representation, setting the stage for effective modeling.

1NF

2NF

  • Family-planning Table
  • Adolescent-birth-rate Table
  • Antenatal-care-coverage Table
  • Births-attended-by-skilled-health-personnel Table

3NF / Fact Table

  • Normalization with location Table
  • Normalizing Period Ranges
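
To illustrate the idea of normalizing against a location table, here is a small sketch in pandas. The table and column names (`location_id`, `family_planning_pct`, and so on) and the values are assumptions made for the example, not the project's actual schema or data.

```python
import pandas as pd

# Hypothetical flat table; names and values are illustrative only.
flat = pd.DataFrame(
    {
        "country": ["Kenya", "Kenya", "Peru"],
        "continent": ["Africa", "Africa", "South America"],
        "year": [2014, 2017, 2014],
        "family_planning_pct": [70.0, 75.0, 65.0],
    }
)

# Location dimension: one row per country, with a surrogate key.
location = (
    flat[["country", "continent"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
location.insert(0, "location_id", location.index + 1)

# Fact table: replace the repeated location attributes with the key.
fact = flat.merge(location, on=["country", "continent"])[
    ["location_id", "year", "family_planning_pct"]
]

print(location)
print(fact)
```

After the split, the continent of a country is stored once in `location` instead of being repeated on every measurement row.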

Step 04: Data modeling

Crafted entity-relationship diagrams (ERDs) and established connections between the datasets in PostgreSQL by assigning primary and foreign keys within each table.
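
The primary and foreign key relationships can be sketched as follows. The project uses PostgreSQL; this example swaps in Python's built-in `sqlite3` module only so that it runs without a database server. The table and column names are hypothetical and the inserted values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs

# Hypothetical schema: a location table referenced by one indicator table.
conn.executescript(
    """
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        country     TEXT NOT NULL UNIQUE,
        continent   TEXT NOT NULL
    );
    CREATE TABLE family_planning (
        location_id INTEGER NOT NULL REFERENCES location (location_id),
        year        INTEGER NOT NULL,
        value_pct   REAL,
        PRIMARY KEY (location_id, year)
    );
    """
)

conn.execute("INSERT INTO location VALUES (1, 'Kenya', 'Africa')")
conn.execute("INSERT INTO family_planning VALUES (1, 2014, 70.0)")

# A row pointing at a missing location is rejected by the foreign key.
try:
    conn.execute("INSERT INTO family_planning VALUES (99, 2014, 50.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

rows = conn.execute(
    "SELECT l.country, f.year, f.value_pct "
    "FROM family_planning AS f JOIN location AS l USING (location_id)"
).fetchall()
print(fk_enforced, rows)
```

The composite primary key `(location_id, year)` assumes one value per country and year; a different grain would need a different key.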

Step 05: Exploratory Data Analysis & Visualization

Performed exploratory data analysis with Python libraries to explore patterns in the cleaned datasets. This phase surfaces insights and prepares the data for meaningful visualizations.

Sub-questions to guide the analysis:

1. Access to Family Planning: How does access to modern family planning methods vary across different regions and socioeconomic groups?

Geographical Analysis - Family Planning Data set

Continent Level Analysis

Country Level Analysis

Interactive geographical heat map with tooltips (screenshot; the interactive HTML file is saved)

Geographical Analysis - Family Planning Data set (Visualization)

Top Three Countries within each Continent - Family Planning Data set
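
A ranking like this can be produced with a sort followed by a grouped `head`. The snippet below is a sketch with made-up values and placeholder country codes; the column names are assumptions, not the notebook's.

```python
import pandas as pd

# Made-up values for illustration; not taken from the WHO data.
df = pd.DataFrame(
    {
        "continent": ["Africa"] * 4 + ["Asia"] * 4,
        "country": ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"],
        "value_pct": [40.0, 75.0, 60.0, 55.0, 80.0, 30.0, 65.0, 70.0],
    }
)

# Sort within each continent by value (highest first), then keep three rows.
top3 = (
    df.sort_values(["continent", "value_pct"], ascending=[True, False])
    .groupby("continent")
    .head(3)
    .reset_index(drop=True)
)
print(top3)
```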

Top Three Countries within each Continent - Visualization

Time-Period Analysis

2. Reproductive Health Outcomes: What is the correlation between access to modern family planning methods and adolescent birth rates?
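
One way to quantify this relationship is with the `pearsonr` and `spearmanr` functions from `scipy.stats`, both of which appear in the dependency list. The numbers below are invented for illustration and say nothing about the real WHO data.

```python
from scipy.stats import pearsonr, spearmanr

# Made-up country-level values for illustration only.
family_planning_pct = [30.0, 45.0, 55.0, 70.0, 85.0]
adolescent_birth_rate = [120.0, 95.0, 80.0, 50.0, 20.0]

# Pearson measures linear association; Spearman measures monotonic association.
r, r_pvalue = pearsonr(family_planning_pct, adolescent_birth_rate)
rho, rho_pvalue = spearmanr(family_planning_pct, adolescent_birth_rate)

print(f"Pearson r = {r:.3f} (p = {r_pvalue:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {rho_pvalue:.3f})")
```

Reporting both coefficients is useful when the relationship may be monotonic but not linear. Neither coefficient implies causation.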

3. Maternal Well-being: Are there correlations between family planning access and maternal healthcare-seeking behavior, such as antenatal care attendance and skilled birth attendance?
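
When several indicators are compared at once, a correlation matrix is a compact starting point. The sketch below uses `DataFrame.corr` from pandas on invented values; the column names are hypothetical.

```python
import pandas as pd

# Made-up country-level values for illustration only.
indicators = pd.DataFrame(
    {
        "family_planning_pct": [30.0, 45.0, 55.0, 70.0, 85.0],
        "antenatal_care_pct": [35.0, 50.0, 52.0, 78.0, 90.0],
        "skilled_birth_pct": [40.0, 58.0, 60.0, 85.0, 97.0],
    }
)

# Pairwise Pearson correlations between all indicator columns.
corr = indicators.corr(method="pearson")

# Correlation of each maternal-care indicator with family planning access.
fp_corr = corr["family_planning_pct"].drop("family_planning_pct")
print(fp_corr)
```

The resulting matrix can be passed to a heatmap function such as `seaborn.heatmap` for a visual summary.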

Conclusion

Ultimately, the data journey concludes with interpreting the results and weaving them into meaningful conclusions. Through this approach, I ensure that my analysis not only addresses the initial problems but also adds unexpected value to business requirements through my technical expertise.

Dependencies

  • csv
  • os
  • matplotlib (pyplot)
  • pandas
  • numpy
  • seaborn
  • geopandas
  • folium
  • time
  • selenium (webdriver)
  • IPython.display (Image)
  • plotly.express
  • scipy.stats (pearsonr, spearmanr)
  • statsmodels.api

Time Allocation:

  • Problem Framing: 20%
  • Data Exploration: 10%
  • Data Cleaning: 40%
  • Data Modeling: 20%
  • Data Visualization: 10%

Data Flow:

  • Data sourced from WHO -> Processed in Jupyter Notebook -> Stored and retrieved from a SQL database.
  • Schema Diagram: Detailed in the Engineering_ERD folder.

Tools Used:

  • Storage: SQL database for organized data storage and retrieval.
  • Processing: Jupyter Notebook (main_file.ipynb) for data manipulation and analysis.
  • Visualization: Matplotlib, Seaborn, GeoPandas, and Selenium webdriver for plotting graphs and charts.

Additional Tools:

  • NumPy: For numerical operations.
  • Pathlib: For file path manipulations.
  • CSV and OS Libraries: For handling data files.

Analytical Use Cases

  • Access Disparities: Analyzing regional and socioeconomic variations in access to family planning.
  • Adolescent Birth Rates: Correlation between family planning access and adolescent birth rates.
  • Maternal Healthcare Behavior: Link between family planning access and antenatal care or skilled birth attendance.

Demonstration

  • Jupyter Notebook: Demonstrates data retrieval and visualization.
  • Visuals: Includes interactive geographical maps (.html), bar charts, line graphs, and heatmaps to depict key findings. Visuals are included in the project report and presentation.

Assumptions:

  • When a study period spans two years (e.g. 2022-2023), the results are assumed to correspond to 12 months and to reflect the latest year (2023).
  • The datasets were broken down into 3-year intervals from 2003 to 2023 to allow consistent analysis of the data over time.
  • The study covers married and in-union women of reproductive age, which is assumed to be 15-49 years.
  • Data collection methods are assumed to be the same across countries.
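
The first two assumptions can be expressed as two small helper functions. This is a sketch: the function names are hypothetical, and the exact interval boundaries (2003-2005, 2006-2008, ..., 2021-2023) are an assumed reading of "3-year intervals starting in 2003".

```python
import re

BASE_YEAR = 2003  # first year of the first interval (assumed boundary)
INTERVAL = 3      # years per interval


def reference_year(period: str) -> int:
    """Return the latest year in a period such as '2022-2023' or '2019'."""
    years = [int(y) for y in re.findall(r"\d{4}", period)]
    if not years:
        raise ValueError(f"no year found in {period!r}")
    return max(years)


def interval_label(year: int) -> str:
    """Map a year to its 3-year interval, e.g. 2004 -> '2003-2005'."""
    if year < BASE_YEAR:
        raise ValueError(f"{year} is before {BASE_YEAR}")
    start = BASE_YEAR + ((year - BASE_YEAR) // INTERVAL) * INTERVAL
    return f"{start}-{start + INTERVAL - 1}"


for period in ["2022-2023", "2019", "2003-2004"]:
    year = reference_year(period)
    print(period, "->", year, "->", interval_label(year))
```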

Limitations:

  • There are more indicators that could have been analyzed to contribute to the overall hypothesis. We focused on 4 key indicators due to time constraints.
  • Period data was not standardized across datasets. Some assumptions were needed to standardize the periods and make the datasets fully comparable.

Ethical Considerations:

  • Ensuring the confidentiality and ethical use of data.
  • Addressing biases inherent in data collection methods.

Future Work Scope:

  • Extended Analysis: Incorporate more indicators for a comprehensive view.
  • Data Integration: Enhance the database with additional sources and real-time data.
  • Interactive Dashboards: Develop more interactive visualization tools for dynamic data exploration.
  • Please refer to the Word file for a summary of the findings.

Folder Structure:

  • Output: Contains all exported datasets, analysis results, and visual files.
  • Engineering_ERD: ERD for schema and SQL database export.
  • Project_Analysis: Findings and summary documents.

How to Run:

  • Environment Setup: Ensure you have Python and Jupyter Notebook installed.
  • Dependencies: Install the required libraries via pip: numpy, pandas, matplotlib, seaborn.
  • Run Notebook: Open main.ipynb in Jupyter Notebook and run the cells sequentially.
