/AdolescentMaternalMortality

First Full-Scale Data Science Project: Assessment of Likelihood of Adolescent Maternal Mortality in Mexican States

Primary Language: Jupyter Notebook

Springboard Capstone 1

Likelihood of Adolescent Maternal Mortality in Mexican States

Background:

In early 2019, I was traveling through various parts of Latin America around the same time that my Springboard course began. When my mentor turned out to be from Mexico, I decided to work on a project that would help give back to my home country's Latin American neighbor, and to visit the country during my studies. I came across the website Data Science for Good, where a team of data scientists worked on a project that sought to assess why Mexico's Maternal Mortality Ratio (MMR, calculated by the WHO as the number of deaths during pregnancy or within 42 days after birth, per 100,000 live births) "has stagnated (over the past 10 years) despite additional efforts from the government to further bring it down." As an American woman of child-bearing age traveling in Latin America, I found the maternal mortality factors present in Mexico a fitting topic for my first capstone project. The dataset from that project was used as the primary source of features for this study.

It is both exhilarating and troubling to learn about the successes and failures in the advancement of women's health worldwide, especially when it comes to maternal mortality. According to the World Health Organization's website, maternal mortality is usually the result of preventable complications during pregnancy and childbirth. It makes sense, then, that maternal mortality is more common in rural and poor communities that lack access to care and resources for pregnant women. Even more alarming, however, is the high prevalence of maternal mortality among adolescents (under age 20), who face a higher risk of complications and death as a result of pregnancy than other women. Worldwide, over 13 million adolescent girls give birth every year, and complications from those pregnancies and births are a leading cause of death for those young mothers.

Overview:

The purpose of this study is to assess the factors that affect the likelihood of adolescent maternal mortality within each state of Mexico. It should be noted that simply detecting that a region's mean age of maternal mortality falls below the country's norm does not solve this issue. Rather, a mean age below the norm can be used as an indicator that the region may have more instances of adolescent maternal mortality and may need more aid, especially in regions with lower average access to healthcare. Within the Data Science for Good dataset, I chose to mirror the factors listed on the WHO's webpage explaining maternal mortality and selected the columns for each woman's age at death, local community size, education level reached, and presence of medical assistance. Since these were mostly numerical values, it was possible to average them to represent the 'average maternal mortality by state'. Additional information about each state's overall GDP and population size was then merged with the averaged features from the Data Science for Good dataset.
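The averaging and merging steps described above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebooks' actual code: the column names, state labels, and every number are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical per-death records; column names and values are placeholders,
# not the real Data Science for Good schema.
deaths = pd.DataFrame({
    "state": ["State A", "State A", "State B", "State B", "State C"],
    "age_at_death": [17, 31, 19, 25, 34],
    "community_size": [1, 2, 1, 3, 5],
    "education_level": [2, 4, 1, 3, 6],
    "medical_assistance": [0, 1, 0, 1, 1],
})

# Hypothetical state-level economic table (made-up values).
economy = pd.DataFrame({
    "state": ["State A", "State B", "State C"],
    "gdp_2015": [100.0, 90.0, 400.0],
    "population_2015": [4.0, 5.2, 7.8],
})

# Average every numeric feature within each state.
state_means = deaths.groupby("state", as_index=False).mean(numeric_only=True)

# Attach GDP and population; validate= raises if a state appears twice.
merged = state_means.merge(economy, on="state", how="left", validate="one_to_one")
print(merged)
```

Using `validate="one_to_one"` is a defensive choice: it fails loudly if either table contains duplicate state rows, which would otherwise silently inflate the merged result.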

Important Factors of Interest:

  • Mean Age of Maternal Mortality by State in Mexico
  • GDP by State in Mexico
  • Population by State in Mexico
  • Length of Women's Education
  • Presence of Medical Assistance
  • Region Local Community Size

Model Construction:

Dependent (Target) Variable: A binary variable indicating whether the state's mean age of maternal mortality was above (0) or below (1) Mexico's overall mean age of maternal mortality.

Independent (Feature) Variables:

  • State
  • State GDP in 2015
  • State Population Size in 2015
  • Whether State GDP Increased (0) or Not (1) from 2010 to 2015
  • State Average Length of Education in Deceased Maternal Women
  • State Average Presence of Medical Assistance Received by Deceased Maternal Women
  • State Average Local Community Size of Deceased Maternal Women
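As a rough illustration of how the binary target and the GDP-change flag could be encoded, here is a small pandas sketch. The column names, the `national_mean_age` value, and all numbers are assumptions made for the example; the notebooks may derive them differently.

```python
import pandas as pd

# Toy state-level table; names and numbers are placeholders.
states = pd.DataFrame({
    "state": ["State A", "State B", "State C", "State D"],
    "mean_age_at_death": [24.0, 27.5, 29.0, 26.0],
    "gdp_2010": [90.0, 120.0, 300.0, 80.0],
    "gdp_2015": [100.0, 110.0, 350.0, 80.0],
})

# Assumption: the national reference value is supplied separately
# (for example, the mean over all individual records).
national_mean_age = 27.0

# Target: 1 if the state's mean age is below the national mean, else 0.
states["below_national_mean"] = (
    states["mean_age_at_death"] < national_mean_age
).astype(int)

# Feature: 0 if GDP increased from 2010 to 2015, 1 otherwise.
states["gdp_not_increased"] = (
    states["gdp_2015"] <= states["gdp_2010"]
).astype(int)

print(states[["state", "below_national_mean", "gdp_not_increased"]])
```

Note that the coding follows the convention stated above, where 1 marks the case of interest (mean age below the national mean, or no GDP increase).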

Conclusion:

A logistic regression model trained on the scaled data was far more accurate than one trained on the unscaled data. Using the scaled dataset, the model was able to predict whether a region in Mexico will have a mean age of maternal mortality above or below the country's mean age, based on the region's GDP, recent changes in GDP, population size, mean education level of the women who died, mean local community size of those women, and the mean presence of medical assistance for those women.
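The scaled-versus-unscaled comparison can be sketched with scikit-learn along the following lines. This is a minimal example on synthetic data, not the project's model: the feature names, sample size, and the way the label is generated are all assumptions chosen only to give the two features very different scales.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: one large-scale feature (GDP-like) and one
# small-scale feature (years of education). Values are invented.
rng = np.random.default_rng(0)
n = 200
gdp = rng.normal(5e5, 2e5, n)
education = rng.normal(8, 2, n)
X = np.column_stack([gdp, education])

# Hypothetical label that depends on education plus noise.
y = (education + rng.normal(0, 1, n) < 8).astype(int)

# Same classifier, with and without standard scaling in front of it.
unscaled = LogisticRegression(max_iter=1000)
scaled = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Putting the scaler inside the pipeline means it is re-fit on each
# training fold, so no information leaks from the held-out fold.
unscaled_auc = cross_val_score(unscaled, X, y, cv=5, scoring="roc_auc").mean()
scaled_auc = cross_val_score(scaled, X, y, cv=5, scoring="roc_auc").mean()

print(f"unscaled ROC AUC: {unscaled_auc:.3f}")
print(f"scaled ROC AUC:   {scaled_auc:.3f}")
```

How large the gap is depends on the data and the solver, so the printed scores here say nothing about the results reported for the real dataset.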

Additional machine learning models that incorporate the level of sex education, average distance from the nearest hospital, and number of child-bride instances within each region of Mexico could provide more detailed information on the likelihood of adolescent maternal mortality. Assessing these factors could relate needed resources (such as increased access to healthcare) and socioeconomic factors (such as child brides) to outcomes, giving measurable quantities for tracking a further reduction in the rate of young mother mortality.

The scripts provided here calculate the risk probability of adolescent maternal mortality by state in Mexico based on some of the top features contributing to maternal mortality. While this is a multi-dimensional issue, for the sake of this study the following factors were used to predict the likelihood of adolescent maternal mortality by region: region population, region GDP, local poverty level, level of education, and access to medical assistance. The goal is to help direct government funds to the areas where they would be most beneficial.

Use

  • translation_english.txt Translates the Spanish column names and data values into their English counterparts.
  • 1_data_wrangling.ipynb Cleans the source data and assesses mean maternal mortality age and mean adolescent maternal mortality age by state in Mexico.
  • 2_explanatory_data_analysis.ipynb Assesses the distribution of the general target variable.
  • 3_adomaternal_mortality.ipynb Assesses the distribution of adolescent maternal mortality only.
  • 4_inferential_statistics.ipynb Shows statistically that the mean age of maternal mortality in the dataset is comparable to the actual mean age of maternal mortality in Mexico. Also shows via ANOVA that at least one state in Mexico has a mean age of maternal mortality statistically different from the others.
  • 5_merging_dataframes.ipynb Merges the averaged state-level information on maternal mortality with state economic factors (GDP and population size).
  • 6_machine_learning.ipynb Builds the machine learning pipeline, using logistic regression and standard scaling, to assess the likelihood of adolescent maternal mortality within a state of Mexico.
Dependencies

The notebooks assume that the following Python packages and modules are installed and importable:
    • General
      • pandas
      • numpy
      • seaborn
    • Data Visualization
      • pyplot
      • pylab
      • empirical_distribution
    • Statistics
      • statistics
      • scipy
    • Machine Learning modules
      • train_test_split
      • DecisionTreeClassifier
      • RandomForestClassifier
      • LogisticRegression
    • Hyperparameters
      • GridSearchCV
      • roc_auc_score
      • cross_val_score
    • ROC Curve
    • Scale Data
      • StandardScaler
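To show how the statistics dependencies above come into play, the one-way ANOVA step described for 4_inferential_statistics.ipynb might look roughly like the sketch below. It uses `scipy.stats.f_oneway` on synthetic ages; the state labels, group sizes, distributions, and the 0.05 threshold are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

# Synthetic ages at death for three hypothetical states (invented values).
rng = np.random.default_rng(42)
ages_by_state = {
    "State A": rng.normal(27, 6, 50),
    "State B": rng.normal(28, 6, 50),
    "State C": rng.normal(22, 6, 50),
}

# One-way ANOVA: the null hypothesis is that all state means are equal.
f_stat, p_value = stats.f_oneway(*ages_by_state.values())

# Assumed significance level of 0.05.
alpha = 0.05
reject_equal_means = p_value < alpha

print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
print("At least one state mean differs" if reject_equal_means
      else "No evidence that state means differ")
```

A significant result only says that at least one state's mean differs; identifying which one would need a follow-up comparison.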