Text Mining to Reveal Gender Bias in Parenting books using Word2Vec

Text Mining to Reveal Gender Biases in Parenting Books

Text mining and the availability of digitized text have driven the rise of natural language processing. People are eager to teach machines to master this human art and have already started building technology on machine learning models trained on text data. However, human language itself carries many biases (e.g., gender, race, age), and without intervention, machine learning predictions will reproduce the same biases.

This project, the final for my machine learning class, explores this space by learning how to train Word2Vec, one of the most popular word-vectorization models, and by examining whether parenting books use different "gendered" language in association with "mom" and "dad". I will:

  1. Explain the Word2Vec algorithm and how it works
  2. Train a Word2Vec model on the text of six parenting books
  3. Experiment with the model's basic functionality (e.g., word similarity scores)
  4. Experiment with different parameterizations
  5. Rank words along the gender (dad-mom) axis and examine whether the text data shows gender bias.
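
To make step 5 concrete, here is a minimal sketch of how words could be ranked along a dad-mom axis. It is not the notebook's code: the vectors are hypothetical 3-dimensional toy values invented for illustration, and the helper names (`cosine`, `gender_axis`, `rank_along_axis`) are my own. In the actual project the vectors would come from the trained Word2Vec model (for example, via `model.wv[word]` if the model is trained with gensim, which is an assumption here).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def gender_axis(vectors, male="dad", female="mom"):
    """Direction pointing from the 'mom' vector toward the 'dad' vector."""
    return [a - b for a, b in zip(vectors[male], vectors[female])]


def rank_along_axis(vectors, axis, exclude=()):
    """Sort words from most 'dad'-leaning to most 'mom'-leaning."""
    scores = {
        word: cosine(vec, axis)
        for word, vec in vectors.items()
        if word not in exclude
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy vectors, NOT output from the trained model.
toy_vectors = {
    "dad": [1.0, 0.2, 0.0],
    "mom": [-1.0, 0.2, 0.0],
    "work": [0.6, 0.5, 0.1],
    "nurse": [-0.7, 0.4, 0.2],
    "sleep": [0.0, 1.0, 0.3],
}

axis = gender_axis(toy_vectors)
ranking = rank_along_axis(toy_vectors, axis, exclude=("dad", "mom"))

for word, score in ranking:
    print(f"{word:>6s} {score:+.3f}")
```

A positive score means the word sits closer to the "dad" end of the axis, a negative score closer to the "mom" end, and a score near zero suggests no gendered association in the embedding. Cosine similarity is used rather than a raw dot product so that words with large vector norms do not dominate the ranking.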

You can access the code here