The-Mr.-or-Ms.-Dilemma-Can-You-Guess-Them-All

Keywords

Vietnamese Name Analysis · Vietnamese Name Prediction · Vietnamese Name Generation

Overview

In human society, names are inextricably linked to individual identity and often reflect a community’s gender-based norms and values. This paper sets out to uncover the patterns linking Vietnamese names to gender, using machine learning models that predict gender from a name.
This paper introduces a dataset of gender-annotated names, vectorized with TF–IDF, as well as five binary classification models (Logistic Regression, Bernoulli Naive Bayes, Random Forest, Support Vector Machine, and Neural Network).
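To make the vectorization step concrete, the following is a minimal from-scratch sketch of TF–IDF over name tokens. It is not the project's code: the function name `tfidf_vectorize`, the toy names, the whitespace tokenization, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectorize(names):
    """Return (vocab, rows): one TF-IDF vector per name, one column per token.

    Hypothetical helper: tokens are lowercase, whitespace-separated name parts.
    """
    docs = [name.lower().split() for name in names]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many names does each token occur?
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty names
        rows.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, rows


# Toy names, made up for illustration only.
names = ["Nguyen Van An", "Nguyen Thi Lan", "Tran Van Binh"]
vocab, rows = tfidf_vectorize(names)
```

Tokens that appear in fewer names (here `an`) receive a higher weight than tokens shared by many names (here `nguyen`), which is the property that lets a downstream classifier focus on the more distinctive name parts.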
This paper also investigates how two preprocessing steps, duplicate deletion and dimensionality reduction with Truncated SVD, affect the models’ performance. An assessment of the importance of each name component (last, middle, first) for gender classification follows. These findings shed light on the relationship between Vietnamese names and gender, highlighting the potential of machine learning approaches to decipher gender-based naming conventions.
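Two of the preprocessing ideas mentioned here, duplicate deletion and splitting a name into its components, can be sketched in a few lines. This is an illustrative sketch only: `drop_duplicates`, `split_components`, the `(name, label)` record layout, and the toy records are assumptions, and the split assumes the family name is written first and the given name last. Truncated SVD is not covered by this sketch.

```python
def drop_duplicates(records):
    """Keep the first occurrence of each (name, label) pair, preserving order.

    Names are compared case-insensitively with whitespace collapsed.
    """
    seen = set()
    unique = []
    for name, label in records:
        key = (" ".join(name.lower().split()), label)
        if key not in seen:
            seen.add(key)
            unique.append((name, label))
    return unique


def split_components(full_name):
    """Split a full name into (last, middle, first) in written order.

    Assumption: first token is the family (last) name, final token is the
    given (first) name, and everything in between is the middle name.
    """
    tokens = full_name.split()
    if not tokens:
        return ("", "", "")
    if len(tokens) == 1:
        return ("", "", tokens[0])
    return (tokens[0], " ".join(tokens[1:-1]), tokens[-1])


# Made-up records for illustration; labels are hypothetical.
records = [("Nguyen Van An", "M"), ("nguyen  van an", "M"), ("Nguyen Thi Lan", "F")]
deduplicated = drop_duplicates(records)
components = [split_components(name) for name, _ in deduplicated]
```

Once names are split this way, the importance of a component can be probed by training on one component at a time and comparing the resulting scores.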
Beyond gender prediction, we would like to extend our work with a model that can detect defective records (names with missing spacing) and return a properly spaced name. In addition, naming a baby can be challenging for parents who want a rarer name, all the more so as the population grows rapidly. We would therefore like the model to be able to generate names that are rare but not defective.
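The text does not say how spacing would be restored, so the following is only one possible baseline: a dictionary-based segmentation using dynamic programming. The function `restore_spacing` and the small syllable set are hypothetical, and a learned model could replace this lookup entirely.

```python
def restore_spacing(text, syllables):
    """Split unspaced text into known syllables.

    Returns the spaced, capitalized name, or None if no segmentation exists.
    Hypothetical baseline: `syllables` is a set of lowercase name syllables.
    """
    if not syllables:
        return None
    lowered = text.lower()
    n = len(lowered)
    max_len = max(len(s) for s in syllables)
    # best[i] holds one segmentation of lowered[:i], or None if none exists.
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = lowered[j:i]
            if best[j] is not None and piece in syllables:
                best[i] = best[j] + [piece]
                break  # keep the first segmentation found
    if best[n] is None:
        return None
    return " ".join(piece.capitalize() for piece in best[n])


# Toy syllable inventory, made up for illustration.
syllables = {"nguyen", "van", "an", "thi", "lan"}
fixed = restore_spacing("NguyenVanAn", syllables)
```

A record for which the function returns `None` contains at least one stretch that matches no known syllable, which is one simple way to flag it as defective. Segmentations can be ambiguous, and this sketch simply returns the first one it finds.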
Finally, we implemented the project twice: once from scratch and once using existing libraries.

Collaborators

Name               Student ID   Email
Nguyen Nam Hai     20214894     hai.nn214894@sis.hust.edu.vn
Doan The Vinh      20210940     vinh.dt210940@sis.hust.edu.vn
Pham Quang Tung    20210919     tung.pq210919@sis.hust.edu.vn
Nguyen Ba Thiem    20214931     thiem.nb214931@sis.hust.edu.vn