Brazilian Name Generator

Create cool and awkward names with Language Models!

About The Project
- Built With
Getting Started
- Prerequisites
- Local Installation
Usage

Generate New Names
Reproduce Training
Docker

License
Contact
Acknowledgements

About The Project

Predicted the name: RUNNATAIDENILOS
Prefix: RU
Context Size: 2
Seed: 1

Language Models are tasked with assigning a probability to a word or even a sentence. They correct the misspelled words you type on your cell phone, as well as help your personal assistant to understand you.

In this fun project, I used them to make a probabilistic model of the characters of Brazilian names using data from the 2010 census. Then, I used these models to generate new names.

It works by guessing the next letter based on the previous ones. For instance, what is the most probable name given that it starts with Pau...? For English it will probably be Paul, while for Portuguese it will be Paulo. However, if we use a small enough context size (i.e., the number of previous letters used to infer the next one), awkward and cool names start to appear =)
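
The idea above can be sketched as a character-level n-gram model. This is a minimal illustration under stated assumptions, not the code in this repo: the function names `train` and `generate`, the end-of-name marker `$`, and the toy name list are all hypothetical, and the real models are trained on the 2010 census data.

```python
import random
from collections import Counter, defaultdict

END = "$"  # hypothetical end-of-name marker, assumed not to occur in names


def train(names, context_size):
    """Count which character follows each context of up to `context_size` letters."""
    counts = defaultdict(Counter)
    for name in names:
        padded = name.upper() + END
        for i, char in enumerate(padded):
            # At the start of a name the context is shorter than context_size.
            context = padded[max(0, i - context_size):i]
            counts[context][char] += 1
    return counts


def generate(counts, context_size, prefix="", seed=None, max_length=30):
    """Sample one letter at a time until the end marker is drawn."""
    rng = random.Random(seed)
    name = prefix.upper()
    while len(name) < max_length:
        context = name[-context_size:] if context_size > 0 else ""
        next_counts = counts.get(context)
        if not next_counts:
            break  # context never seen in training, so stop here
        chars, weights = zip(*next_counts.items())
        char = rng.choices(chars, weights=weights)[0]
        if char == END:
            break
        name += char
    return name


# Toy data for illustration only, not the census dataset.
names = ["PAULO", "PAULA", "PAULINO", "RENATA", "RENAN"]
counts = train(names, context_size=2)
print(generate(counts, context_size=2, prefix="pau", seed=0))
```

With a small context size the sampler can jump between fragments of different names, which is where the unusual names come from. With a large one it mostly reproduces names it has seen.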

Built With

Cookiecutter Data Science Project Structure
Python Data Science Tools (Pandas, Numpy, etc)

Getting Started

You can use this project with Docker or install it locally on your machine.

Prerequisites

To run with Docker:
- Docker

To install locally:
- Linux/WSL
- Conda

Local Installation

Clone the repo

git clone https://github.com/renan-cunha/NameGeneratorBR
cd NameGeneratorBR/

Create environment

make create_environment
conda activate NameGeneratorBR

Install requirements
```
make requirements
```

Usage

The repo has five trained models, with context sizes from 0 (i.e., the next letter is predicted only from how often it appears in the dataset) to 4 (i.e., the previous four letters are used to infer the next one).
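
To make the difference between context sizes concrete, here is a small sketch that estimates next-letter probabilities as relative frequencies. The helper `next_letter_probs` and the three toy names are assumptions for illustration; the trained models in the repo may store their probabilities differently.

```python
from collections import Counter


def next_letter_probs(names, context, context_size):
    """Relative frequency of each letter that follows `context` (hypothetical helper)."""
    counts = Counter()
    for name in names:
        for i, char in enumerate(name):
            # The context is the (up to) context_size letters before position i.
            if name[max(0, i - context_size):i] == context:
                counts[char] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # this context never occurs in the data
    return {char: n / total for char, n in counts.items()}


names = ["PAULO", "PAULA", "ANA"]  # toy data, not the census dataset

# Context size 0: every letter is counted, whatever came before it.
print(next_letter_probs(names, "", 0))

# Context size 1: only letters that follow "P" are counted.
print(next_letter_probs(names, "P", 1))
```

In this toy data, "A" is 5 of the 13 letters overall, but it is the only letter that ever follows "P", so conditioning on a single previous letter already makes the prediction much sharper.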

Generate New Names

If you just want to generate a new name, run src/models/predict_model.py with the following options:

Usage: predict_model.py [OPTIONS]

Options:
  -cs, --context_size INTEGER  How much context to use for the language model,
                               The pre-trained models go from 0 to 4
  -p, --prefix TEXT            The beginning of the name to be predicted (OPTIONAL)
  -s, --seed INTEGER           Seed to reproduce experiments (OPTIONAL)
  --help                       Show this message and exit.

Example:

(NameGeneratorBR) renan@DESKTOP-AD25DOI:~/git/NameGeneratorBR$ python src/models/predict_model.py -cs 4 -p pau -s 0
Predicted the name: PAULO
Prefix: PAU
Context Size: 4
Seed: 0
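
To compare how the same prefix behaves across all five pre-trained models, the command can be wrapped in a short loop. This is only a sketch: it assumes it is run from the repository root with the NameGeneratorBR environment active, and `build_command` is a hypothetical helper that uses only the flags listed above.

```python
import os
import subprocess
import sys

SCRIPT = "src/models/predict_model.py"  # path taken from the usage text above


def build_command(context_size, prefix=None, seed=None):
    """Build the argument list for one predict_model.py run (hypothetical helper)."""
    cmd = [sys.executable, SCRIPT, "-cs", str(context_size)]
    if prefix is not None:
        cmd += ["-p", prefix]
    if seed is not None:
        cmd += ["-s", str(seed)]
    return cmd


if os.path.exists(SCRIPT):
    # The pre-trained models cover context sizes 0 to 4.
    for context_size in range(5):
        subprocess.run(build_command(context_size, prefix="pau", seed=0), check=True)
else:
    print("predict_model.py not found; run this from the repository root.")
```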

Reproduce Training

To reproduce the training, run the command below:

make train_model

Docker

Pull the image

docker pull renancunha97/name-generator-br

And make new names

renan@DESKTOP-AD25DOI:~$ docker run renancunha97/name-generator-br -cs 4 -p pau -s 0
Predicted the name: PAULO
Prefix: PAU
Context Size: 4
Seed: 0

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Renan Cunha - renancunhafonseca@gmail.com

Acknowledgements

If you are curious about Language Models and Natural Language Processing in general, I highly recommend Jurafsky's drafts of Speech and Language Processing 3rd edition and his classes.

renan-cunha/NameGeneratorBR