name ethnicity classification

⬇️ installation:

repository installation:

git clone https://github.com/hollowcodes/name-ethnicity-classification.git
cd name-ethnicity-classification/

python predict_ethnicity.py -i .\examples\names.csv -o .\examples\predicted_ethnicities.csv -m 21_nationalities_and_else -d gpu -b 64

flag	description	option
`-i, --input`	path to .csv containing (first and last) names; must contain one column called "names" (file-name freely selectable)	optional, alternative: -n
`-o, --output`	path to .csv in which the names along with the predictions will be stored (file will be created if it doesn't exist; file-name freely selectable)	required after -i
`-m, --model`	name of model configuration which can be chosen from "model_configurations/" or from the table below	optional, default: 21_nationalities_and_else
`-d, --device`	device on which the model will run, must be either "gpu" or "cpu"	optional, default: gpu if CUDA detected
`-b, --batchsize`	number of names processed in parallel (if it crashes, choose a batch size smaller than the number of names in your .csv file)	optional, default: number of names in the input file
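When the prediction is started from another Python script, the flags in the table above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_command` and `run_prediction` are hypothetical helper names, the default of `"cpu"` is an assumption chosen so the sketch also works without CUDA, and `sys.executable` stands in for the `python` call shown above.

```python
import subprocess
import sys


def build_command(input_csv, output_csv, model="21_nationalities_and_else",
                  device="cpu", batch_size=None):
    """Assemble the argument list for predict_ethnicity.py from the documented flags."""
    if device not in ("gpu", "cpu"):
        raise ValueError('device must be either "gpu" or "cpu"')
    cmd = [
        sys.executable, "predict_ethnicity.py",
        "-i", str(input_csv),
        "-o", str(output_csv),
        "-m", model,
        "-d", device,
    ]
    # -b is optional; leaving it out keeps the script's own default
    if batch_size is not None:
        cmd += ["-b", str(batch_size)]
    return cmd


def run_prediction(**kwargs):
    """Run the command; expects the cloned repository as working directory."""
    return subprocess.run(build_command(**kwargs), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_prediction(...) to execute it.
    print(" ".join(build_command("examples/names.csv",
                                 "examples/predicted_ethnicities.csv",
                                 batch_size=64)))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.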

"names.csv" has to have one column named "names" (upper-/ lower case doesn't matter):

names,
John Doe,
Max Mustermann,
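If the names come from another source, the input file can be generated with Python's standard `csv` module. This is a minimal sketch under one assumption: that a plain single-column file with the header "names" (without the trailing commas shown above) is accepted by the script. `write_names_csv` is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path


def write_names_csv(names, path):
    """Write names to a single-column CSV whose header is "names"."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["names"])  # column name required by the script
        for name in names:
            writer.writerow([name.strip()])
    return path
```

For example, `write_names_csv(["John Doe", "Max Mustermann"], "examples/names.csv")` creates a file that can then be passed to `-i`.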

After running the command, the output file ("predicted_ethnicities.csv" in the example above) will look like this:

names,ethnicities
John Doe,american
Max Mustermann,german
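The output file can be post-processed like any other CSV. As an illustration, the sketch below counts how often each ethnicity was predicted. It assumes the two-column layout shown above ("names" and "ethnicities"); `count_predictions` is a hypothetical helper, not part of the repository.

```python
import csv
from collections import Counter


def count_predictions(path):
    """Count predicted ethnicities in the CSV written via -o."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return Counter(row["ethnicities"] for row in reader)
```

Calling `count_predictions("examples/predicted_ethnicities.csv").most_common(5)` would then list the five most frequent predictions.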

python3 predict_ethnicity.py -n "Gonzalo Rodriguez"

>> name: Gonzalo Rodriguez - predicted ethnicity: spanish

flag	description	option
`-n, --name`	first and last name (upper-/ lower case doesn't matter)	optional, alternative: -i
`-m, --model`	name of model configuration which can be chosen from "model_configurations/" or from the table below	optional, default: 21_nationalities_and_else
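When the single-name mode is called from another program, the printed line has to be parsed. The sketch below assumes the output always follows the format of the example line above (`>> name: ... - predicted ethnicity: ...`); `parse_prediction` is a hypothetical helper, not part of the repository.

```python
import re

# Pattern modelled on the example output line shown above (an assumption).
_PREDICTION_LINE = re.compile(
    r"^>> name: (?P<name>.+) - predicted ethnicity: (?P<ethnicity>.+)$"
)


def parse_prediction(line):
    """Split one printed prediction line into (name, ethnicity)."""
    match = _PREDICTION_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected output format: {line!r}")
    return match.group("name"), match.group("ethnicity")
```

If many names need to be classified, the .csv mode with `-i` and `-o` is likely the better fit, since it avoids parsing console output altogether.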

name	nationalities/groups	accuracy
`28_nationalities_english_once`	british, norwegian, indian, hungarian, spanish, german, zimbabwean, portugese, polish, bulgarian, bangladeshi, turkish, belgian, pakistani, italian, romanian, lithuanian, french, chinese, swedish, nigerian, greek, south african, japanese, dutch, danish, russian, filipino	78.54%
`21_nationalities_and_else`	british, else, indian, hungarian, spanish, german, zimbabwean, polish, bulgarian, turkish, pakistani, italian, romanian, french, chinese, swedish, nigerian, greek, japanese, dutch, ukrainian, danish, russian	81.08%
`8_groups`	african, celtic, eastAsian, european, hispanic, muslim, nordic, southAsian	83.55%
`chinese_and_else`	chinese, else	98.55%
`20_most_occuring_nationalities`	british, norwegian, indian, irish, spanish, american, german, polish, bulgarian, turkish, pakistani, italian, romanian, french, australian, chinese, swedish, nigerian, dutch, filipin	75.36%