Quick notebook to parse names and genders from Behind the Name using Beautiful Soup.
Passes foreign names to Google Translate via TextBlob, then adds the translation and the language detected by langid.
Borrows processed name data for the UK and US from OpenGenderTracker's GitHub repo.
Borrows processed names for Argentina/Uruguay from a GitHub repo.
Many resources come from this blog post.
The SQL DB stores each name, the number of male and female occurrences, a flag for whether the name can be unisex, and country, region and language hints where available.

name | male | female | unisex | country | region | lang | lang_detected | name_eng |
---|---|---|---|---|---|---|---|---|
احمد | 99999 | 0 | 0 | PK | asia | ur | ar | Ahmed |

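
As a rough illustration of the layout above, here is a minimal sketch using Python's built-in `sqlite3` module. The notebook only says "SQL DB", so the choice of SQLite, the table name `names`, and the column types are assumptions rather than the actual schema.

```python
import sqlite3

# In-memory database for illustration; the real notebook's backend is not specified.
conn = sqlite3.connect(":memory:")

# Hypothetical table name and column types, mirroring the columns in the table above.
conn.execute(
    """
    CREATE TABLE names (
        name          TEXT NOT NULL,
        male          INTEGER NOT NULL DEFAULT 0,
        female        INTEGER NOT NULL DEFAULT 0,
        unisex        INTEGER NOT NULL DEFAULT 0,  -- 0/1 flag
        country       TEXT,
        region        TEXT,
        lang          TEXT,
        lang_detected TEXT,
        name_eng      TEXT
    )
    """
)

# The sample row shown in the table above.
sample = ("احمد", 99999, 0, 0, "PK", "asia", "ur", "ar", "Ahmed")
conn.execute("INSERT INTO names VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", sample)
conn.commit()

# Look the name up again using a parameterised query.
row = conn.execute(
    "SELECT name_eng, male, female FROM names WHERE name = ?", ("احمد",)
).fetchone()
print(row)
```

Parameterised queries (`?` placeholders) keep non-ASCII names intact and avoid quoting problems when inserting scraped text.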
- Add a Wilson binomial correction
- Add URL decomposition with `urlparse`
- Create fresh DB connections following this recipe to prevent timeouts
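
The first two to-do items could be sketched as follows. This assumes "Wilson binomial correction" refers to the Wilson score interval applied to the male share of a name's occurrences, and it uses Python 3's `urllib.parse.urlparse` (the successor to the Python 2 `urlparse` module). The function names `wilson_interval` and `decompose_url`, the 95% default for `z`, and the example URL are illustrative assumptions, not part of the notebook.

```python
import math
from urllib.parse import urlparse, parse_qs


def wilson_interval(male, female, z=1.96):
    """Wilson score interval for the proportion of male occurrences.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    n = male + female
    if n == 0:
        # No observations: the proportion is completely undetermined.
        return (0.0, 1.0)
    p = male / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return (max(0.0, centre - half), min(1.0, centre + half))


def decompose_url(url):
    """Split a URL into scheme, host, path segments and query parameters."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "segments": [s for s in parts.path.split("/") if s],
        "query": parse_qs(parts.query),
    }


# A name seen 3 times, always male, gets a much wider interval
# than one seen 99999 times, always male.
print(wilson_interval(3, 0))
print(wilson_interval(99999, 0))
print(decompose_url("https://www.example.com/names/usage/urdu?page=2"))
```

The lower bound of the interval is a more cautious estimate than the raw ratio for rare names, which is useful when deciding whether a name with only a handful of occurrences should be treated as unisex.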