rhiever/name-age-calculator

How did you clean up last names in the SSA dataset?

andrewgodman opened this issue · 2 comments

I've found that there are last names with low occurrence counts, like Goodman.

In the raw data, I found a few dating back to the 1930s.

P.S. If you know of a good dataset of last names, I'd be keen to hear about it, as I've been working on name detection.

Hi @andrewgodman,

This dataset contains only first names. I'm not aware of a dataset for last names.

@rhiever Thanks. Oddly enough, I have found some last names in the SSA dataset. For example, the file yob1919.txt contains the record Goodman,M,5, but I cannot find this name in your dataset, which is a good thing :) I've also been having issues with the gender-by-name data, which is built from the SSA data as well: https://data.world/howarder/gender-by-name
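A minimal sketch for locating every record of a given name across the raw SSA files, assuming the national yobYYYY.txt layout seen in the Goodman,M,5 example (one name,sex,count record per line, no header). The function name find_name is hypothetical, and this is not how the name-age-calculator data was cleaned; it only helps spot low-count entries like Goodman before deciding whether to drop them.

```python
import csv
import re
from pathlib import Path


def find_name(data_dir, target):
    """Return (year, sex, count) tuples for every record of `target`.

    Assumes SSA-style yobYYYY.txt files, each line being name,sex,count.
    """
    hits = []
    for path in sorted(Path(data_dir).glob("yob*.txt")):
        match = re.fullmatch(r"yob(\d{4})\.txt", path.name)
        if match is None:
            continue  # skip files that don't follow the yobYYYY.txt naming
        year = int(match.group(1))
        with path.open(newline="") as f:
            for row in csv.reader(f):
                if len(row) != 3:
                    continue  # ignore blank or malformed lines
                name, sex, count = row
                if name == target:
                    hits.append((year, sex, int(count)))
    return hits
```

For example, find_name("names", "Goodman") would list each year's Goodman record, assuming the unzipped SSA files live in a hypothetical names/ directory; a very small count (such as 5) is a hint that the entry may be a surname or a data-entry error.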