datamade/probablepeople

Failure on middle names

mattmcgrattan opened this issue · 1 comment

probablepeople fails when it hits a name that includes a middle name:

probablepeople.RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  Edwin Austin Abbey
PARSED TOKENS:    [('Edwin', 'GivenName'), ('Austin', 'Surname'), ('Abbey', 'GivenName')]
UNCERTAIN LABEL:  GivenName

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

az0 commented

The parser is confused because it labels Austin as the last name and Abbey as a first name, so GivenName shows up in two separate parts of the string.
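
To make the failure concrete, here is a minimal sketch of the kind of check that produces this error, run on the PARSED TOKENS from the traceback above. The function name first_repeated_label is hypothetical and this is not the library's actual code. It only assumes, based on the error text, that a label is rejected when it appears in two non-adjacent areas of the string.

```python
def first_repeated_label(parsed_tokens):
    """Return the first label that shows up in two separate runs, or None.

    Hypothetical helper for illustration only; it is not part of probablepeople.
    """
    seen = set()       # labels whose run has already started
    previous = None    # label of the preceding token
    for _token, label in parsed_tokens:
        if label != previous:
            # A new run starts here; the label must not have been used before.
            if label in seen:
                return label
            seen.add(label)
        previous = label
    return None


# Token/label pairs copied from the error message in this issue.
parsed = [("Edwin", "GivenName"), ("Austin", "Surname"), ("Abbey", "GivenName")]
print(first_repeated_label(parsed))  # GivenName
```

With Austin labeled as a middle name and Abbey as the surname, no label would repeat and the same check would return None.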

According to Wikipedia, Abbey is a surname:
https://en.wikipedia.org/wiki/Abbey_(surname)

I assume the solution is to add more training examples in which Austin is a first or middle name and Abbey is a last name.
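
As a rough sketch of what one such training example might look like, the snippet below builds a labeled name as XML. It assumes the labeled training data wraps each token in an element named after its label inside a Name element, and that MiddleName is the right label for Austin here. Both are assumptions to check against the repository's labeled data before adding anything. The helper labeled_name_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def labeled_name_xml(tokens):
    """Serialize (token, label) pairs as one labeled <Name> element.

    Hypothetical helper; the element layout is an assumption about the
    training data format, not something confirmed in this issue.
    """
    name = ET.Element("Name")
    for i, (text, label) in enumerate(tokens):
        child = ET.SubElement(name, label)
        child.text = text
        if i < len(tokens) - 1:
            child.tail = " "  # keep the space that separates tokens
    return ET.tostring(name, encoding="unicode")


example = labeled_name_xml(
    [("Edwin", "GivenName"), ("Austin", "MiddleName"), ("Abbey", "Surname")]
)
print(example)
# <Name><GivenName>Edwin</GivenName> <MiddleName>Austin</MiddleName> <Surname>Abbey</Surname></Name>
```

A handful of examples like this, with different given names in front of Austin and Abbey, would presumably be needed before retraining changes the result.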