A lot of names
Mte90 opened this issue · 4 comments
In the sentences I see a lot of people's names and surnames, each on its own line, for example:
Luís Rocha
Beatriz Ortiz
Laura López
Jon Elmore
Fatal Fury
Brian Joy
Pedro Sass
So I don't think cv-tools is failing here; rather, the scraper may be taking them from the page title or from a link to a bio.
These are very difficult for Italians, and people don't understand why they are there.
Without knowing where exactly these are coming from, it's hard to come to a conclusion here.
They are extracted by the scraper. It seems that cvtools cannot detect them, probably because, for example, some of the surnames are also verbs.
I think an option in the scraper to detect names would help, for example:
- if the sentence is two words
- and one of them is a name like John, ignore it
That would help in getting only sentences and not names. I don't know why there are lines with only a name; maybe it is the page title, which is extracted by the tool itself.
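The rule proposed above could be sketched roughly as follows. This is only an illustration in Python, not the scraper's actual code or configuration: the function names and the tiny `KNOWN_FIRST_NAMES` set are hypothetical, and a real filter would need a proper per-language list of given names.

```python
# Hypothetical sketch of the proposed rule; not part of the scraper or cvtools.
# KNOWN_FIRST_NAMES is a tiny illustrative set, not a real name list.
KNOWN_FIRST_NAMES = {"john", "luís", "beatriz", "laura", "jon", "brian", "pedro"}


def looks_like_person_name(sentence, first_names=KNOWN_FIRST_NAMES):
    """Return True if the line is exactly two capitalized words
    and at least one of them is a known first name."""
    words = sentence.split()
    if len(words) != 2:
        return False
    if not all(word[:1].isupper() for word in words):
        return False
    return any(word.casefold() in first_names for word in words)


def drop_name_lines(sentences, first_names=KNOWN_FIRST_NAMES):
    """Keep only the lines that do not look like a bare person name."""
    return [s for s in sentences if not looks_like_person_name(s, first_names)]


if __name__ == "__main__":
    sample = ["Luís Rocha", "Fatal Fury", "Il gatto dorme sul divano."]
    print(drop_name_lines(sample))
```

With this sample, "Luís Rocha" is dropped, but "Fatal Fury" is kept because neither word is in the name list. The check is only as good as the list behind it.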
> if the sentence is two words
That is already configurable.
> one of them is a name like John ignore it
That will be very hard to detect.
I think this could be the same as #57, so I'm closing this as a duplicate for now.
Yes, two words is configurable, but that alone is not enough of a check for our needs.