A lot of names

Question


Mte90 opened this issue 5 years ago · 4 comments

Among the sentences I see a lot of lines that contain only a person's first name and surname, for example:

Luís Rocha
Beatriz Ortiz
Laura López
Jon Elmore
Fatal Fury
Brian Joy
Pedro Sass

So I don't think cv-tools is failing here. Rather, the scraper may be picking them up, perhaps from the page title or from a link to a biography?

These are very difficult for Italian speakers to read, and people also don't understand why they are there.

Answer 1 · 2020-01-03T17:40:59.000Z

Without knowing where exactly these are coming from, it's hard to come to a conclusion here.

Answer 2 · 2020-01-04T19:43:41.000Z

They are extracted by the scraper. It seems that cvtools cannot detect them, probably because some of the surnames are also verbs, for example.
I think an option in the scraper that detects names with rules like these could help:

if the sentence is two words
one of them is a name like John ignore it

That would help keep only real sentences and not names. I don't know why there are lines containing only a name; maybe it is the page title, which the tool itself extracts.
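The rule proposed above can be sketched as a simple line filter. This is a minimal Python sketch under stated assumptions, not code from cv-tools or the scraper: the functions `looks_like_bare_name` and `filter_sentences` and the `KNOWN_FIRST_NAMES` set are hypothetical, and a real filter would need a much larger, language-specific list of first names.

```python
import re

# Hypothetical sample of first names (an assumption for illustration only);
# a real filter would need a large, language-specific list.
KNOWN_FIRST_NAMES = {"luís", "beatriz", "laura", "jon", "brian", "pedro", "john"}

# A "word" is a run of letters (accented letters included), optionally joined
# by apostrophes or hyphens; digits and punctuation are not part of words.
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def looks_like_bare_name(line, first_names=KNOWN_FIRST_NAMES):
    """Return True if the line is exactly two words and one of them is a known first name."""
    words = WORD_RE.findall(line)
    if len(words) != 2:
        return False
    return any(word.lower() in first_names for word in words)


def filter_sentences(lines, first_names=KNOWN_FIRST_NAMES):
    """Keep only lines that do not look like a bare 'Firstname Surname' pair."""
    return [line for line in lines if not looks_like_bare_name(line, first_names)]


if __name__ == "__main__":
    candidates = [
        "Luís Rocha",
        "Laura López",
        "Fatal Fury",
        "Pedro read the whole book in one night.",
    ]
    for line in filter_sentences(candidates):
        print(line)
```

With this sample list, "Luís Rocha" and "Laura López" are dropped, but "Fatal Fury" is kept because neither word is a known first name. That illustrates the weakness of a name-list approach: it only catches names that happen to be in the list.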

Answer 3 · 2020-01-05T19:18:25.000Z

if the sentence is two words

That is already configurable.

one of them is a name like John ignore it

That will be very hard to detect.

I think this could be the same as #57, so I'm marking this as a duplicate for now.

Answer 4 · 2020-01-06T13:36:49.000Z

Yes, the two-word limit is configurable, but on its own it is not enough of a check for our needs.