google/cld3

Problems found when manually checking Spanish language detection

Hi cld3 team!
Thank you so much for this development, it is so useful!
I have used your language detection package via R (see code) and then did some manual tagging for Spanish (see the "human" column in this csv). I found a few things that might be interesting, but I am unsure how to make them useful for you.
For instance, related to this issue: in a list of conference titles, those written in "Spanglish" were tagged as English by cld2 and as Spanish by cld3.
Also, while cld3 got much better at distinguishing Galician from Spanish, there is still one case where it got the tag wrong: "SAMEBibl: Sistema Automático de Migración a Europeana para Bibliotecas" (should be Spanish).
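In case it helps, here is a rough sketch of how the disagreements between the detector tags and the manual tags could be pulled out of a csv like the one above. It is only an illustration under assumptions: the "human" column is the one mentioned above, but the detector column name (`cld3` here), the use of ISO codes such as `es` and `gl`, and the helper names are all hypothetical, and it is written in Python rather than the R I actually used.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read the tagged csv into a list of dicts (one per title)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def find_disagreements(rows, detector_col, human_col="human"):
    """Return the rows where the detector tag differs from the manual tag.

    detector_col is an assumed column name (e.g. "cld3");
    human_col defaults to the "human" column of the csv.
    """
    return [
        row for row in rows
        if row[detector_col].strip().lower() != row[human_col].strip().lower()
    ]


def confusion_counts(rows, detector_col, human_col="human"):
    """Count (manual tag, detector tag) pairs, e.g. ("es", "gl")."""
    return Counter(
        (row[human_col].strip().lower(), row[detector_col].strip().lower())
        for row in rows
    )
```

Calling `find_disagreements(load_rows(path), "cld3")` would list the titles to look at by hand, and `confusion_counts` would show which language pairs get mixed up most often (for example Spanish tagged as Galician).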
Hope this is somewhat useful :)