urchade/GLiNER

Question regarding case of labels

I've found that the model behaves differently depending on the case of the labels.

For example, in one case a label with a capital letter extracted more entities than the same label written without the capital letter.

Is this expected? Is there a recommended way to format the labels, or should the case be chosen based on accuracy tests tailored to the specific dataset in use?

Hi @bitliner, this comes from the training data: some models work better with lowercase labels and others with capitalized ones.

"Is this as expected?" Yes, deep learning models do not generalize well outside their training domain.
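Since the best casing seems to depend on the model, one way to pick it is to run the same text once per casing and compare the results on your own data. Below is a minimal, model-agnostic sketch. The names `case_variants` and `compare_label_casing` are hypothetical helpers made up for this example, not part of GLiNER. The `predict` callable is assumed to take `(text, labels)` and return a list of dicts that each have a `"text"` key.

```python
from typing import Callable, Dict, List

# Assumed predictor shape: (text, labels) -> list of dicts with a "text" key.
Predictor = Callable[[str, List[str]], List[dict]]


def case_variants(label: str) -> List[str]:
    """Return the distinct casings of a label, in a stable order."""
    candidates = [label.lower(), label.capitalize(), label.title(), label.upper()]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


def compare_label_casing(predict: Predictor, text: str, label: str) -> Dict[str, List[str]]:
    """Run the predictor once per casing and collect the extracted spans."""
    results: Dict[str, List[str]] = {}
    for variant in case_variants(label):
        entities = predict(text, [variant])
        results[variant] = [entity["text"] for entity in entities]
    return results


# Hypothetical usage with GLiNER (assumes the `gliner` package is installed
# and that the checkpoint name below is the one you want to test):
#
#   from gliner import GLiNER
#   model = GLiNER.from_pretrained("urchade/gliner_base")
#   predict = lambda text, labels: model.predict_entities(text, labels, threshold=0.5)
#   print(compare_label_casing(predict, "Ada Lovelace was born in London.", "person"))
```

Running this over a small labelled sample from your own dataset, rather than a single sentence, should give a more reliable picture of which casing a given checkpoint prefers.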

Do you have a list of the entity labels that this model supports?
What if I just want to extract all possible entities from the text, irrespective of labels, just like spaCy does?