[Question] Prediction of 'O' tags
Closed this issue · 1 comments
I've been working through your paper and code, and I have a question about the prediction of 'O' tags. Is there anything in the formulation that pushes the model towards predicting 'O' if all of the labeling functions abstain? From what I can see, there isn't. When I created a toy example with a few labeling functions that each vote "I-per", "I-loc", etc. or "ABS" (but never "O"), the probability of 'O' tends to zero, since it never appears in the training data.
But I'm unsure how, in practice, you'd create labeling functions that could correctly label 'O', and as far as I can see you're not doing this in your examples, other than with the spaCy labeling function. One thought is to add a final function, run after all the others, that labels 'O' if every other function abstained (something like the sketch below).
Is there something I'm missing here?
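To make that concrete, here's a toy sketch of the fallback I have in mind (the function name and vote format are just made up for this example, not taken from your code):

```python
# Toy sketch of the fallback idea: after every other labeling function has
# voted, a final function turns any all-abstain token into an explicit "O".
# "ABS" is the abstain marker, as in my toy example above.

def o_fallback(votes_per_token):
    """votes_per_token: one inner list of votes (from the other functions) per token."""
    fallback = []
    for votes in votes_per_token:
        if all(v == "ABS" for v in votes):
            fallback.append("O")    # nobody voted, so default to "O"
        else:
            fallback.append("ABS")  # someone voted, so stay out of the way
    return fallback

# Example: three tokens, two labeling functions each
print(o_fallback([["ABS", "ABS"], ["I-per", "ABS"], ["ABS", "I-loc"]]))
# -> ['O', 'ABS', 'ABS']
```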
Hi David,
Nope, I think you've got it. 🙂
You're right, "O" tags need to be supervised just like any other tag, and getting that right can be tricky because of the class imbalance. In the tutorial, we're able to just use spaCy because we're interested in a subset of named entities for which we already have a tagger. In other words, it's enough to tag most things as "O" and let the model learn when to override that default because something looks like a work or an award.
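To illustrate the idea (this isn't the tutorial's actual rule; the model name, entity types, and tag names below are just placeholders), a spaCy-based vote looks roughly like this:

```python
# Sketch: use spaCy's NER to vote an I-<TYPE> tag on the entity types we care
# about and an explicit "O" everywhere else, so the "O" class gets supervised.
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed
KEEP = {"WORK_OF_ART", "EVENT"}      # illustrative subset of interest

def spacy_votes(text):
    """Return one vote per token: an I-<TYPE> tag or "O" (never abstains)."""
    doc = nlp(text)
    votes = []
    for token in doc:
        if token.ent_iob_ != "O" and token.ent_type_ in KEEP:
            votes.append("I-" + token.ent_type_)
        else:
            # spaCy saw nothing we care about here, so vote "O" explicitly
            # instead of abstaining.
            votes.append("O")
    return votes

print(spacy_votes("She won an award for her first novel."))
```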
You can see more domain-specific examples in the repo for the AAAI 2020 paper. For example, in the laptop review problem, the actual computers often were getting mistaken for features of laptops, so we wrote a specific rule to exclude them: https://github.com/BatsResearch/safranchik-aaai20-code/blob/master/LaptopReview/train_generative_models.py#L210
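Paraphrasing rather than copying from that file, the general shape of such an exclusion rule is something like this (the keyword list here is made up):

```python
# Sketch of a domain-specific "exclude" rule: vote "O" on tokens that name the
# laptop itself, so they stop being tagged as features; abstain everywhere else.

LAPTOP_NAMES = {"macbook", "thinkpad", "laptop", "notebook", "computer"}

def exclude_laptop_names(tokens):
    return ["O" if t.lower() in LAPTOP_NAMES else "ABS" for t in tokens]

print(exclude_laptop_names(["The", "MacBook", "battery", "is", "great"]))
# -> ['ABS', 'O', 'ABS', 'ABS', 'ABS']
```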
But in general, we don't have a different approach from other tags. It's something that needs to be explicitly supervised.
Hope this is helpful! Let us know if we can answer any follow-up questions, or feel free to close the issue whenever you'd like.