coli-saar/am-parser

error when encountering unknown NER label

A toy model crashes when it encounters an unknown NER label.

To reproduce, run the following on commit 1282115 of the unsupervised2020 branch:

python3 -u train.py jsonnets/toyAMRAutomata.jsonnet -s example/toyAMRAutomataOutput/ -f --file-friendly-logging

According to allenai/allennlp#2147, crashing on an unseen label is the intended behaviour as long as the vocabulary contains no OOV token (i.e. a token that says "I'm the OOV token"). My guess is that such an OOV token usually gets added automatically, but not in this toy example.
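
To make the failure mode concrete, here is a minimal sketch in plain Python. ToyVocabulary is a hypothetical stand-in written for this comment, not AllenNLP's actual Vocabulary class; it only mimics the lookup behaviour described above (return the label's index, otherwise fall back to the OOV entry, otherwise fail). The label set is made up for illustration.

```python
class ToyVocabulary:
    """Toy stand-in for a label vocabulary (NOT AllenNLP's real class)."""

    def __init__(self, labels, oov_token=None):
        self.oov_token = oov_token
        # If an OOV token is configured, it takes the first index.
        entries = ([oov_token] if oov_token is not None else []) + sorted(set(labels))
        self.token_to_index = {tok: i for i, tok in enumerate(entries)}

    def get_token_index(self, token):
        if token in self.token_to_index:
            return self.token_to_index[token]
        # Fall back to the OOV entry; raises KeyError when there is none.
        return self.token_to_index[self.oov_token]


# Hypothetical NER labels seen during training.
seen_labels = ["O", "PER", "LOC"]

strict = ToyVocabulary(seen_labels)                              # no OOV token
lenient = ToyVocabulary(seen_labels, oov_token="@@UNKNOWN@@")    # with OOV token

try:
    strict.get_token_index("ORG")   # "ORG" was never seen
    strict_crashed = False
except KeyError:
    strict_crashed = True

print("strict vocabulary crashed:", strict_crashed)
print("lenient index for unseen label:", lenient.get_token_index("ORG"))
```

Without an OOV entry the unseen label has nothing to fall back on, which matches the crash reported here; with one, every unseen label maps to the same index.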

Whether an OOV token is added is controlled by the Vocabulary class: https://docs.allennlp.org/v0.9.0/api/allennlp.data.vocabulary.html#allennlp.data.vocabulary.Vocabulary. You can adjust this in the config file; for example, there is already a "vocabulary" entry in jsonnets/emnlp20/glove/AMR-2015.jsonnet. Of course, the OOV token embedding will be untrained.
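
As a rough sketch of what such a config change might look like: the Vocabulary class takes a non_padded_namespaces parameter, and namespaces matching it get neither a padding nor an OOV token. The matching rule and the default patterns below are reimplemented from my reading of the v0.9.0 docs and should be treated as assumptions; "ner_labels" is a hypothetical namespace name, so check which namespace the dataset reader actually uses. The printed JSON is only an illustration of a possible "vocabulary" entry, not the contents of any existing jsonnet file.

```python
import json


def namespace_match(pattern, namespace):
    """Assumed matching rule: '*suffix' matches any namespace ending in 'suffix'."""
    if pattern.startswith("*") and namespace.endswith(pattern[1:]):
        return True
    return pattern == namespace


def gets_oov_token(namespace, non_padded_namespaces):
    """A namespace receives padding/OOV entries unless it matches a non-padded pattern."""
    return not any(namespace_match(p, namespace) for p in non_padded_namespaces)


# Assumed v0.9.0 default: namespaces ending in "tags" or "labels" are non-padded.
DEFAULT_NON_PADDED = ("*tags", "*labels")

# Hypothetical namespace name for the NER labels.
NER_NAMESPACE = "ner_labels"

# Possible "vocabulary" entry: an empty list means every namespace,
# including the label namespaces, gets padding and OOV entries.
vocabulary_entry = {"non_padded_namespaces": []}

print("default:", gets_oov_token(NER_NAMESPACE, DEFAULT_NON_PADDED))
print("adjusted:", gets_oov_token(NER_NAMESPACE, vocabulary_entry["non_padded_namespaces"]))
print(json.dumps({"vocabulary": vocabulary_entry}, indent=2))
```

Note that making all namespaces padded also shifts the indices of every other label namespace, so a narrower pattern list may be preferable if only the NER labels need an OOV entry.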