StarSpace not recognizing labels trained by fastText
tharangni opened this issue · 3 comments
I am trying to do a multi-label document classification task, and I want to use pretrained word vectors from fastText as my initial model weights.
However, the labels are not recognized as distinct labels. For example, if I have a label __label__science in my dataset, the __label__ prefix is stripped and a vector is generated only for science, so the information that this token is a label in fastText is lost.
Therefore, when I try to load such a model into StarSpace, no labels are recognized from the pretrained vectors (num labels in model = 0), and my original classification objective no longer works. Is there any way to get around this problem?
#163 was also referenced, but I don't think it addresses this issue.
@tharangni Hi, thanks for reporting. That is not the expected behavior. Did you set the -label parameter to __label__? Also, make sure that -fileFormat is set to fastText.
@tharangni Sorry for the delay in replying; this slipped through. I just tried a toy example with a pretrained embedding file like:
hello 0.1 0.2 0.3
world -0.1 0.0 0.5
__label__1 0.9 -1.2 -0.5
The model loads 2 words and 1 label. Is your pretrained fastText embedding in the same format?
If you can share the pretrained embeddings or the data you used with me, I can look into it further.
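To make the expected format concrete, here is a minimal Python sketch that reads lines laid out like the toy example above (a token followed by its whitespace-separated vector components) and counts words versus labels. It is not StarSpace's own loader. It assumes a token is treated as a label exactly when it starts with the -label prefix (default __label__), and the function name count_words_and_labels is hypothetical. It also assumes there is no header line: fastText's .vec output begins with a line giving the vocabulary size and dimension, which would have to be skipped first.

```python
# Minimal sketch, NOT StarSpace's loader: it assumes a token is a label
# exactly when it starts with the label prefix (StarSpace's -label, default
# "__label__") and that the input has no fastText-style "count dim" header.

def count_words_and_labels(lines, label_prefix="__label__"):
    """Return (num_words, num_labels, dim) for lines of the form 'token v1 v2 ...'."""
    num_words = num_labels = 0
    dim = None
    for line in lines:
        parts = line.split()  # splits on any whitespace (spaces or tabs)
        if not parts:
            continue  # skip blank lines
        token, values = parts[0], parts[1:]
        vector = [float(v) for v in values]  # raises ValueError on a malformed number
        if dim is None:
            dim = len(vector)
        elif len(vector) != dim:
            raise ValueError(f"inconsistent dimension for {token!r}")
        # Hypothetical prefix rule: only prefixed tokens count as labels.
        if token.startswith(label_prefix):
            num_labels += 1
        else:
            num_words += 1
    return num_words, num_labels, dim


toy = """hello 0.1 0.2 0.3
world -0.1 0.0 0.5
__label__1 0.9 -1.2 -0.5"""

print(count_words_and_labels(toy.splitlines()))  # (2, 1, 3)

# With the prefix stripped ("science" instead of "__label__science"),
# the token is counted as a word and the label count drops to 0.
print(count_words_and_labels(["science 0.3 0.1 0.7"]))  # (1, 0, 3)
```

Under this assumed rule, a vector file whose label tokens have lost their prefix would report 0 labels, which is consistent with the "num labels in model = 0" symptom described in the issue.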