facebookresearch/StarSpace

StarSpace not recognizing labels trained by fastText

tharangni opened this issue · 3 comments

I am trying to do a multi-label document classification task, and I want to use pretrained word vectors from fastText as my initial model weights.

However, the labels are not recognized as distinct labels. For example, if I have a label __label__science in my dataset, the __label__ prefix is stripped and a vector is generated only for science, so the pretrained vectors lose the information that this token was a label in fastText.

Therefore, when I load such a model into StarSpace, no labels are recognized from the pretrained vectors (num labels in model = 0), and my original classification objective can no longer be trained. Is there any way around this problem?
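One possible workaround, sketched here under assumptions: if you know your set of label names, you could re-attach the prefix to those tokens in the exported vectors before handing them to StarSpace. The helper name add_label_prefix is hypothetical. The sketch assumes a whitespace-separated file with one token and its values per line, optionally preceded by a fastText-style "<count> <dim>" header line, which it drops so the output matches the header-less toy format shown later in this thread.

```python
import io
from typing import Iterable, TextIO


def add_label_prefix(src: TextIO, dst: TextIO, labels: Iterable[str],
                     prefix: str = "__label__") -> int:
    """Copy an embedding file, prefixing the tokens listed in `labels`.

    Hypothetical helper. Assumes lines of the form 'token v1 v2 ...'.
    A '<count> <dim>' header on the first line is skipped.
    Returns the number of tokens that were renamed.
    """
    wanted = set(labels)
    renamed = 0
    for lineno, line in enumerate(src):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        # Skip a fastText-style "<count> <dim>" header on the first line.
        if lineno == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        token, values = parts[0], parts[1:]
        if token in wanted and not token.startswith(prefix):
            token = prefix + token
            renamed += 1
        # Space-separated, like the toy example in this thread.
        dst.write(" ".join([token] + values) + "\n")
    return renamed


if __name__ == "__main__":
    demo_src = io.StringIO("3 3\nscience 0.1 0.2 0.3\nhello 0.4 0.5 0.6\n")
    demo_dst = io.StringIO()
    print(add_label_prefix(demo_src, demo_dst, ["science"]))
    print(demo_dst.getvalue(), end="")
```

If a label name (for example science) also occurs as an ordinary word in the vocabulary, this renames the single entry rather than keeping both; writing the row twice, once with and once without the prefix, would be an alternative.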

I also looked at #163, but I don't think it addresses this issue.

ledw commented

@tharangni Hi, thanks for reporting. That is not the expected behavior. Did you set the -label parameter to __label__? Also, make sure that '-fileFormat' is set to 'fastText'.
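For reference, here is a sketch of a training invocation with both of those settings spelled out. It only assembles the argument list; the binary path ./starspace and the file names are placeholders, and using -initModel for the pretrained vectors with -dim matching their width is an assumption about this setup rather than something confirmed in this thread.

```python
import shlex
from typing import List


def starspace_train_cmd(train_file: str, model: str, init_model: str, dim: int,
                        label: str = "__label__",
                        file_format: str = "fastText",
                        binary: str = "./starspace") -> List[str]:
    """Assemble a StarSpace training command line (sketch; paths are placeholders)."""
    return [
        binary, "train",
        "-trainFile", train_file,
        "-model", model,
        # Pretrained vectors to start from (assumed plain-text 'token v1 v2 ...').
        "-initModel", init_model,
        # Assumed to need to match the width of the pretrained vectors.
        "-dim", str(dim),
        # Prefix that marks a token as a label.
        "-label", label,
        "-fileFormat", file_format,
    ]


if __name__ == "__main__":
    cmd = starspace_train_cmd("train.txt", "model", "pretrained.tsv", dim=100)
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps each flag and value as a separate argument, so the result can be passed directly to subprocess.run(cmd, check=True) without shell quoting issues.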

tharangni commented

@ledw I did that as mentioned, but the problem is still not resolved.

ledw commented

@tharangni Sorry for the delay in replying; this slipped through. I just tried a toy example with something like
hello 0.1 0.2 0.3
world -0.1 0.0 0.5
__label__1 0.9 -1.2 -0.5
and the model was able to load 2 words and 1 label. Is your fastText pretrained embedding in the same format?
If you can share the pretrained embeddings or the data you used, I can look into this further.
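A quick way to check whether a pretrained file has the same shape as the toy example above is to count how many entries would be read as words versus labels. This sketch assumes, in line with the -label setting discussed above, that an entry counts as a label exactly when its token starts with the label prefix. The function name is hypothetical, and this is not StarSpace's own loader.

```python
from typing import Iterable, Tuple


def count_words_and_labels(lines: Iterable[str],
                           prefix: str = "__label__") -> Tuple[int, int]:
    """Return (words, labels) for lines of the form 'token v1 v2 ...'.

    Assumption: a token is a label iff it starts with `prefix`.
    """
    words = labels = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if parts[0].startswith(prefix):
            labels += 1
        else:
            words += 1
    return words, labels


if __name__ == "__main__":
    toy = [
        "hello 0.1 0.2 0.3",
        "world -0.1 0.0 0.5",
        "__label__1 0.9 -1.2 -0.5",
    ]
    print(count_words_and_labels(toy))  # (2, 1)
```

On a real file, pass the open file object instead of the list. A label count of zero reproduces the num labels in model = 0 symptom and means no token kept the prefix. Note that a fastText-style header line, if present, would be counted as one word here.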