lanwuwei/SPM_toolkit

How can I use the sentence pair model to get the similarity between two sentences?

BruceLee66 opened this issue · 6 comments

I have 1,000,000 sentence pairs, and the two sentences in each pair have the same meaning. I trained the sentence pair model on this data and saved the trained model as a pkl file. But when I use the trained model to evaluate new sentence pairs, almost all of them get a score of 1.0.
What should I do? Can you give me some advice?

Are all your training examples positive? No negatives?

Yes, all the sentence pairs are similar. When I use the trained model to score other pairs whose sentences differ from each other, the score is still very close to 1. I'm really confused.

You need negative samples for training; otherwise the model will be biased towards the positive case.

I have decided to select negative examples randomly, with 5 times as many negative examples as positive ones. Would that be OK?

1:1 should be enough. More importantly, you need to make sure the negative examples are meaningful: a pair that shares many n-grams but is not a paraphrase.
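
A rough sketch of how such negatives could be mined from a positive-only set, in case it helps. It is not part of SPM_toolkit: the names `ngram_set`, `jaccard`, and `build_dataset` are hypothetical, and whitespace tokenization and the `pool_size` parameter are assumptions. For each positive pair it samples a small pool of sentences from other pairs and keeps the one with the highest unigram/bigram overlap as the negative, which gives a 1:1 ratio. It also assumes that sentences taken from different pairs are not paraphrases of each other, which may not hold for every dataset, so the mined negatives are worth spot-checking.

```python
import random


def ngram_set(sentence, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) of a sentence."""
    tokens = sentence.lower().split()  # assumption: whitespace tokenization
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def jaccard(a, b):
    """Jaccard overlap between two n-gram sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_dataset(positive_pairs, pool_size=50, seed=0):
    """Return (sentence_a, sentence_b, label) triples with 1:1 pos/neg.

    For each positive pair (a, b), a random pool of right-hand sentences
    from other pairs is sampled, and the one sharing the most n-grams
    with `a` is used as the negative partner.
    """
    rng = random.Random(seed)
    rights = [b for _, b in positive_pairs]
    right_grams = [ngram_set(b) for b in rights]
    n = len(rights)

    dataset = []
    for i, (a, b) in enumerate(positive_pairs):
        dataset.append((a, b, 1))
        # Sample a small pool so the cost stays linear in the dataset size.
        pool = rng.sample(range(n), min(pool_size, n))
        candidates = [j for j in pool if j != i and rights[j] != b]
        if not candidates:
            continue  # no usable negative found for this pair
        a_grams = ngram_set(a)
        best = max(candidates, key=lambda j: jaccard(a_grams, right_grams[j]))
        dataset.append((a, rights[best], 0))
    return dataset


if __name__ == "__main__":
    pairs = [
        ("how do i reset my password", "how can i reset my password"),
        ("how do i reset my router", "how can i reset my router"),
        ("what is the weather today", "tell me today's weather"),
    ]
    for row in build_dataset(pairs):
        print(row)
```

A larger `pool_size` tends to give harder negatives at a higher cost per pair; purely random negatives correspond to `pool_size=1`.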

Okay! I will try a 1:1 ratio. Thank you very much.