auspicious3000/contentvec

What are the pseudo labels?


Hi, I thought that ContentVec (as well as HuBERT) uses the k-means algorithm to create its labels. So why do we need {train,valid}.km, and what exactly are they?

Thank you :-)

{train,valid}.km are the pseudo labels themselves, i.e. the cluster assignments produced by k-means for the train and valid splits.
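
To make that concrete, here is a minimal sketch of how such a label file could be produced and read back. It assumes the layout used by the fairseq HuBERT k-means recipe (one line per utterance, space-separated cluster IDs, one ID per frame); whether this repo's files match that layout exactly is an assumption. The function names, the toy 2-D "features", and the hand-picked centroids are all hypothetical stand-ins for real acoustic features and fitted k-means centers.

```python
import io


def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(frame, centroids[k])),
    )


def write_km(utterances, centroids, out):
    """Write one line per utterance: space-separated cluster IDs, one per frame."""
    for frames in utterances:
        labels = [nearest_centroid(f, centroids) for f in frames]
        out.write(" ".join(map(str, labels)) + "\n")


def read_km(inp):
    """Parse a .km-style stream back into a list of integer label sequences."""
    return [[int(tok) for tok in line.split()] for line in inp if line.strip()]


# Toy stand-ins (assumptions): 3 "fitted" centroids and 2 short utterances.
centroids = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
utterances = [
    [[0.1, 0.0], [0.9, 1.1], [0.8, 0.9]],  # utterance with 3 frames
    [[0.0, 1.9], [0.2, 0.1]],              # utterance with 2 frames
]

buf = io.StringIO()  # stands in for open("train.km", "w")
write_km(utterances, centroids, buf)
km_text = buf.getvalue()
labels = read_km(io.StringIO(km_text))
print(km_text, end="")
```

In other words, k-means is run once offline, and the resulting per-frame cluster IDs are saved to disk so that training can read them as targets instead of re-clustering.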