Test data in vocabulary preparation
avinashsai opened this issue · 2 comments
avinashsai commented
Hi,
Congratulations on your amazing work. I have a question about the vocabulary preparation on line 47 of utils.py.
The test data is also used to build the vocabulary. However, shouldn't the test data be completely unseen?
Please correct me if I am wrong.
Thank you
hsqmlzno1 commented
The test data should be included in vocabulary preparation. Otherwise, you cannot learn the semantics of any target-specific words that appear only in the test data.
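A minimal sketch of the point above, assuming whitespace tokenization and a single `<unk>` id for out-of-vocabulary tokens. It is not the code in the repository's utils.py, and the helper names (`build_vocab`, `encode`, `unk_tokens`) and toy sentences are hypothetical. It shows that target-domain words missing from the vocabulary all collapse to `<unk>`, so the model has no way to tell them apart.

```python
UNK = "<unk>"

def build_vocab(corpora):
    """Assign an integer id to every lowercased whitespace token in the corpora; id 0 is <unk>."""
    tokens = sorted({tok for corpus in corpora for sent in corpus for tok in sent.lower().split()})
    return {UNK: 0, **{tok: i for i, tok in enumerate(tokens, start=1)}}

def encode(sentence, vocab):
    """Map a sentence to ids; any token missing from the vocabulary gets the <unk> id."""
    return [vocab.get(tok, vocab[UNK]) for tok in sentence.lower().split()]

def unk_tokens(corpus, vocab):
    """Return the sorted set of tokens in corpus that would be encoded as <unk>."""
    return sorted({tok for sent in corpus for tok in sent.lower().split() if tok not in vocab})

# Hypothetical toy data: labeled source-domain (book) reviews and target-domain (kitchen) test reviews.
source_train = ["a gripping plot and great characters", "the plot is boring"]
target_test = ["the blender is great", "the knife is dull"]

vocab_without_test = build_vocab([source_train])
vocab_with_test = build_vocab([source_train, target_test])

print(unk_tokens(target_test, vocab_without_test))  # ['blender', 'dull', 'knife']
print(unk_tokens(target_test, vocab_with_test))     # []
```

Note that only the test *tokens* enter the vocabulary here; no test labels are used.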
hsqmlzno1 commented
If a large amount of unlabeled data exists in the target domain, the vocabulary of that unlabeled data is enough to cover the target-domain test data. In this case, I think it is not necessary to use the dictionary built from the test data.
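One way to check whether the unlabeled target data is "enough" is to measure what fraction of test-token occurrences its vocabulary already covers. The sketch below is a hypothetical illustration under the same whitespace-tokenization assumption; the function names and toy data are made up, and the repository may not do this.

```python
def build_vocab(corpora):
    """Collect every lowercased whitespace token that appears in any of the corpora."""
    return {tok for corpus in corpora for sent in corpus for tok in sent.lower().split()}

def token_coverage(corpus, vocab):
    """Fraction of token occurrences in corpus that are already in vocab."""
    tokens = [tok for sent in corpus for tok in sent.lower().split()]
    if not tokens:
        return 1.0
    return sum(tok in vocab for tok in tokens) / len(tokens)

# Hypothetical toy data; in practice the unlabeled target corpus would be much larger.
source_train = ["a gripping plot and great characters", "the plot is boring"]
target_unlabeled = ["this blender is loud", "the knife is sharp", "a great kettle"]
target_test = ["the blender is great", "the knife is dull"]

# Vocabulary built without looking at the test data at all.
vocab_no_test = build_vocab([source_train, target_unlabeled])

print(token_coverage(target_test, build_vocab([source_train])))  # 0.625
print(token_coverage(target_test, vocab_no_test))                # 0.875
```

If coverage from the unlabeled data is already high, the few remaining test-only words (here just "dull") would map to `<unk>` with little cost.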