About the vocabulary constructed on SNLI
Howardqlz commented
I see this code:
self.TEXT.build_vocab(self.train, self.dev, self.test, vectors=GloVe(...))
As far as I know, shouldn't we construct the vocabulary only on the training set?
galsang commented
This line builds an embedding matrix that maps every word in the datasets (the dev and test sets as well as the training set) to its word representation, initialized with the pre-trained GloVe vector.
We can, of course, utilize the pre-trained vector for a word that is not included in the training set but appears in the test set, even though the vector would not be fine-tuned during training.
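To make the two points above concrete, here is a toy sketch in plain Python rather than torchtext. The `PRETRAINED` dict is a hypothetical stand-in for GloVe, `build_vocab`, `embedding_matrix`, and `sgd_step` are hypothetical helpers, and words without a pre-trained vector are assumed to get a zero row. It shows that a word appearing only in the test set ("cat") still receives its pre-trained vector, and that under a plain SGD update (an assumption; weight decay, for example, would also touch unused rows) that row is never changed because the word never occurs in a training batch.

```python
# Toy sketch (hypothetical data and helpers, not torchtext's API):
# build one vocabulary over train, dev, and test, then initialize each
# embedding row from "pre-trained" vectors.

# Hypothetical stand-in for GloVe: word -> vector
PRETRAINED = {
    "a": [0.1, 0.2],
    "dog": [0.5, -0.3],
    "runs": [0.0, 0.9],
    "cat": [0.4, -0.2],  # appears only in the test set below
}
DIM = 2


def build_vocab(*datasets):
    """Assign an index to every distinct token across all given datasets."""
    itos = ["<unk>", "<pad>"]
    stoi = {w: i for i, w in enumerate(itos)}
    for dataset in datasets:
        for sentence in dataset:
            for tok in sentence:
                if tok not in stoi:
                    stoi[tok] = len(itos)
                    itos.append(tok)
    return itos, stoi


def embedding_matrix(itos):
    """Row i is a copy of the pre-trained vector for itos[i], or zeros if none exists."""
    return [list(PRETRAINED.get(w, [0.0] * DIM)) for w in itos]


train = [["a", "dog", "runs"]]
dev = [["a", "dog"]]
test = [["a", "cat", "runs"]]

itos, stoi = build_vocab(train, dev, test)
emb = embedding_matrix(itos)


def sgd_step(emb, batch, grad, lr=0.1):
    """Toy plain-SGD update: only rows of tokens that occur in the batch change."""
    for sentence in batch:
        for tok in sentence:
            row = emb[stoi[tok]]
            for d in range(DIM):
                row[d] -= lr * grad[d]


cat_before = list(emb[stoi["cat"]])
sgd_step(emb, train, grad=[1.0, 1.0])
print(emb[stoi["cat"]] == cat_before)  # True: the test-only word keeps its GloVe vector
```

Had the vocabulary been built from `train` alone, "cat" would have no index and would be mapped to `<unk>` at test time, losing the pre-trained information entirely.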
daitianxie commented