About the vocabulary constructed on snli

Question

Howardqlz opened this issue 5 years ago · 2 comments

I see this code:
self.TEXT.build_vocab(self.train, self.dev, self.test, vectors=GloVe(...))
As far as I know, shouldn't the vocabulary be constructed only on the training set?

Answer 1 · 2020-01-08T01:48:06.000Z

That line builds an embedding matrix that maps any word in the datasets (dev and test as well as training) to its word representation, initialized with the pre-trained GloVe vector.
This lets us use the pre-trained vector for a word that appears in the test set but not in the training set, even though that vector is never fine-tuned during training.
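
To make the difference concrete, here is a minimal sketch in plain Python rather than the repository's torchtext code. The toy sentences, the 2-d `pretrained` dictionary standing in for GloVe, and the helper names `build_vocab`, `build_embeddings`, and `lookup` are all assumptions made up for illustration. It only shows that a test-only word falls back to `<unk>` when the vocabulary is built on the training split alone, and keeps its pre-trained vector when all splits are included.

```python
from collections import Counter

# Hypothetical toy data standing in for the SNLI splits.
train = [["a", "man", "sleeps"], ["a", "dog", "runs"]]
test = [["a", "woman", "runs"]]

# Hypothetical stand-in for pre-trained GloVe vectors (2-d for brevity).
pretrained = {
    "a": [0.1, 0.2],
    "man": [0.3, 0.4],
    "woman": [0.35, 0.45],
    "dog": [0.5, 0.6],
    "runs": [0.7, 0.8],
    "sleeps": [0.9, 1.0],
}

UNK = "<unk>"


def build_vocab(*splits):
    """Map every token seen in the given splits to an index; index 0 is <unk>."""
    counts = Counter(tok for split in splits for sent in split for tok in sent)
    itos = [UNK] + sorted(counts)
    return {tok: i for i, tok in enumerate(itos)}


def build_embeddings(stoi, vectors, dim=2):
    """One row per vocabulary entry; tokens without a pre-trained vector get zeros."""
    matrix = [[0.0] * dim for _ in stoi]
    for tok, idx in stoi.items():
        if tok in vectors:
            matrix[idx] = list(vectors[tok])
    return matrix


def lookup(token, stoi, matrix):
    """Return the embedding row for a token, falling back to the <unk> row."""
    return matrix[stoi.get(token, stoi[UNK])]


# Vocabulary built on the training split only vs. on all splits.
train_only = build_vocab(train)
all_splits = build_vocab(train, test)

emb_train_only = build_embeddings(train_only, pretrained)
emb_all = build_embeddings(all_splits, pretrained)

# "woman" occurs only in the test split.
woman_train_only = lookup("woman", train_only, emb_train_only)  # <unk> row (zeros)
woman_all = lookup("woman", all_splits, emb_all)                # pre-trained vector
```

Note that no test labels are involved in either case; including dev and test only decides which words get their own row in the embedding matrix.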

Answer 2 · 2020-11-24T11:17:40.000Z

Hello, why does the code stop after running one epoch?