Separate Training Files vs Single Training File
agemagician opened this issue · 1 comment
agemagician commented
Hello,
Does it make any difference whether the training text is in a single file or split across separate files?
Is the internal state reset between different text files?
I tried training a model with separate files versus a single file, and so far the perplexity is much lower with separate files.
matt-peters commented
It shouldn't make a difference whether you use one file or many, assuming that they are prepared in the same manner (e.g. the single file was generated with something like `cat separate_files* > single_file.txt`), that you specify all of the separate files for training, and that you are careful to keep the held-out validation/test files separate from the training files.
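A minimal Python sketch of the preparation described above, assuming the training data are plain-text files matched by a glob pattern. The helper names `gather_training_files` and `concatenate`, and any file names you pass in, are hypothetical. The sketch checks that no held-out file is picked up by the training pattern and builds a single file byte-for-byte, the same way `cat` would.

```python
import glob
import os
import shutil


def gather_training_files(pattern, heldout_paths):
    """Expand a glob pattern into a sorted list of training files.

    Hypothetical helper: raises if nothing matches or if any held-out
    (validation/test) file is also matched by the training pattern.
    """
    train = sorted(glob.glob(pattern))
    if not train:
        raise ValueError(f"no training files matched {pattern!r}")
    heldout = {os.path.abspath(p) for p in heldout_paths}
    overlap = [p for p in train if os.path.abspath(p) in heldout]
    if overlap:
        raise ValueError(f"held-out files matched the training pattern: {overlap}")
    return train


def concatenate(paths, out_path):
    """Write the files in `paths` to `out_path` in order, like `cat paths... > out_path`.

    No separator is inserted between files, exactly as with `cat`.
    """
    with open(out_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                shutil.copyfileobj(f, out)
```

Two caveats under these assumptions: Python's `sorted()` orders names by code point, which can differ from the locale-dependent order a shell uses when expanding `separate_files*`; and because nothing is inserted between files, a file that does not end with a newline will have its last line merged with the first line of the next file.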