allenai/bilm-tf

Separate Training Files vs Single Training File

agemagician opened this issue · 1 comment

Hello,

Does it make any difference whether the training text is a single file or separate files?
Does the internal state reset itself between different text files?

I trained a model on separate files and on a single file, and so far the perplexity is much lower with separate files.

It shouldn't make a difference whether you use one file or many, as long as three things hold: the files are prepared in the same manner (e.g. the single file was generated with something like cat separate_files* > single_file.txt), you specify all of the separate files for training, and you are careful to keep the heldout validation/test files separate from the training files.
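
The conditions above can be checked before comparing perplexities. The following is a minimal sketch in plain Python, not part of bilm-tf; the helper names (concat_shards, same_sentences, overlapping_files) are hypothetical, and it assumes one sentence per line in UTF-8 text files.

```python
import os
from collections import Counter


def concat_shards(shard_paths, out_path):
    """Write all shards to out_path in sorted path order (roughly `cat shards* > out`)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in sorted(shard_paths):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    # Unlike plain `cat`, add a newline if a shard's last line
                    # lacks one, so two sentences are never fused together.
                    out.write(line if line.endswith("\n") else line + "\n")


def line_counts(paths):
    """Count how often each line occurs across the given files."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8") as f:
            counts.update(line.rstrip("\n") for line in f)
    return counts


def same_sentences(shard_paths, single_path):
    """True if the shards and the single file hold the same lines (order ignored)."""
    return line_counts(shard_paths) == line_counts([single_path])


def overlapping_files(train_paths, heldout_paths):
    """Return heldout paths that resolve to a file also used for training."""
    train = {os.path.realpath(p) for p in train_paths}
    return sorted(p for p in heldout_paths if os.path.realpath(p) in train)
```

If same_sentences returns False, or overlapping_files returns a non-empty list, the two runs were not trained or evaluated on equivalent data, which could by itself explain a perplexity gap.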