Build Data Pipelines for training on larger datasets
anuragshukla06 opened this issue
anuragshukla06 commented
Presently, the model trains on 320,000 sentence pairs of 14 tokens each on a Tesla P100 GPU, with the entire dataset loaded into memory at once.
Constructing a data pipeline would allow loading only a batch-sized chunk of data into memory at a time, enabling training on much larger datasets.
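A minimal sketch of what such a streaming pipeline could look like, assuming a PyTorch setup and a tab-separated text file of sentence pairs; the file path `data/train.tsv`, the `toy_tokenize` helper, and the padding scheme are hypothetical placeholders, not the project's actual code:

```python
# Sketch: stream sentence pairs from disk instead of loading the whole
# corpus into memory. Assumes PyTorch; file path and tokenizer are
# placeholders for whatever the project actually uses.
import torch
from torch.utils.data import IterableDataset, DataLoader

_vocab = {}

def toy_tokenize(text):
    # Hypothetical stand-in tokenizer: assigns an id to each new token.
    # The real pipeline would use the project's vocabulary instead.
    return [_vocab.setdefault(tok, len(_vocab) + 1) for tok in text.split()]

class SentencePairStream(IterableDataset):
    """Reads the corpus lazily, one line at a time, so only the current
    batch ever needs to fit in memory."""

    def __init__(self, path, max_len=14, pad_id=0):
        self.path, self.max_len, self.pad_id = path, max_len, pad_id

    def _encode(self, sentence):
        ids = toy_tokenize(sentence)[: self.max_len]
        ids += [self.pad_id] * (self.max_len - len(ids))  # pad so batches stack
        return torch.tensor(ids)

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:  # lazy read: never materializes the full dataset
                src, tgt = line.rstrip("\n").split("\t")
                yield self._encode(src), self._encode(tgt)

# Only batch_size examples are held in memory per training step.
loader = DataLoader(SentencePairStream("data/train.tsv"), batch_size=64)
```

With an `IterableDataset`, memory use is bounded by the batch size rather than the corpus size, so the same training loop could scale well beyond 320,000 pairs.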