microsoft/INMT-lite

Build Data Pipelines for training on larger datasets

anuragshukla06 opened this issue · 0 comments

Presently, the model can train on 320,000 sentence pairs of 14 tokens each on a Tesla P100 GPU. The dataset is loaded into memory all at once.

Constructing data pipelines would allow loading only a batch of data into memory at a time, enabling training on larger datasets.
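
A minimal sketch of what such a streaming pipeline could look like, assuming a TensorFlow-based training loop and `tf.data`. The file paths, `tokenize_fn`, `BATCH_SIZE`, and `MAX_LEN` below are placeholders for illustration, not names from this repo.

```python
import tensorflow as tf

BATCH_SIZE = 64   # placeholder value
MAX_LEN = 14      # tokens per sentence, as in the current setup

def make_dataset(src_path, tgt_path, tokenize_fn):
    # Stream the parallel corpus line by line instead of loading it all at once.
    src = tf.data.TextLineDataset(src_path)
    tgt = tf.data.TextLineDataset(tgt_path)
    ds = tf.data.Dataset.zip((src, tgt))

    # tokenize_fn is assumed to map a (src_line, tgt_line) pair of strings to
    # padded id sequences; tf.py_function lets an existing Python tokenizer be reused.
    def _map(src_line, tgt_line):
        src_ids, tgt_ids = tf.py_function(
            tokenize_fn, [src_line, tgt_line], [tf.int64, tf.int64])
        src_ids.set_shape([MAX_LEN])
        tgt_ids.set_shape([MAX_LEN])
        return src_ids, tgt_ids

    return (ds.map(_map, num_parallel_calls=tf.data.AUTOTUNE)
              .shuffle(10_000)               # shuffle within a bounded buffer, not the full corpus
              .batch(BATCH_SIZE)
              .prefetch(tf.data.AUTOTUNE))   # overlap data loading with GPU compute
```

With a dataset built this way, the training loop (or `model.fit`) consumes one batch at a time, so host memory usage is bounded by the shuffle buffer and batch size rather than by the corpus size.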