microsoft/INMT-lite

Build Data Pipelines for training on larger datasets

anuragshukla06 opened this issue · 0 comments

Presently, the model can train on 320,000 sentence pairs of 14 tokens each on a Tesla P100 GPU. The dataset is loaded into memory all at once.

Constructing data pipelines would allow loading only a batch of data into memory at a time, enabling training on larger datasets.
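
A minimal sketch of what such a streaming pipeline could look like, assuming a TensorFlow-based training loop and `tf.data`. The file paths, `tokenize_fn`, `BATCH_SIZE`, and `MAX_LEN` below are placeholders for illustration, not names from this repo.

```python
import tensorflow as tf

BATCH_SIZE = 64   # placeholder value
MAX_LEN = 14      # tokens per sentence, as in the current setup

def make_dataset(src_path, tgt_path, tokenize_fn):
    # Stream the parallel corpus line by line instead of loading it all at once.
    src = tf.data.TextLineDataset(src_path)
    tgt = tf.data.TextLineDataset(tgt_path)
    ds = tf.data.Dataset.zip((src, tgt))

    # tokenize_fn is assumed to map a (src_line, tgt_line) pair of strings to
    # padded id sequences; tf.py_function lets an existing Python tokenizer be reused.
    def _map(src_line, tgt_line):
        src_ids, tgt_ids = tf.py_function(
            tokenize_fn, [src_line, tgt_line], [tf.int64, tf.int64])
        src_ids.set_shape([MAX_LEN])
        tgt_ids.set_shape([MAX_LEN])
        return src_ids, tgt_ids

    return (ds.map(_map, num_parallel_calls=tf.data.AUTOTUNE)
              .shuffle(10_000)               # shuffle within a bounded buffer, not the full corpus
              .batch(BATCH_SIZE)
              .prefetch(tf.data.AUTOTUNE))   # overlap data loading with GPU compute
```

With a dataset built this way, the training loop (or `model.fit`) consumes one batch at a time, so host memory usage is bounded by the shuffle buffer and batch size rather than by the corpus size.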