IntelLabs/academic-budget-bert

What is the size of the processed data?

leoozy opened this issue · 1 comment

Hello, I processed Wikipedia and BookCorpus using your scripts. The total size of the processed Wikipedia dataset is around 106 GB (~2650 HDF5 files). Could you please tell me whether this is correct?
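
For reference, a quick way to tally the number and total size of the generated HDF5 shards is a short script like the sketch below; the directory path and file extension are assumptions and should be adjusted to your own output location.

```python
import glob
import os

# Hypothetical output directory produced by the preprocessing scripts;
# adjust the path and extension (.hdf5 vs .h5) to match your setup.
data_dir = "data/wikipedia_hdf5"

# Collect all HDF5 shards and sum their on-disk sizes.
files = glob.glob(os.path.join(data_dir, "*.hdf5"))
total_bytes = sum(os.path.getsize(f) for f in files)

print(f"{len(files)} HDF5 files, {total_bytes / 1e9:.1f} GB total")
```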

Sounds about right.