lucidrains/electra-pytorch

Custom Dataset

appledora opened this issue · 0 comments

I'm trying to use this repo to train ELECTRA from scratch for Bangla. My dataset is a CSV file in which each row is a document.
Would the default openwebtext/preprocess.py script work here? What else might I need to modify? Thanks!
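
To make the data layout concrete, here is a minimal sketch of how the CSV could be split into one plain-text file per document. It uses only the Python standard library. The column name `text`, the output file naming, and the helper names `iter_documents` and `write_documents` are placeholders for illustration, not anything from this repo, and I have not confirmed what input layout preprocess.py actually expects.

```python
import csv
from pathlib import Path
from typing import Iterator


def iter_documents(csv_path: str, text_column: str = "text") -> Iterator[str]:
    """Yield one non-empty document per CSV row (column name is an assumption)."""
    # Long documents can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            doc = (row.get(text_column) or "").strip()
            if doc:  # skip rows whose text cell is empty or whitespace
                yield doc


def write_documents(csv_path: str, out_dir: str, text_column: str = "text") -> int:
    """Write each document to its own UTF-8 .txt file; return how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for count, doc in enumerate(iter_documents(csv_path, text_column), start=1):
        (out / f"doc_{count:07d}.txt").write_text(doc + "\n", encoding="utf-8")
    return count
```

For example, `write_documents("bangla.csv", "bangla_txt")` would produce `bangla_txt/doc_0000001.txt` and so on, where both paths are hypothetical. Whether those files can be fed to the existing preprocessing script directly, or need further packaging, is exactly what I am unsure about.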