EleutherAI/the-pile

Question regarding Shuffling

LeoXinhaoLee opened this issue · 1 comment

Hi, thank you very much for releasing this great dataset. I am wondering whether the original PILE dataset (with 30 chunks) has already been shuffled. Or do we still need to globally shuffle PILE before using it for pretraining? Thank you.

yuzc19 commented

Hi @LeoXinhaoLee, I am also curious about this. Did you reach any conclusion?