google-research/t5x

About the-pile dataset

hwyFighting opened this issue · 1 comment

Hi!
Is there another way to download the Pile dataset for training on GPUs?
Thanks for the answer!

The current recommendation for downloading the Pile is here: https://github.com/NVIDIA/JAX-Toolbox/tree/main/rosetta/rosetta/projects/t5x#downloading-the-pile . Note that the dataset is around 1 TB, so make sure you have enough free disk space before starting the download.
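Because the download is about 1 TB, it can help to check free space on the target filesystem first. The snippet below is a minimal sketch using only Python's standard `shutil.disk_usage`. It is not part of the linked JAX-Toolbox instructions. The 1 TB threshold is taken from the rough figure above, and the directory `"."` and the helper name `has_space_for_pile` are hypothetical placeholders to adapt to your setup.

```python
import shutil

# Rough size of the Pile as stated above (~1 TB), in decimal units.
# This is an approximation, not an exact download size.
PILE_SIZE_BYTES = 1 * 1000**4


def has_space_for_pile(target_dir: str, required_bytes: int = PILE_SIZE_BYTES) -> bool:
    """Hypothetical helper: return True if the filesystem containing
    target_dir reports at least required_bytes of free space."""
    free = shutil.disk_usage(target_dir).free
    return free >= required_bytes


if __name__ == "__main__":
    target = "."  # hypothetical download directory; replace with your own path
    free_tb = shutil.disk_usage(target).free / 1000**4
    print(f"Free space at {target!r}: {free_tb:.2f} TB")
    print("Enough for the Pile (~1 TB)?", has_space_for_pile(target))
```

Leave some headroom beyond 1 TB, since extracting or preprocessing the data may need temporary space on top of the raw download.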