"Github" code data download only
HangXue-lab opened this issue · 2 comments
HangXue-lab commented
The full Pile is too big for me to download. I only want the "Github" code data, but the Pile train set is split into 30 files. Which of those files contain the "Github" code data?
igorbrigadir commented
The data in the train files has already been processed at that stage, so it may not be what you want. You probably want github.tar from the preliminary components (https://the-eye.eu/public/AI/pile_preliminary_components/github.tar), which you can then process yourself.
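If you do end up working from the processed train files instead, here is a minimal sketch of filtering them down to one subset. It assumes each decompressed line is a JSON object with a `text` field and a `meta.pile_set_name` field, and that the code subset is labelled `"Github"`; check a few records before relying on that. The function name `iter_subset` is hypothetical, and decompressing the `.jsonl.zst` files is left out (it would need a third-party package such as `zstandard`).

```python
import json
from typing import Iterable, Iterator


def iter_subset(lines: Iterable[str], subset: str = "Github") -> Iterator[dict]:
    """Yield JSON records whose meta.pile_set_name equals `subset`.

    `lines` is any iterable of already-decompressed JSONL text lines.
    The field names are an assumption about the record layout.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        meta = record.get("meta") or {}
        if meta.get("pile_set_name") == subset:
            yield record


if __name__ == "__main__":
    # Tiny in-memory stand-in for one decompressed train file.
    sample = [
        '{"text": "print(1)", "meta": {"pile_set_name": "Github"}}',
        '{"text": "An abstract.", "meta": {"pile_set_name": "ArXiv"}}',
    ]
    for rec in iter_subset(sample):
        print(rec["text"])
```

Because the function only takes an iterable of lines, the same loop can be run over each train file in turn without holding a whole file in memory.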
osainz59 commented
The link no longer works. Is there another link to obtain the data?