Issues
link for book3
#113 opened by wang99711123 - 2
Could you possibly share the 825GB pile data temporarily and unofficially?
#120 opened by dsdanielpark - 15
Link in Readme produces 404
#117 opened by gladwig2 - 1
Question regarding Shuffling
#119 opened by LeoXinhaoLee - 3
Issue reproducing the GitHub partition
#118 opened by osainz59 - 2
"Github" code data download only
#101 opened by HangXue-lab - 0
When accessing https://the-eye.eu/public/AI/pile_preliminary_components/, a 404 error occurs
#116 opened by s1ghhh - 0
Mismatched data size Problem
#114 opened by jaywaer - 0
book3 metadata
#115 opened by DachengLi1 - 0
Any search tools?
#111 opened by MM-IR - 6
URL Links
#109 opened by akul-goyal - 1
(Natural) Languages in The PILE
#98 opened by suzyahyah - 1
Appending data to the Pile.
#99 opened by shankerabhigyan - 1
Suggested corpus: Adult stories
#107 opened by johnflux - 0
Cannot download data, error
#108 opened by infokng - 0
Reducing download size
#106 opened by marionbartl - 0
Pile-CC Size
#105 opened by KeremTurgutlu - 2
ConvoKit datasets
#104 opened by upintheairsheep - 1
Accepting submissions to the Pile
#103 opened by upintheairsheep - 1
Public website to explore dataset
#92 opened by tuan3w - 1
failed to download stackexchange
#97 opened by sangmichaelxie - 0
Scripts for dedup and filter Common Crawl?
#96 opened by shangw-nvidia - 0
download website is not accessible
#94 opened by portia1026 - 1
Royalroad
#91 opened by KeinNiemand - 1
SHA256 Sums
#89 opened - 1
Code generation
#87 opened by 6r1d - 0
Paper checklist
#72 opened by leogao2 - 0
Caucasian Languages Dataset
#82 opened by QazQazaq - 1
Make treemaps
#70 opened by leogao2 - 0
PDF parsing
#71 opened by leogao2 - 0
Israeli Legal Databases
#65 opened by StellaAthena - 1
Legal Contracts
#75 opened by hendrycks - 1
Set up webpage
#69 opened by leogao2 - 5
Debate notes
#56 opened by Hellisotherpeople - 3
Multilingual Wikipedia
#61 opened by StellaAthena - 1
Southern African Legal Datasets
#63 opened by StellaAthena - 1
European Patent Office
#64 opened by StellaAthena - 2
Royal Society Publishing
#66 opened by StellaAthena - 2
Exploiting bitexts
#80 opened by eritain - 0
Early Buddhism
#81 opened by Blue7771 - 5