Issues
Need help for installing dolma
#158 opened - 0
Duplicate ids in Dolma v1.7
#157 opened - 1
dtype option is not working as expected
#152 opened - 2
Inquiry about Web Pipeline Availability
#151 opened - 2
Running paragraph level deduplication on c4
#150 opened - 2
S3 mixer doesn't start
#143 opened - 1
Possible bug in `local_shuffle`?
#139 opened - 1
Some race condition in url taggers
#138 opened - 1
Support providing streams into mixer via CLI
#130 opened - 1
make_wikipedia in getting_started.md
#125 opened - 1
make_wikipedia.py: long running time
#121 opened - 0
Only the attributes written by the last tagger in the tagger list gets written in version 1.0.0
#113 opened - 1
Tokenizer name or path must be found error
#110 opened - 1
Provenance license?
#108 opened - 1
Data sheet link in README is broken
#106 opened - 5
deduplication examples does not work
#96 opened - 0
The Law School Admission Council | LSAC
#89 opened - 0
AllenAI
#87 opened - 0
Hells Angels infinite loop
#85 opened - 0
911
#81 opened - 0
$open Allen Wolf (infinite_loop)
#80 opened - 0
Git - gitk Documentation
#79 opened - 3
Latest version is not on PyPi
#78 opened - 0
Terminal20141030.zip - Google Drive
#75 opened - 0
make_wikipedia.py fails on linux
#58 opened - 0
make_wikipedia.py hardcoded to simple
#57 opened - 0
Jessie ©
#48 opened - 1
Titles
#47 opened - 1
Ruby
#46 opened - 1
Adam Burden, I love you!
#45 opened - 1
Adam Burden
#44 opened