ciprian-chelba/1-billion-word-language-modeling-benchmark
Formerly known as code.google.com/p/1-billion-word-language-modeling-benchmark
Language: Perl. License: Apache-2.0.
Issues
#7 Data Sources for the Corpus (opened by nadyadtm)
#6 if the word not in vocab, what should I do? or it always can't happen because the FullTokenizer (opened by wangwang110)
#5 Some Training Data Duplicated in Heldout Data (opened by bjascob)
#4 Dev / Test set? (opened by eric-haibin-lin)
#3 Dead code.google.com link (opened by charlesreid1)
#2 meaning of cost values in output.tar (opened by jowagner)
#1 question on the corpus size / script (opened by vince62s)