ciprian-chelba/1-billion-word-language-modeling-benchmark

Formerly known as code.google.com/p/1-billion-word-language-modeling-benchmark

Perl · Apache-2.0

Issues

Data Sources for the Corpus
#7 opened 2 years ago by nadyadtm
2
If a word is not in the vocab, what should I do? Or can that never happen because of the FullTokenizer?
#6 opened 6 years ago by wangwang110
1
Some Training Data Duplicated in Heldout Data
#5 opened 6 years ago by bjascob
10
Dev / Test set?
#4 opened 7 years ago by eric-haibin-lin
4
Dead code.google.com link
#3 opened 7 years ago by charlesreid1
1
meaning of cost values in output.tar
#2 opened 8 years ago by jowagner
2
question on the corpus size / script
#1 opened 8 years ago by vince62s
2