About the language model

Question

dopc opened this issue 4 years ago · 4 comments

Hey, thanks for the great work and for sharing it.

Which corpus did you use?
As far as I can see, it is a 2-gram model. Is there a 3- or 4-gram model that you can share?

Looking forward to your answer.
Thanks

Answer 1 · 2020-08-20T20:22:42.000Z

Answer 2 · 2020-08-20T20:33:49.000Z

Thanks for your answer.
But I was asking about the textual language model and its corpus.

For the language model, I used kenlm:

lmplz -o 2 < vocabulary > text.arpa
build_binary text.arpa lm.binary

In this command, I am asking about vocabulary and text.arpa or lm.binary.

Many thanks.
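
For anyone who wants a 3- or 4-gram model from their own corpus, the only change to the quoted commands is the value passed to `-o`. Below is a minimal sketch, assuming KenLM's `lmplz` and `build_binary` are installed and on the PATH, and that the corpus is a plain-text file with one sentence per line. The helper names `kenlm_commands` and `build_model` and the file name `corpus.txt` are hypothetical and not taken from the repository.

```python
import os
import shutil
import subprocess


def kenlm_commands(order):
    """Return the two KenLM command lines for an n-gram model of the given order."""
    if order < 1:
        raise ValueError("n-gram order must be a positive integer")
    return [
        # Reads the corpus on stdin and writes an ARPA file on stdout.
        ["lmplz", "-o", str(order)],
        # Converts the ARPA file into KenLM's binary format.
        ["build_binary", "text.arpa", "lm.binary"],
    ]


def build_model(corpus_path, order=3):
    """Run both steps; equivalent to `lmplz -o N < corpus > text.arpa`."""
    train, binarize = kenlm_commands(order)
    with open(corpus_path, "rb") as src, open("text.arpa", "wb") as arpa:
        subprocess.run(train, stdin=src, stdout=arpa, check=True)
    subprocess.run(binarize, check=True)


if __name__ == "__main__":
    # Only attempt the build when the tools and the (hypothetical) corpus exist.
    tools_present = shutil.which("lmplz") and shutil.which("build_binary")
    if tools_present and os.path.exists("corpus.txt"):
        build_model("corpus.txt", order=3)
```

Passing `order=4` would produce a 4-gram model in the same way. Higher orders generally need more text to be useful.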

Answer 3 · 2020-08-21T03:54:17.000Z

You can collect texts from the Internet. You need at least 10 thousand sentences.
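
A rough sketch of how collected text could be turned into a training file, since `lmplz` reads whitespace-separated tokens with one sentence per line. The lowercasing, the punctuation stripping, and the naive sentence splitter are assumptions that may not match the preprocessing this project expects. The function names and the 10,000-sentence default are illustrative only.

```python
import re


def to_sentences(raw_text):
    """Split raw text into lowercase, whitespace-tokenized sentences."""
    sentences = []
    # Naive split after ., ! or ? followed by whitespace; real text
    # (abbreviations, quotes) would need a better sentence splitter.
    for chunk in re.split(r"(?<=[.!?])\s+", raw_text):
        # Keep runs of letters, allowing inner apostrophes; drop digits and punctuation.
        words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", chunk.lower())
        if words:
            sentences.append(" ".join(words))
    return sentences


def write_corpus(sentences, path, minimum=10_000):
    """Write one sentence per line; return True if the size target is met."""
    with open(path, "w", encoding="utf-8") as out:
        for sentence in sentences:
            out.write(sentence + "\n")
    return len(sentences) >= minimum
```

The resulting file can then be fed to `lmplz` in place of `vocabulary` in the command quoted above.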

Answer 4 · 2020-08-21T07:02:43.000Z

okay, thanks so much!