How to use Lang-8 and CoNLL data?
XiaoranJin opened this issue · 5 comments
XiaoranJin commented
Hi, I've got the Lang-8 and CoNLL datasets. Any clue how to feed the data into your training scripts?
I'd really appreciate it if you could show the data structure inside "nlc-train.tar" and "nlc-valid.tar".
Thanks!
avati commented
Something like this:
$ tar tvf nlc-valid.tar
-rw-r--r-- avati/avati 312524 2016-04-11 14:17 valid.x.txt
-rw-r--r-- avati/avati 323750 2016-04-11 14:17 valid.y.txt
$ tar tvf nlc-train.tar
-rw-r--r-- avati/users 50842953 2016-05-19 10:44 train.x.txt
-rw-r--r-- avati/users 51878318 2016-05-19 10:45 train.y.txt
You can also look at nlc_data.py to see what the code checks for.
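A minimal sketch of building such an archive from CoNLL data. It rests on assumptions this thread does not confirm: `*.x.txt` holds the original (erroneous) sentences and `*.y.txt` the corrected ones, one sentence per line, with line i of each file aligned; the input is in the CoNLL-2014 M2 format; and only annotator 0's edits are applied. `apply_edits`, `parse_m2`, `write_parallel_tar` and `SAMPLE_M2` are hypothetical names, not part of this repo. Check `nlc_data.py` for what the training code actually expects. Lang-8 would need its own conversion into the same kind of sentence pairs.

```python
import io
import re
import tarfile

# Assumption: x = original sentence, y = corrected sentence, one per line.

SAMPLE_M2 = (
    "S The cat sat at mat .\n"
    "A 3 4|||Prep|||on|||REQUIRED|||-NONE-|||0\n"
    "A 4 4|||ArtOrDet|||the|||REQUIRED|||-NONE-|||0\n"
    "\n"
    "S This is fine .\n"
    "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
)


def apply_edits(tokens, edits):
    """Apply (start, end, correction) token-span edits to a tokenized sentence."""
    out = list(tokens)
    offset = 0  # length change caused by edits already applied
    for start, end, corr in sorted(edits):
        repl = corr.split()  # empty correction -> deletion
        out[start + offset:end + offset] = repl
        offset += len(repl) - (end - start)
    return out


def parse_m2(text, annotator=0):
    """Turn M2 blocks ("S ..." line plus "A ..." lines) into (source, corrected) pairs."""
    pairs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.strip():
            continue
        lines = block.strip().splitlines()
        src = lines[0][2:].split()  # drop the leading "S "
        edits = []
        for line in lines[1:]:
            fields = line[2:].split("|||")  # drop the leading "A "
            start, end = map(int, fields[0].split())
            etype, corr, ann = fields[1], fields[2], int(fields[-1])
            # Skip "noop" entries and edits from other annotators.
            if etype == "noop" or start < 0 or ann != annotator:
                continue
            if corr == "-NONE-":  # treat as a deletion
                corr = ""
            edits.append((start, end, corr))
        pairs.append((" ".join(src), " ".join(apply_edits(src, edits))))
    return pairs


def write_parallel_tar(pairs, prefix, fileobj):
    """Write {prefix}.x.txt and {prefix}.y.txt into an uncompressed tar on fileobj."""
    with tarfile.open(fileobj=fileobj, mode="w") as tar:
        for side, idx in (("x", 0), ("y", 1)):
            data = "".join(pair[idx] + "\n" for pair in pairs).encode("utf-8")
            info = tarfile.TarInfo(name=f"{prefix}.{side}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    for x, y in parse_m2(SAMPLE_M2):
        print(x, "=>", y)
```

Passing a file opened with `open("nlc-valid.tar", "wb")` and the prefix `"valid"` produces members named `valid.x.txt` and `valid.y.txt`, matching the listing above; use `"train"` for `nlc-train.tar`.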
kbpranay commented
Hey guys, does anyone have a link to the Lang-8 data for training? If so, please send me a download link. @XiaoranJin @avati
Deleted user commented
@XiaoranJin
Did you figure out the data structure inside "nlc-train.tar" and "nlc-valid.tar"?
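Going by the `tar tvf` listing earlier in the thread, each archive holds two text files. A quick consistency check is to read both and confirm they have the same number of lines. This sketch assumes the x/y files are line-aligned sentence pairs (not confirmed here). `read_parallel_tar` is a hypothetical helper; the checks the training code really performs are in `nlc_data.py`.

```python
import io
import tarfile


def read_parallel_tar(fileobj, prefix):
    """Return aligned (x, y) line pairs from a tar holding {prefix}.x.txt and {prefix}.y.txt."""
    with tarfile.open(fileobj=fileobj) as tar:
        members = set(tar.getnames())
        texts = {}
        for side in ("x", "y"):
            name = f"{prefix}.{side}.txt"
            if name not in members:
                raise ValueError(f"missing {name}")
            texts[side] = tar.extractfile(name).read().decode("utf-8").splitlines()
    # Assumption: line i of x.txt pairs with line i of y.txt.
    if len(texts["x"]) != len(texts["y"]):
        raise ValueError(f"line counts differ: {len(texts['x'])} vs {len(texts['y'])}")
    return list(zip(texts["x"], texts["y"]))
```

For example, calling it on a file opened with `open("nlc-valid.tar", "rb")` and the prefix `"valid"` returns a list of (input, corrected) pairs, or raises if a member is missing or the files are misaligned.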
morusu commented
@rajism do you have "nlc-train.tar" and "nlc-valid.tar"? Can you share them with me?