stanfordmlgroup/nlc

How to use Lang-8 and CoNLL data?

XiaoranJin opened this issue · 5 comments

Hi, I've got the Lang-8 and CoNLL datasets. Any pointers on how to feed this data into your training scripts?

I would really appreciate it if you could show how the data is structured inside "nlc-train.tar" and "nlc-valid.tar".

Thanks!

avati commented

Something like this:

$ tar tvf nlc-valid.tar 
-rw-r--r-- avati/avati  312524 2016-04-11 14:17 valid.x.txt
-rw-r--r-- avati/avati  323750 2016-04-11 14:17 valid.y.txt

$ tar tvf nlc-train.tar 
-rw-r--r-- avati/users 50842953 2016-05-19 10:44 train.x.txt
-rw-r--r-- avati/users 51878318 2016-05-19 10:45 train.y.txt

You can also look at nlc_data.py to see what the code checks for.
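
For anyone starting from their own sentence pairs, here is a minimal sketch of how archives with the same member names as in the listing above could be built. It rests on assumptions that this thread does not confirm: that the `.x.txt` file holds the original (uncorrected) sentences, that the `.y.txt` file holds the corrected ones, and that line i of one file pairs with line i of the other. The helper names `build_parallel_tar` and `list_members` are hypothetical and are not part of the repository. Check nlc_data.py for what the code actually expects.

```python
import io
import tarfile


def build_parallel_tar(pairs, split, tar_path):
    """Write (source, target) pairs as <split>.x.txt and <split>.y.txt in a tar.

    `pairs` must be a list of (source, target) strings, since it is
    iterated twice. Assumption: one sentence per line, with line i of
    the x file paired with line i of the y file.
    """
    x_bytes = "".join(src.strip() + "\n" for src, _ in pairs).encode("utf-8")
    y_bytes = "".join(tgt.strip() + "\n" for _, tgt in pairs).encode("utf-8")
    with tarfile.open(tar_path, "w") as tar:
        for name, data in ((f"{split}.x.txt", x_bytes),
                           (f"{split}.y.txt", y_bytes)):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # required so the member is not written empty
            tar.addfile(info, io.BytesIO(data))


def list_members(tar_path):
    """Return the member names of a tar archive, in archive order."""
    with tarfile.open(tar_path) as tar:
        return tar.getnames()
```

Calling `build_parallel_tar(pairs, "train", "nlc-train.tar")` and `build_parallel_tar(pairs, "valid", "nlc-valid.tar")` would then produce archives whose member names match the `tar tvf` output above; the sentence pairs themselves still have to be extracted from Lang-8 or CoNLL separately.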

Hey, does anyone have a link to the Lang-8 data for training? If so, please send me a download link. @XiaoranJin @avati

@XiaoranJin
Did you figure out how the data is structured in "nlc-train.tar" and "nlc-valid.tar"?

@rajism do you have "nlc-train.tar" and "nlc-valid.tar"? Could you share them with me?

@avati do you have "nlc-train.tar" and "nlc-valid.tar"? Could you share them with me?