Parsing the corpora
Opened this issue · 5 comments
tannineo commented
Find the resource for the test data.
tannineo commented
I renamed them in batch under OSX.
The SGML files are using *.sgm.
DTD files are all using *.dtd.
Text readmes are using *.txt.
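For anyone who wants to redo the batch rename without OSX tooling, here is a rough sketch in Python. It assumes the original files differ from the targets only in the letter case of their extensions (for example .SGM instead of .sgm). That is a guess, since the thread does not say what the original names were. The function name normalize_extensions and the TARGET_EXTENSIONS set are made up for this example.

```python
from pathlib import Path

# Target extensions named in this thread. Assumption: the original files
# carry the same extensions in a different letter case (e.g. ".SGM").
TARGET_EXTENSIONS = {".sgm", ".dtd", ".txt"}


def normalize_extensions(root):
    """Lower-case matching file extensions under root.

    Returns a list of (old_path, new_path) pairs for the files renamed.
    """
    renamed = []
    # sorted() materializes the listing first, so renaming files does not
    # disturb the directory walk.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        lowered = path.suffix.lower()
        if lowered not in TARGET_EXTENSIONS or path.suffix == lowered:
            continue
        new_path = path.with_suffix(lowered)
        # Skip when a different file already occupies the target name.
        # On a case-insensitive file system the target "exists" but is the
        # same file, so the rename is still safe.
        if new_path.exists() and not new_path.samefile(path):
            continue
        path.rename(new_path)
        renamed.append((path, new_path))
    return renamed
```

Calling normalize_extensions("path/to/corpora") on a copy of the data first is probably the safer way to try it, since the rename happens in place.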
tannineo commented
The renamed corpora can be found here:
https://drive.google.com/open?id=1rNAPWn9o6cW8jKTN_fI2yDGM8qP4ZmVh
tannineo commented
The parsed JSON file is here:
https://drive.google.com/open?id=1NUmfQBGaNPJOs1tpb-7QWENc2Q0CYaai
tannineo commented
Don't use the parsed JSON...