Exception during conversion to coreference chains format
First of all, thanks for this great contribution to the biomedical community.
After setting up the boot environment, I am unable to convert the corpus to the coreference chains format; I get the following exception. Please help me figure out what I am doing wrong.
boot coreference convert --conll-coref-ident -o coref_data_craft/
converting (:coreference) annotations to :conll-coref-ident ...
output directory: /home/ashim/CRAFT/coref_data_craft
java.lang.Thread.run Thread.java: 748
java.util.concurrent.ThreadPoolExecutor$Worker.run ThreadPoolExecutor.java: 617
java.util.concurrent.ThreadPoolExecutor.runWorker ThreadPoolExecutor.java: 1142
java.util.concurrent.FutureTask.run FutureTask.java: 266
...
clojure.core/binding-conveyor-fn/fn core.clj: 1938
boot.core/boot/fn core.clj: 1031
boot.core/run-tasks core.clj: 1021
boot.user$eval61$fn__62$fn__67$fn__68.invoke : 73
boot.user$eval325$fn__326$fn__331$fn__332.invoke : 305
clojure.core/doall core.clj: 3039
clojure.core/dorun core.clj: 3024
clojure.core/seq core.clj: 137
...
clojure.core/map/fn core.clj: 2646
boot.user$eval325$fn__326$fn__331$fn__332$fn__333.invoke : 317
edu.ucdenver.ccp.file.conversion.FileFormatConverter.convert FileFormatConverter.java: 112
edu.ucdenver.ccp.file.conversion.FileFormatConverter.convert FileFormatConverter.java: 77
edu.ucdenver.ccp.file.conversion.DocumentWriter.serialize DocumentWriter.java: 47
edu.ucdenver.ccp.file.conversion.conllcoref2012.CoNLLCoref2012DocumentWriter.serialize CoNLLCoref2012DocumentWriter.java: 147
edu.ucdenver.ccp.file.conversion.conllu.CoNLLUDocumentWriter.generateRecords CoNLLUDocumentWriter.java: 104
edu.ucdenver.ccp.file.conversion.conllu.CoNLLUDocumentWriter.groupTokensBySentence CoNLLUDocumentWriter.java: 179
java.lang.IllegalArgumentException: Cannot group tokens by sentence without any sentence annotations.
clojure.lang.ExceptionInfo: Cannot group tokens by sentence without any sentence annotations.
line: 515
Thanks in advance,
Ashim
Because the CoNLL Coref format requires tokens and sentences, the conversion process also requires token and sentence annotations as part of the input. The part-of-speech annotations contain both token and sentence boundaries, so adding them to your command should fix things. Also, the output directory should be an absolute path. Try the following (replacing /path/to/ with the appropriate path on your system):
boot part-of-speech coreference convert --conll-coref-ident -o /path/to/coref_data_craft/
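If you only have the output directory as a relative path, a small shell sketch like the one below can build the absolute form first. The variable names (out_dir, abs_out_dir, cmd) are illustrative assumptions, not part of the CRAFT tooling; the boot command itself is the one shown above. The command is only echoed here so you can inspect it before running it.

```shell
# Hypothetical relative directory name; adjust to your setup.
out_dir="coref_data_craft"

# Prefix the current working directory unless the path is already absolute.
case "$out_dir" in
  /*) abs_out_dir="$out_dir" ;;
  *)  abs_out_dir="$PWD/$out_dir" ;;
esac

# Assemble the conversion command, including part-of-speech so that
# token and sentence annotations are available to the converter.
cmd="boot part-of-speech coreference convert --conll-coref-ident -o $abs_out_dir/"

# Print the command for inspection instead of executing it.
echo "$cmd"
```

Once the printed command looks right, it can be run from the CRAFT directory by pasting it into the shell.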
Let me know if you have further issues.
Best,
Bill
Thanks a lot. It works now.