Inconsistent handling of blank lines in input corpora
DavidSorge opened this issue · 0 comments
I ran into a (minor) issue using this tool to work with newspaper archive data.
In constructing my corpus, the process of removing words that did not have corresponding word-vectors resulted in empty lines in my input corpus.
The DMM model worked on the corpus without a problem, suggesting that there is a working mechanism in the code for handling this situation.
However, when I attempted to run DMMinf using the resulting model, I received a fatal error:
Error: Index: 0, Size: 0
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657)
at java.util.ArrayList.get(ArrayList.java:433)
at models.LFDMM_Inf.sampleSingleInitialIteration(Unknown Source)
at models.LFDMM_Inf.inference(Unknown Source)
at LFTM.main(Unknown Source)
The obvious solution to my problem is to fix my corpus-producing code, and make sure I don't feed empty lines into the DMMinf model.
I am posting it here anyway in case a future user runs into the same error, or in case you would like to fix a minor bug in your otherwise excellent tool.
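For anyone who hits the same error in the meantime, here is a minimal sketch of the corpus-side workaround described above. It is written in Python on the assumption that the corpus is prepared with a script outside the tool. The function name `filter_corpus` and the `vocabulary` argument (the set of words that have word vectors) are hypothetical and not part of LFTM. It drops documents that become empty after out-of-vocabulary words are removed, and it records the original line indices so that per-document output can still be matched back to the source articles.

```python
from typing import Iterable, List, Set, Tuple


def filter_corpus(
    lines: Iterable[str], vocabulary: Set[str]
) -> Tuple[List[str], List[int]]:
    """Remove out-of-vocabulary tokens, then skip documents left empty.

    Hypothetical helper, not part of LFTM. Returns the surviving
    documents and their original 0-based line indices.
    """
    kept_docs: List[str] = []
    kept_indices: List[int] = []
    for index, line in enumerate(lines):
        # Keep only tokens that have a corresponding word vector.
        tokens = [tok for tok in line.split() if tok in vocabulary]
        if not tokens:
            # This document would become a blank line in the corpus; skip it.
            continue
        kept_docs.append(" ".join(tokens))
        kept_indices.append(index)
    return kept_docs, kept_indices


if __name__ == "__main__":
    docs, indices = filter_corpus(
        ["the cat sat", "zzz qqq", "", "dog the"],
        {"cat", "sat", "dog"},
    )
    print(docs)
    print(indices)
```

Writing `kept_docs` one per line gives a corpus with no blank lines, and `kept_indices` tells you which original article each output row refers to.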