Out of bounds error
diego-rt opened this issue · 1 comment
diego-rt commented
Hey there neighbors,
Thanks a lot for producing the only competent dot plotter out there!
I am getting the following error after running Gepard on a 1.08 Gbp sequence.
Loading substitution matrix...
Loading sequence from split.fasta
Loading sequence from split.fasta
Calculating suffix array...
Calculating dotplot...
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index -1072693259 out of bounds for length 1083253305
at org.gepard.common.SuffixArray.search(SuffixArray.java:84)
at org.gepard.common.DotMatrix.calcDotMatrix(DotMatrix.java:211)
at org.gepard.common.DotMatrix.<init>(DotMatrix.java:144)
at org.gepard.client.cmdline.CommandLine.main(CommandLine.java:310)
I assume this out-of-bounds error comes from an integer overflow somewhere, given that the index is negative? My understanding is that a Java int can hold values up to about 2.1 billion (2^31 - 1), so a 1.08 Gbp sequence should fit in an array, and I guess it's just some arithmetic that isn't overflow-safe?
Thanks a lot!
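To illustrate the kind of overflow suspected above, here is a minimal sketch of one common way it can happen: computing the midpoint of two array indices as (low + high) / 2. I have not looked at what SuffixArray.java line 84 actually does, so this is only an assumption about the mechanism, and the class and method names below are hypothetical. What is certain is that the array length from the error message (1083253305) is larger than half of Integer.MAX_VALUE (1073741823), so the sum of two valid indices can exceed the int range and wrap around to a negative number.

```java
public class MidpointOverflow {

    // Naive midpoint: low + high is evaluated in 32-bit int arithmetic,
    // so the sum wraps to a negative value once it exceeds 2^31 - 1.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    // Overflow-safe midpoint: high - low never exceeds the array length,
    // so no intermediate value leaves the int range (for 0 <= low <= high).
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    public static void main(String[] args) {
        // Array length taken from the error message in this issue.
        int length = 1_083_253_305;

        // Two valid indices near the end of such an array (illustrative values).
        int low = length - 3;
        int high = length - 1;

        System.out.println("naive midpoint: " + naiveMid(low, high));
        System.out.println("safe midpoint:  " + safeMid(low, high));
    }
}
```

With these inputs the naive version returns a negative index, which would trigger an ArrayIndexOutOfBoundsException if used to read the array, while the safe version returns a value between low and high. If the real cause is along these lines, the fix would be a small change to the index arithmetic rather than a switch to long-indexed arrays.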
diego-rt commented
I've tried splitting the huge monolithic FASTA sequence (i.e. 1.5 Gbp) into a multi-sequence FASTA file where no sequence is larger than 1 Gbp (i.e. 1 Gbp + 0.5 Gbp), but that hasn't worked either. Perhaps enabling this is the best approach for dealing with genome-scale sequences?