univieCUBE/gepard

Out of bounds error

diego-rt opened this issue · 1 comment

Hey there neighbors,

Thanks a lot for producing the only competent dot plotter out there!

I am getting the following error after running Gepard on a 1.08 Gbp sequence.

Loading substitution matrix...
Loading sequence from split.fasta
Loading sequence from split.fasta
Calculating suffix array... 
Calculating dotplot... 
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index -1072693259 out of bounds for length 1083253305
	at org.gepard.common.SuffixArray.search(SuffixArray.java:84)
	at org.gepard.common.DotMatrix.calcDotMatrix(DotMatrix.java:211)
	at org.gepard.common.DotMatrix.<init>(DotMatrix.java:144)
	at org.gepard.client.cmdline.CommandLine.main(CommandLine.java:310)

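I assume this out-of-bounds error is caused by an integer overflow somewhere in the index arithmetic rather than by the input itself? The reported index is negative, and my understanding is that a Java `int` can hold values up to 2^31 - 1 (about 2.1 billion), which is well above the array length of 1,083,253,305, so I suspect some calculation that is not overflow-safe.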
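
To illustrate the kind of arithmetic I have in mind, here is a minimal sketch. It assumes, without my having checked the source, that `SuffixArray.search` is a binary search that computes its midpoint as `(low + high) / 2`. The class and method names below are hypothetical and are not taken from Gepard. With an array of length 1083253305, a search that keeps moving right overflows on its seventh step and produces exactly the index from the stack trace.

```java
// Hypothetical illustration only: none of these names come from the Gepard code base.
final class MidpointOverflow {

    // Classic midpoint: low + high can exceed Integer.MAX_VALUE and wrap negative.
    static int unsafeMid(int low, int high) {
        return (low + high) / 2;
    }

    // Overflow-safe midpoint: high - low always fits in an int when 0 <= low <= high.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    // Walks a binary search that always continues in the right half and returns
    // the first negative midpoint, or -1 if the search ends without overflowing.
    static int firstNegativeMid(int length) {
        int low = 0;
        int high = length - 1;
        while (low <= high) {
            int mid = unsafeMid(low, high);
            if (mid < 0) {
                return mid;
            }
            low = mid + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        int length = 1083253305; // array length from the exception message
        System.out.println(firstNegativeMid(length));               // prints -1072693259
        System.out.println(unsafeMid(1066327473, length - 1));      // prints -1072693259
        System.out.println(safeMid(1066327473, length - 1));        // prints 1074790388
    }
}
```

If this is indeed the cause, any array longer than 2^30 (about 1.07 billion) entries could trigger it, and replacing the midpoint with `low + (high - low) / 2` or `(low + high) >>> 1` would avoid the negative index.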

Thanks a lot!

I've tried splitting the huge monolithic FASTA sequence (1.5 Gbp) into a multi-sequence FASTA in which no sequence is larger than 1 Gbp (i.e. 1 Gbp + 0.5 Gbp), but that hasn't worked either. Perhaps supporting this kind of split input would be the best approach for dealing with genome-scale sequences?