NGSEP/NGSEPcore

error sequence dictionary and index

danessel opened this issue · 1 comments

With the latest version I'm getting this error, although I couldn't find any difference in the number of contigs. An older version, NGSEPcore_4.0.1.jar, works on the same data.

Oct 13, 2021 8:24:53 AM ngsep.main.OptionValuesDecoder loadGenomeWithLowerCase
INFO: Loading genome from: ../reference/Leek_44_CPMT.fa
Oct 13, 2021 8:42:01 AM ngsep.main.OptionValuesDecoder loadGenomeWithLowerCase
INFO: Loaded genome with: 59337 sequences. Total length: 38889402655 from file: ../reference/Leek_44_CPMT.fa
Oct 13, 2021 8:42:01 AM ngsep.discovery.MultisampleVariantsDetector logParameters
INFO: Input files: [2021-01.bam2]
Oct 13, 2021 8:42:01 AM ngsep.discovery.MultisampleVariantsDetector logParameters
INFO: Loaded reference genome from: ../reference/Leek_44_CPMT.fa
Output file: Leek.raw.NGSP.vcf
Ignore variants in lower case reference positions: false
Maximum number of alignments starting at the same position: 5
Minimum mapping quality to consider an alignment unique: 2
Process non unique primary alignments: false
Process secondary alignments: false
Base pairs to ignore from the 5' end of each read: 0
Base pairs to ignore from the 3' end of each read: 0
Prior heterozygosity rate: 0.001
Maximum base quality score (PHRED): 100
Minimum variant quality score (PHRED): 1
Call SNVs within STRs: false
Normal ploidy: 4
Print header with sample ploidy in the vcf file: false

Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at ngsep.NGSEPcore.main(NGSEPcore.java:66)
Caused by: htsjdk.samtools.SAMException: Sequence dictionary and index contain different numbers of contigs
at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.sanityCheckDictionaryAgainstIndex(AbstractIndexedFastaSequenceFile.java:107)
at htsjdk.samtools.reference.AbstractIndexedFastaSequenceFile.<init>(AbstractIndexedFastaSequenceFile.java:68)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:80)
at htsjdk.samtools.reference.IndexedFastaSequenceFile.<init>(IndexedFastaSequenceFile.java:98)
at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:139)
at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:122)
at htsjdk.samtools.reference.ReferenceSequenceFileFactory.getReferenceSequenceFile(ReferenceSequenceFileFactory.java:111)
at htsjdk.samtools.cram.ref.ReferenceSource.<init>(ReferenceSource.java:65)
at htsjdk.samtools.cram.ref.ReferenceSource.<init>(ReferenceSource.java:61)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.referenceSequence(SamReaderFactory.java:259)
at ngsep.alignments.io.ReadAlignmentFileReader.init(ReadAlignmentFileReader.java:164)

Hi

Thanks for your interest in NGSEP. Between versions 4.0.1 and 4.0.2 we updated htsjdk, the base library that we use to read and write BAM files, and all later versions keep the newer library. The new version seems to validate BAM file headers that the old version did not validate. More importantly, it could be that your BAM file was not generated with the same reference that you are using for variant detection. First, please run samtools faidx on the reference genome. Then run samtools view -H on the BAM file and check whether the sequence dictionary headers (the @SQ lines) match the first two columns of the .fai file generated by samtools faidx.
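With about 59,000 sequences, comparing the two listings by eye is error-prone, so the check described above could be scripted. The following is only a minimal sketch, not part of NGSEP: it assumes the output of samtools view -H has been saved to a text file and that the .fai index sits next to the reference. The function names and the script name are hypothetical.

```python
import sys


def parse_fai(text):
    """Return [(name, length)] from the first two columns of a .fai index."""
    contigs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def parse_sq_header(text):
    """Return [(name, length)] from the @SQ lines of a SAM/BAM header."""
    contigs = []
    for line in text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @RG, @PG and other header lines
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs.append((tags["SN"], int(tags["LN"])))
    return contigs


def compare_contigs(fai, sq):
    """Summarize the differences between the two contig lists."""
    fai_map, sq_map = dict(fai), dict(sq)
    shared = fai_map.keys() & sq_map.keys()
    return {
        "only_in_fai": sorted(set(fai_map) - set(sq_map)),
        "only_in_bam": sorted(set(sq_map) - set(fai_map)),
        "length_mismatch": sorted(n for n in shared if fai_map[n] != sq_map[n]),
        "same_order": [n for n, _ in fai] == [n for n, _ in sq],
    }


# Hypothetical usage: python compare_contigs.py reference.fa.fai header.sam
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fai_file, open(sys.argv[2]) as header_file:
        report = compare_contigs(parse_fai(fai_file.read()),
                                 parse_sq_header(header_file.read()))
    for key, value in report.items():
        print(key, value, sep="\t")
```

To prepare the inputs, samtools faidx ../reference/Leek_44_CPMT.fa writes the .fai file, and redirecting samtools view -H on the BAM file to a text file (for example header.sam, a placeholder name) gives the header. If a .dict file exists next to the FASTA, it uses @SQ lines as well, so the same parser should be able to read it for comparison against the .fai.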

Let me know how things go.