DKFZ-ODCF/AlignmentAndQCWorkflows

roddy.sh run fails on existing output (WGBS)


Consider the following example Roddy call:

roddy.sh run testCoBaseConfigs-Roddy-3-0.Picard.SoftwareBwa.WGBS@alignment methylationpid --usemetadatatable=wgbsTableTest1.tsv --useconfig=applicationProperties-workstation-to-lsf.ini --usefeaturetoggleconfig=featureToggles.ini --configurationDirectories=configs,AlignmentAndQCWorkflows/,AlignmentAndQCWorkflows/resources --usePluginVersion=AlignmentAndQCWorkflows:1.2.73 --useiodir=wgbsTableTest1.testCoBaseConfigs-Roddy-3-0.Picard.SoftwareBwa.WGBS.noInputDir --cvalues="INDEX_PREFIX:reference_genomes/bwa06_methylCtools_hs37d5_PhiX_Lambda/hs37d5_PhiX_Lambda.conv.fa,CHROM_SIZES_FILE:reference_genomes/bwa06_methylCtools_hs37d5_PhiX_Lambda/stats/hs37d5_PhiX_Lambda.fa.chrLenOnlyACGT.tab,CYTOSINE_POSITIONS_INDEX:reference_genomes/bwa06_methylCtools_hs37d5_PhiX_Lambda/hs37d5_PhiX_Lambda.pos.gz,CHROMOSOME_INDICES:(21 22),outputAllowAccessRightsModification:false,usedResourcesSize:xs,runFastQC:true,runFingerprinting:true,usedResourcesSize:m" --additionalImports=wgbs-standard-lsf --useRoddyVersion=3.0.9

When rerunning the WGBS analysis on existing data, the Picard merge/mark duplicates step dies with exit code 100 (99):

To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once.  1: run1_id2:HWI-xxxxxxx:104:XXXXXXXXX:1:1102:19656:66811
        at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:132)
        at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
        at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
        at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:285)
        at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:114)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:187)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:89)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:99)
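The exception means MarkDuplicates saw the same read name more than once for the same mate. The following is a minimal sketch for confirming that symptom directly on a merged BAM. It assumes the SAM records are piped in as text (for example from `samtools view`), and `find_duplicate_mates` is a hypothetical helper name, not part of the workflow. It skips secondary (0x100) and supplementary (0x800) alignments and uses integer arithmetic instead of gawk-only bitwise functions, so it should also run under mawk or BSD awk.

```bash
# Hypothetical helper (not part of AlignmentAndQCWorkflows): print read names whose
# first or second mate occurs more than once among primary alignments.
# Input: SAM text on stdin, e.g. from `samtools view merged.bam`.
find_duplicate_mates() {
  awk -F'\t' '
    /^@/ { next }                              # skip header lines if present
    {
      flag = $2 + 0
      if (int(flag / 256) % 2 == 1) next       # 0x100: secondary alignment
      if (int(flag / 2048) % 2 == 1) next      # 0x800: supplementary alignment
      mate = int(flag / 64) % 4                # 0x40/0x80 bits: 1 = first, 2 = second mate
      key = $1 "\t" mate
      if (++seen[key] == 2) print $1           # report each duplicated name once per mate
    }
  '
}
```

Usage, with an illustrative file name: `samtools view merged.bam | find_duplicate_mates | head`. The sketch keeps every name in memory, so on a full WGBS BAM it is better to run it per chromosome region.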

This is probably because the script keeps the existing BAM file and merges into it a lane that is already contained in that BAM (because this is a simple "run"), so the same read pairs reach MarkDuplicates twice.
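One way to check this hypothesis before merging is to compare the read groups already present in the existing BAM header with the lanes that are about to be merged. The sketch below assumes that each lane is tagged with its own `@RG` ID. `overlapping_read_groups` is a hypothetical helper name, and the lane IDs in the usage note are illustrative only.

```bash
# Hypothetical helper: print every lane ID (given as arguments) that already
# appears as an @RG ID in the SAM header read from stdin.
overlapping_read_groups() {
  local header_ids lane
  # Collect the value of the ID: field from every @RG header line.
  header_ids=$(awk -F'\t' '$1 == "@RG" { for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) print substr($i, 4) }')
  for lane in "$@"; do
    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$lane" <<< "$header_ids"; then
      printf '%s\n' "$lane"
    fi
  done
}
```

Usage, with illustrative names: `samtools view -H existing.bam | overlapping_read_groups run1_id2 run1_id3`. Any output means that merging those lanes into the existing BAM would duplicate reads.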

The logic for handling existing BAM files depends on three factors:

  1. Whether the default BAM file (FILENAME in mergeAndMarkOrRemoveDuplicatesSlim) is present.
  2. Whether a specific BAM file is provided via the bam parameter.
  3. Whether the useOnlyExistingTargetBam parameter is set to true or false (default: false).

TODO: Clean up the logic and fix the bug.