broadinstitute/picard

PICARD termination

Soumyadutta-basak opened this issue · 2 comments

java -jar /Bioinfo/picard-2.21.4/picard.jar MarkDuplicates INPUT=Merged_BW-1.bam OUTPUT=Merged_BW-1_mdup.bam METRICS_FILE=BW-1.txt REMOVE_DUPLICATES=false
This is the command I am using to mark duplicates, and this is the error I am getting:

********** NOTE: Picard's command line syntax is changing.
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
********** The command line looks like this in the new syntax:
********** MarkDuplicates -INPUT Merged_BW-1.bam -OUTPUT Merged_BW-1_mdup.bam -METRICS_FILE BW-1.txt -REMOVE_DUPLICATES false

18:22:39.974 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/Bioinfo/picard-2.21.4/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat May 27 18:22:39 IST 2023] MarkDuplicates INPUT=[Merged_BW-1.bam] OUTPUT=Merged_BW-1_mdup.bam METRICS_FILE=BW-1.txt REMOVE_DUPLICATES=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 TAG_DUPLICATE_SET_MEMBERS=false REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag CLEAR_DT=true DUPLEX_UMI=false ADD_PG_TAG_TO_READS=true ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 MAX_OPTICAL_DUPLICATE_SET_SIZE=300000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat May 27 18:22:39 IST 2023] Executing as sandeeplab1@bioinfo on Linux 5.19.0-32-generic amd64; OpenJDK 64-Bit Server VM 11.0.15-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.4-SNAPSHOT
INFO 2023-05-27 18:22:40 MarkDuplicates Start of doWork freeMemory: 149162632; totalMemory: 155189248; maxMemory: 32178700288
INFO 2023-05-27 18:22:40 MarkDuplicates Reading input file and constructing read end information.
INFO 2023-05-27 18:22:40 MarkDuplicates Will retain up to 116589493 data points before spilling to disk.
[Sat May 27 18:22:41 IST 2023] picard.sam.markduplicates.MarkDuplicates done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=1895825408
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 1: RGNB552198:15:HNHMHBGXH:4:21508:8996:12908
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:133)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap.remove(DiskBasedReadEndsForMarkDuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:559)
at picard.sam.markduplicates.MarkDuplicates.doWork(MarkDuplicates.java:257)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

Can anybody help with this?
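One quick way to double-check the diagnosis below is to count how many primary alignments in the merged BAM carry the read name reported in the exception. A minimal sketch, assuming samtools and awk are available on the PATH; the BAM path and read name are taken from the command and stack trace above:

# Count primary (non-secondary, non-supplementary) alignments with the
# read name from the exception; a valid paired-end BAM should print at
# most 2 (one first-in-pair record and one second-in-pair record).
samtools view -F 0x900 Merged_BW-1.bam \
  | awk -F'\t' '$1 == "RGNB552198:15:HNHMHBGXH:4:21508:8996:12908"' \
  | wc -l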

@Soumyadutta-basak It looks like you have two primary reads that are both marked as first-in-pair and share the same read name, which is not valid. Try running ValidateSamFile to test your file.
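A minimal ValidateSamFile sketch, reusing the jar path and BAM from the command above (MODE=SUMMARY prints a tally of each error type rather than listing every offending record):

# Validate the merged BAM with the same Picard jar used for MarkDuplicates.
java -jar /Bioinfo/picard-2.21.4/picard.jar ValidateSamFile INPUT=Merged_BW-1.bam MODE=SUMMARY

If the summary flags paired-read problems, one common cause is the same lane or read group having been fed twice into the merge that produced Merged_BW-1.bam, so re-checking the inputs to the merge step would be a reasonable next step.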

kockan commented

Closing this issue for now. Feel free to reopen if there are any updates.