mozack/abra2

Option to prevent reads ending with complex indel

jakobmatthes opened this issue · 7 comments

When running Manta downstream of alignments realigned by ABRA2, Manta aborts if it encounters an alignment whose CIGAR string ends in a combination of D/I operations. As they are not planning to change this behavior (Illumina/manta#137), could you implement an option to soft-clip the complex indel at the end of the read?
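To make the condition concrete, here is a minimal sketch of how such alignments could be flagged before handing a BAM to Manta. It works on plain CIGAR strings; the function names are hypothetical, and the rule used (a run of adjacent I and D operations at either edge, ignoring terminal S/H clips) is my reading of the cases in this thread, not Manta's actual validation code.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '73M4D2I' into (length, op) tuples."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Reject anything the regex could not fully account for (e.g. '*').
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def has_edge_complex_indel(cigar):
    """True if the alignment starts or ends with adjacent I and D operations.

    Assumption: soft/hard clips (S/H) are skipped when locating the edge.
    """
    ops = [op for _, op in parse_cigar(cigar) if op not in ("S", "H")]

    def run_is_complex(seq):
        # Collect the op types in the leading run of I/D operations.
        run = set()
        for op in seq:
            if op not in ("I", "D"):
                break
            run.add(op)
        return run == {"I", "D"}

    return run_is_complex(ops) or run_is_complex(reversed(ops))
```

With a BAM library such as pysam, the same check could be applied to each record's CIGAR string; that wiring is left out here to keep the sketch dependency-free.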

Example data:
in.sam.txt
out.sam.txt

Output generated with:

java -jar abra2-2.18.jar --in in.bam --out out.bam --threads 1 --mer 0.1 --mad 250 --ref GRCh37.fa

Read with ID 'NB501582:124:HLMWFBGX7:3:21604:17458:10789' will overlap/end with the complex indel (CIGAR: 73M4D2I).
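As an illustration of the requested behavior, the sketch below rewrites a CIGAR such as 73M4D2I so that the trailing indel run becomes a soft clip. This is a hypothetical helper, not ABRA2 code: it assumes a well-formed CIGAR, only handles the right-hand end of the alignment (so POS stays the same), and ignores tags such as NM or MD that would also need updating in a real BAM record.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def softclip_trailing_indel(cigar):
    """Replace a trailing run of I/D operations with a soft clip.

    Inserted bases are part of the read, so they are converted to S;
    deleted bases exist only in the reference, so they are dropped.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Set aside an existing trailing hard clip and soft clip, if present.
    hard = ops.pop() if ops and ops[-1][1] == "H" else None
    clipped = ops.pop()[0] if ops and ops[-1][1] == "S" else 0

    # Consume the trailing I/D run, folding inserted bases into the clip.
    removed = False
    while ops and ops[-1][1] in ("I", "D"):
        length, op = ops.pop()
        if op == "I":
            clipped += length
        removed = True

    if not removed:
        return cigar  # nothing to fix, return the input unchanged

    if clipped:
        ops.append((clipped, "S"))
    if hard:
        ops.append(hard)
    return "".join(f"{n}{op}" for n, op in ops)


print(softclip_trailing_indel("73M4D2I"))
```

The query length is preserved (73 + 2 bases before and after), which is what keeps the rewritten CIGAR consistent with the read sequence.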

We can consider this; however, I'd like to see if we can get the Manta team to address it, as these are valid reads and can potentially represent real variation. I've posted a follow-up on Illumina/manta#137, so let's see if they respond.

Is it possible to use the SOFT_CLIP parameter to work around this issue? I'm not 100% sure what each of its arguments does, but it sounds like lowering one of the values passed to -sc might reduce the number of complex events.

No, that won't resolve this problem. I'll add a new option to prevent these alignments in an upcoming release.

Added option --no-edge-ci to prevent output of complex indels at read start or read end. Use of this option should enable Manta to run to completion on ABRA2 generated BAM files.

Please see release 2.21

Is there any downstream variant caller that will work without either removing the complex indels or removing the intronic regions in RNA-Seq reads? After running ABRA2, FreeBayes will not work because of positioning issues, Strelka2 won't work because of reads ending with I, and GATK HaplotypeCaller won't work because of Ns in the middle of reads. I want to keep complex indels and introns in the RNA-Seq reads but can't find a variant caller that will accept them. Any recommendations?

Hi @mozack, thank you for adding the --no-edge-ci flag. I have run ABRA2 (v2.24) with the --no-edge-ci flag, but the output is still not accepted by Manta (v1.6.0). The offending CIGAR strings, taken from the Manta log, are pasted below:

[2021-10-07T02:52:15.381573Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:15.187436Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_001_chr1_0003]  A00341:56:HFGMLDMXX:2:1111:6479:24893/1 tid:pos:strand 1:38259305:+ cigar: 141M7D9I templSize: 251 mate_tid:pos:strand 1:38259406:-
[2021-10-07T02:52:15.601330Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:15.379315Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_015_chr15_0005]         A00341:56:HFGMLDMXX:1:1175:10673:1689/1 tid:pos:strand 15:58287293:+ cigar: 141M7D9I templSize: 186 mate_tid:pos:strand 15:58287400:-
[2021-10-07T02:52:16.471433Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:16.295840Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_004_chr4_0008]  A00341:56:HFGMLDMXX:1:2261:19162:7404/2 tid:pos:strand 4:100534200:+ cigar: 147M6D3I templSize: -156 mate_tid:pos:strand 4:100534197:-
[2021-10-07T02:52:17.241259Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:17.054403Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_001_chr1_0005]  A00341:56:HFGMLDMXX:2:1115:26811:3959/2 tid:pos:strand 1:65130229:+ cigar: 144M28D6I templSize: -173 mate_tid:pos:strand 1:65130228:-
[2021-10-07T02:52:17.810390Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:17.681530Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_006_chr6_0009]  A00341:56:HFGMLDMXX:1:1247:22932:1814/2 tid:pos:strand 6:110797260:+ cigar: 148M18D2I templSize: 280 mate_tid:pos:strand 6:110797376:-
[2021-10-07T02:52:21.187856Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:21.005234Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_003_chr3_0001]  A00341:56:HFGMLDMXX:1:2379:27679:15671/1 tid:pos:strand 3:16926391:+ cigar: 54M1I76M1D2M1D14M23D3I templSize: 177 mate_tid:pos:strand 3:16926413:-
[2021-10-07T02:52:23.311202Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:23.234154Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_002_chr2_0003]  A00341:56:HFGMLDMXX:1:2259:9444:11898/2 tid:pos:strand 2:44545592:- cigar: 147M27D3I templSize: -249 mate_tid:pos:strand 2:44545517:+
[2021-10-07T02:52:33.838354Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:33.716002Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_005_chr5_0013]  A00341:56:HFGMLDMXX:2:2268:9616:1235/1 tid:pos:strand 5:153599309:- cigar: 86M3D19I45S templSize: -151 sa: chrX,106624329,-,87S63M,60,0; mate_tid:pos:strand 5:153599247:+
[2021-10-07T02:52:40.823829Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:16.295840Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_004_chr4_0008]       A00341:56:HFGMLDMXX:1:2261:19162:7404/2 tid:pos:strand 4:100534200:+ cigar: 147M6D3I templSize: -156 mate_tid:pos:strand 4:100534197:-
[2021-10-07T02:52:40.823945Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:15.187436Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_001_chr1_0003]       A00341:56:HFGMLDMXX:2:1111:6479:24893/1 tid:pos:strand 1:38259305:+ cigar: 141M7D9I templSize: 251 mate_tid:pos:strand 1:38259406:-
[2021-10-07T02:52:40.824035Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:23.234154Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_002_chr2_0003]       A00341:56:HFGMLDMXX:1:2259:9444:11898/2 tid:pos:strand 2:44545592:- cigar: 147M27D3I templSize: -249 mate_tid:pos:strand 2:44545517:+
[2021-10-07T02:52:40.824128Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:15.379315Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_015_chr15_0005]      A00341:56:HFGMLDMXX:1:1175:10673:1689/1 tid:pos:strand 15:58287293:+ cigar: 141M7D9I templSize: 186 mate_tid:pos:strand 15:58287400:-
[2021-10-07T02:52:40.824221Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:17.054403Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_001_chr1_0005]       A00341:56:HFGMLDMXX:2:1115:26811:3959/2 tid:pos:strand 1:65130229:+ cigar: 144M28D6I templSize: -173 mate_tid:pos:strand 1:65130228:-
[2021-10-07T02:52:40.824333Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:21.005234Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_003_chr3_0001]       A00341:56:HFGMLDMXX:1:2379:27679:15671/1 tid:pos:strand 3:16926391:+ cigar: 54M1I76M1D2M1D14M23D3I templSize: 177 mate_tid:pos:strand 3:16926413:-
[2021-10-07T02:52:40.824433Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:17.681530Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_006_chr6_0009]       A00341:56:HFGMLDMXX:1:1247:22932:1814/2 tid:pos:strand 6:110797260:+ cigar: 148M18D2I templSize: 280 mate_tid:pos:strand 6:110797376:-
[2021-10-07T02:52:40.824522Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:33.716002Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_005_chr5_0013]       A00341:56:HFGMLDMXX:2:2268:9616:1235/1 tid:pos:strand 5:153599309:- cigar: 86M3D19I45S templSize: -151 sa: chrX,106624329,-,87S63M,60,0; mate_tid:pos:strand 5:153599247:+
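Since each record appears twice in the log above (once from TaskManager, once from WorkflowRunner), a small script can help pull out the unique offending alignments. This is only a sketch: the regular expression is inferred from the lines pasted here, not from any documented Manta log format, and the function name is hypothetical. The sample lines are copied from the log above.

```python
import re

# Pattern inferred from the pasted log lines (an assumption, not a spec):
#   <read name> tid:pos:strand <chrom>:<pos>:<strand> cigar: <CIGAR> ...
RECORD = re.compile(
    r"(?P<read>\S+)\s+tid:pos:strand\s+"
    r"(?P<chrom>[^:\s]+):(?P<pos>\d+):(?P<strand>[+-])"
    r"\s+cigar:\s+(?P<cigar>\S+)"
)


def offending_alignments(log_lines):
    """Return unique (read, chrom, pos, strand, cigar) tuples in log order."""
    seen = {}  # dict keeps insertion order and drops repeated records
    for line in log_lines:
        m = RECORD.search(line)
        if m:
            key = (m["read"], m["chrom"], int(m["pos"]), m["strand"], m["cigar"])
            seen.setdefault(key, None)
    return list(seen)


SAMPLE = [
    "[2021-10-07T02:52:16.471433Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:16.295840Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_004_chr4_0008]  A00341:56:HFGMLDMXX:1:2261:19162:7404/2 tid:pos:strand 4:100534200:+ cigar: 147M6D3I templSize: -156 mate_tid:pos:strand 4:100534197:-",
    "[2021-10-07T02:52:33.838354Z] [d9fd00357361] [1_1] [TaskManager] [ERROR] [2021-10-07T02:52:33.716002Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_005_chr5_0013]  A00341:56:HFGMLDMXX:2:2268:9616:1235/1 tid:pos:strand 5:153599309:- cigar: 86M3D19I45S templSize: -151 sa: chrX,106624329,-,87S63M,60,0; mate_tid:pos:strand 5:153599247:+",
    "[2021-10-07T02:52:40.823829Z] [d9fd00357361] [1_1] [WorkflowRunner] [ERROR] [2021-10-07T02:52:16.295840Z] [d9fd00357361] [1_1] [makeLocusGraph_chromId_004_chr4_0008]       A00341:56:HFGMLDMXX:1:2261:19162:7404/2 tid:pos:strand 4:100534200:+ cigar: 147M6D3I templSize: -156 mate_tid:pos:strand 4:100534197:-",
]

for record in offending_alignments(SAMPLE):
    print(record)
```

Every CIGAR in the log ends in a D/I run, in one case followed by a soft clip (86M3D19I45S), so all of these are still edge complex indels despite --no-edge-ci.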