suhrig/arriba

total=ERROR: could not find sequence of contig 'NC_007605'

jdjdj0202 opened this issue · 2 comments

Hi,
I want to detect fusions from an "Aligned.sortedByCoord.out.bam" file.
I received the data in this format, so this BAM file is the raw data for me.

I went into the arriba_v2.4.0 directory and ran the following command:

./arriba -x /home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam \
-g /home/ubuntu/arriba_v2.4.0/GENCODE38.gtf -a /home/ubuntu/arriba_v2.4.0/hg38.fa \
-b /home/ubuntu/arriba_v2.4.0/database/blacklist_hg38_GRCh38_v2.4.0.tsv -k /home/ubuntu/arriba_v2.4.0/database/known_fusions_hg38_GRCh38_v2.4.0.tsv \
-p /home/ubuntu/arriba_v2.4.0/database/protein_domains_hg38_GRCh38_v2.4.0.gff3 \
-o /home/ubuntu/ATL005_RNAfusions.tsv -O /home/ubuntu/ATL005_fusions.discarded.tsv

The error message was as follows:
Reading chimeric alignments from '/home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam' (total=ERROR: could not find sequence of contig 'NC_007605'

///
I also tried
./arriba -x /home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam \
-g /home/ubuntu/arriba_v2.4.0/GENCODE38.gtf -a /home/ubuntu/arriba_v2.4.0/hg19.fa \
-b /home/ubuntu/arriba_v2.4.0/database/blacklist_hg19_hs37d5_GRCh37_v2.4.0.tsv -k /home/ubuntu/arriba_v2.4.0/database/known_fusions_hg19_hs37d5_GRCh37_v2.4.0.tsv \
-p /home/ubuntu/arriba_v2.4.0/database/protein_domains_hg19_hs37d5_GRCh37_v2.4.0.gff3 \
-o /home/ubuntu/ATL005_RNAfusions.tsv -O /home/ubuntu/ATL005_fusions.discarded.tsv

The same error message appeared:
[2023-08-30T07:11:42] Reading chimeric alignments from '/home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam' (total=ERROR: could not find sequence of contig 'NC_007605'

How can I solve this problem?
Please help me.
Thanks!!

I solved this problem by removing every line that mentions NC_007605 from the BAM file:
./samtools view -h /home/ubuntu/RNA/ATL005.Aligned.sortedByCoord.out.bam | grep -v "NC_007605" | samtools view -bS - > /home/ubuntu/RNA/ATL005.Contig_out.Aligned.sortedByCoord.out.bam

Thanks.

suhrig commented

When you run Arriba, you should use the same assembly (FastA file) that was used to generate the BAM file. This way the contigs are consistent and the error is avoided.
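
To check whether a FastA file matches a BAM file before running Arriba, one option is to compare the contig names in the BAM header with the sequence names in the FastA file. The following is only a rough sketch: the function name `missing_contigs` and the file names are made up for illustration, and it assumes the BAM header was first written to a text file, for example with `samtools view -H ATL005.Aligned.sortedByCoord.out.bam > header.sam`.

```shell
# Print contigs that are listed in a SAM/BAM header but absent from a FastA file.
# Usage: missing_contigs HEADER.sam ASSEMBLY.fa
# (hypothetical helper, not part of Arriba or samtools)
missing_contigs() {
    header_names=$(mktemp)
    fasta_names=$(mktemp)

    # Contig names are stored in the SN: field of the @SQ header lines.
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | LC_ALL=C sort -u > "$header_names"

    # FastA sequence names are the first word after ">" on each header line.
    awk '/^>/ { print substr($1, 2) }' "$2" | LC_ALL=C sort -u > "$fasta_names"

    # Keep only the names that appear in the BAM header and not in the FastA file.
    LC_ALL=C comm -23 "$header_names" "$fasta_names"

    rm -f "$header_names" "$fasta_names"
}
```

If this prints nothing, every contig in the BAM header has a sequence in the FastA file. If it prints names such as NC_007605, the FastA file is probably not the assembly that the BAM file was aligned to.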