COMBINE-lab/salmon

Error: "Transcript * appears in the reference but did not appear in the BAM"


Hi, I hope you're well. Here is my question:
[Bulk mode] Error: Transcript * appears in the reference but did not appear in the BAM

I want to quantify expression from ONT data using alignment-based mode. The command:
```
singularity exec ${code_path}/singularity_images/salmon:1.10.3--h6dccd9a_2 salmon quant \
  --ont -p 16 -t ${ref_trans_fa} -l U -a ${LR_bam} -o ${output_tmp1}
```

I have tried several different transcripts.fa files, but salmon still reports "Transcript * appears in the reference but did not appear in the BAM".
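One way to see which names trigger this error is to compare the sequence names in the FASTA passed to `-t` with the `@SQ` lines in the BAM header, since the error says some reference transcripts never appear in the BAM. This is a minimal sketch, assuming `samtools` is available to dump the header and that a transcript's name ends at the first whitespace of its FASTA header line; the function name `fasta_names_missing_from_header` is hypothetical and not part of salmon.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of salmon): print names that appear in the
# transcript FASTA but not as @SQ SN:<name> entries in a SAM/BAM header dump.
#
# Usage (assumes samtools is installed):
#   samtools view -H "${LR_bam}" > header.sam
#   fasta_names_missing_from_header "${ref_trans_fa}" header.sam | head
fasta_names_missing_from_header() {
  local fasta="$1" header="$2"
  awk '
    # First file (the header dump): record every SN:<name> on @SQ lines.
    FILENAME == ARGV[1] {
      if ($1 == "@SQ") {
        for (i = 2; i <= NF; i++) {
          if (substr($i, 1, 3) == "SN:") seen[substr($i, 4)] = 1
        }
      }
      next
    }
    # Second file (the FASTA): name = header text up to the first whitespace.
    /^>/ {
      name = substr($1, 2)
      if (!(name in seen)) print name
    }
  ' "$header" "$fasta"
}
```

If the `@SQ` names turn out to be chromosome or scaffold names rather than transcript IDs, the BAM was likely aligned to the genome, whereas salmon's alignment-based mode expects reads aligned to the same transcript sequences given with `-t`.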

  1. First, I used the transcripts.fa provided by NCBI: GCF_002263795.3_ARS-UCD2.0_genomic.fna

  2. Second, I used gffread to generate transcripts.fa, but it failed with "Error: no valid ID found for GFF record". So, as you recommended, I filtered the GTF file (version 2.2) with a shell command and ran gffread again. The commands:
     ```
     # First attempt (fails with "Error: no valid ID found for GFF record")
     singularity exec /public/home/b20223040336/Workspace/long_read_rna/02code/singularity_images/gffread:0.12.7--hdcf5f25_4 gffread \
       GCF_002263795.3_ARS-UCD2.0_genomic.gtf -g GCF_002263795.3_ARS-UCD2.0_genomic.fna -w GCF_002263795.3_ARS-UCD2.0_transcripts.fa

     # Keep only lines that carry a non-empty transcript_id
     grep -P '\btranscript_id\s+"[^"]+"' GCF_002263795.3_ARS-UCD2.0_genomic.gtf > GCF_002263795.3_ARS-UCD2.0_genomic_fixed.gtf

     # Second attempt on the filtered GTF
     singularity exec /public/home/b20223040336/Workspace/long_read_rna/02code/singularity_images/gffread:0.12.7--hdcf5f25_4 gffread \
       GCF_002263795.3_ARS-UCD2.0_genomic_fixed.gtf -g GCF_002263795.3_ARS-UCD2.0_genomic.fna -w GCF_002263795.3_ARS-UCD2.0_transcripts_gtf.fa
     ```

  3. Finally, I used the GFF3 file provided by NCBI to generate transcripts.fa. The command:
     ```
     gffread GCF_002263795.3_ARS-UCD2.0_genomic.gff -g GCF_002263795.3_ARS-UCD2.0_genomic.fna -w GCF_002263795.3_ARS-UCD2.0_transcripts_gff.fa
     ```
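To see exactly which records the `grep -P` filter in step 2 removes from the GTF, one could tally the discarded lines by feature type (GTF column 3). This is a sketch: the function name `dropped_feature_counts` is hypothetical, and like the original filter it relies on GNU `grep -P`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count, per feature type, the GTF lines that the
# transcript_id filter from step 2 would drop.
#
# Usage:
#   dropped_feature_counts GCF_002263795.3_ARS-UCD2.0_genomic.gtf
dropped_feature_counts() {
  local gtf="$1"
  grep -v '^#' "$gtf" \
    | grep -vP '\btranscript_id\s+"[^"]+"' \
    | cut -f3 \
    | sort | uniq -c
  # Step by step: skip comment lines, keep lines WITHOUT a non-empty
  # transcript_id (the inverse of the step 2 filter), take the feature
  # column, then count each feature type.
}
```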