StringTie is unable to predict known transcripts.
unique379r opened this issue · 0 comments
unique379r commented
Hi
I am trying to run StringTie (v2.2.1) on HiFi Iso-Seq reads (CCS FASTQ). I first align the reads with deSALT using the '--trans-strand' option and then provide the resulting sorted BAM to StringTie. The commands are as follows:
deSALT aln -o sample_desalt.sam -T -t 4 -x ccs hg38deSALT_index/ sample.hifi_reads.fastq
stringtie -p 4 -L -A sample_genes.txt -o sample_stringtie_counts.gtf -G gencode.v34.basic.annotation.gtf sample.desalt.sorted.bam
The GTF output generated by StringTie does not seem to have known Ensembl IDs in the second column; it has only "StringTie", which I suppose indicates novel predictions by StringTie. Can you please tell me what is wrong?
## Counting the known Transcripts
grep -v '#' sample_stringtie_Out.gtf | awk '$2!="StringTie"' | wc -l
0
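As a cross-check on the count above, here is a minimal sketch that keys on the attribute column instead of the source column. It assumes that StringTie writes a `reference_id "..."` attribute on transcript records that match an entry in the `-G` annotation; that is my understanding of the output format and has not been verified on this file. The function name `count_ref_transcripts` is hypothetical.

```shell
# Count transcript records that carry a reference_id attribute.
# Assumption: StringTie adds reference_id "..." to the attribute column
# of transcripts that match a transcript in the -G annotation.
count_ref_transcripts() {
    # Skip comment lines, keep only "transcript" features, and
    # count those whose line contains a reference_id attribute.
    awk '$0 !~ /^#/ && $3 == "transcript" && $0 ~ /reference_id "/ { n++ }
         END { print n + 0 }' "$1"
}
```

Called as `count_ref_transcripts sample_stringtie_Out.gtf`, a non-zero result would suggest that known transcripts are present even though the second column always reads "StringTie".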
## OUT GTF
# StringTie version 2.2.1
chr1 StringTie transcript 966476 975348 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.624335"; FPKM "0.282870"; TPM "1.248060";
chr1 StringTie exon 966476 966614 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.827338";
chr1 StringTie exon 966704 966803 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "3.000000";
chr1 StringTie exon 970277 970423 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "3.000000";
chr1 StringTie exon 970521 970601 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; cov "3.000000";
chr1 StringTie exon 970686 971006 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; cov "2.632399";
chr1 StringTie exon 971077 971208 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; cov "2.727273";
chr1 StringTie exon 971324 971404 1000 - . gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; cov "3.000000";
Note: the gene table generated by StringTie (the -A output) does contain both known and novel genes:
Gene ID Gene Name Reference Strand Start End Coverage FPKM TPM
ENSG00000187961.14 KLHL17 chr1 + 960584 965719 1.809155 0.195004 0.860383
ENSG00000187583.11 PLEKHN1 chr1 + 966482 975865 0.854442 0.092098 0.406349
ENSG00000187642.9 PERM1 chr1 - 975204 982093 0.042559 0.004587 0.020240
STRG.1 - chr1 - 966476 975348 3.128678 0.337232 1.487912
ENSG00000187608.10 ISG15 chr1 + 1001138 1014540 196.094193 21.136450 93.256905
ENSG00000231702.2 AL645608.3 chr1 - 1008076 1008229 0.000000 0.000000 0.000000
ENSG00000224969.1 AL645608.1 chr1 - 1011997 1013193 0.000000 0.000000 0.000000
STRG.2 - chr1 - 1012952 1014540 216.082062 40.503719 178.707962
ENSG00000187634.12 SAMD11 chr1 + 925731 944581 22.574600 2.433254 10.735847
-best
Rupesh Kesharwani