gpertea/stringtie

StringTie is unable to predict known transcripts.

unique379r opened this issue

Hi,
I am trying to run StringTie (v2.2.1) on HiFi Iso-Seq reads (CCS FASTQ). I first align the reads with deSALT using the '--trans-strand' option, then pass the resulting sorted BAM to StringTie. The commands are as follows:

```
deSALT aln -o sample_desalt.sam -T -t 4 -x ccs hg38deSALT_index/ sample.hifi_reads.fastq

stringtie -p 4 -L -A sample_genes.txt -o sample_stringtie_counts.gtf -G gencode.v34.basic.annotation.gtf sample.desalt.sorted.bam
```

The GTF output generated by StringTie does not seem to contain any known Ensembl IDs: the second column contains only "StringTie", which I suppose marks novel predictions by StringTie. Can you please tell me what is wrong?

## Counting the known transcripts

```
grep -v '#' sample_stringtie_Out.gtf | awk '$2!="StringTie"' | wc -l
0
```
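A possible cross-check, offered as a sketch rather than a confirmed explanation: my understanding of the StringTie manual is that with '-G', every output record keeps "StringTie" in the source column, and a transcript that matches a reference is instead tagged with 'reference_id' / 'ref_gene_id' attributes in column 9. Assuming that holds, the hypothetical helper below ('count_stringtie_transcripts', not part of StringTie) counts transcript records with and without a 'reference_id' attribute.

```shell
# count_stringtie_transcripts GTF  (hypothetical helper, not part of StringTie)
# Prints "known<TAB>novel" counts of transcript records in a StringTie GTF.
# Assumption: reference-matching transcripts carry a reference_id attribute.
count_stringtie_transcripts() {
  awk -F'\t' '
    /^#/ || $3 != "transcript" { next }   # skip header comments and exon rows
    $9 ~ /reference_id "/ { known++; next }  # assumed: matched a -G reference
    { novel++ }                            # no reference_id: novel transcript
    END { printf "%d\t%d\n", known, novel }
  ' "$1"
}
```

Usage, with the file name from the grep above: `count_stringtie_transcripts sample_stringtie_Out.gtf`. Filtering on column 3 avoids counting exon rows, which the column-2 check above does not distinguish.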

## Output GTF

```
# StringTie version 2.2.1
chr1	StringTie	transcript	966476	975348	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "2.624335"; FPKM "0.282870"; TPM "1.248060";
chr1	StringTie	exon	966476	966614	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "2.827338";
chr1	StringTie	exon	966704	966803	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "2"; cov "3.000000";
chr1	StringTie	exon	970277	970423	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "3"; cov "3.000000";
chr1	StringTie	exon	970521	970601	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "4"; cov "3.000000";
chr1	StringTie	exon	970686	971006	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "5"; cov "2.632399";
chr1	StringTie	exon	971077	971208	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "6"; cov "2.727273";
chr1	StringTie	exon	971324	971404	1000	-	.	gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "7"; cov "3.000000";
```

Note: the gene abundance table generated by StringTie ('-A') does contain both known and novel genes.

```
Gene ID	Gene Name	Reference	Strand	Start	End	Coverage	FPKM	TPM
ENSG00000187961.14	KLHL17	chr1	+	960584	965719	1.809155	0.195004	0.860383
ENSG00000187583.11	PLEKHN1	chr1	+	966482	975865	0.854442	0.092098	0.406349
ENSG00000187642.9	PERM1	chr1	-	975204	982093	0.042559	0.004587	0.020240
STRG.1	-	chr1	-	966476	975348	3.128678	0.337232	1.487912
ENSG00000187608.10	ISG15	chr1	+	1001138	1014540	196.094193	21.136450	93.256905
ENSG00000231702.2	AL645608.3	chr1	-	1008076	1008229	0.000000	0.000000	0.000000
ENSG00000224969.1	AL645608.1	chr1	-	1011997	1013193	0.000000	0.000000	0.000000
STRG.2	-	chr1	-	1012952	1014540	216.082062	40.503719	178.707962
ENSG00000187634.12	SAMD11	chr1	+	925731	944581	22.574600	2.433254	10.735847
```
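To quantify what the gene table shows, here is a minimal sketch. In the rows above, novel loci carry StringTie-assigned IDs (STRG.*) while reference genes keep their Ensembl IDs, and some reference genes are listed with zero coverage. The hypothetical helper 'summarize_gene_abundance' (not part of StringTie) splits on that ID prefix and also counts reference genes with TPM > 0. It assumes a tab-separated file with one header line and TPM in column 9, matching the layout shown.

```shell
# summarize_gene_abundance TABLE  (hypothetical helper, not part of StringTie)
# Prints "known<TAB>known_with_TPM_gt_0<TAB>novel" for a StringTie -A table.
# Assumption: novel genes are exactly those whose Gene ID starts with "STRG.".
summarize_gene_abundance() {
  awk -F'\t' '
    NR == 1 { next }                     # skip the header row
    $1 ~ /^STRG\./ { novel++; next }     # StringTie-assigned ID: novel locus
    { known++; if ($9 + 0 > 0) expressed++ }  # reference gene; column 9 is TPM
    END { printf "%d\t%d\t%d\n", known, expressed, novel }
  ' "$1"
}
```

Usage: `summarize_gene_abundance sample_genes.txt`. Separating expressed from zero-TPM reference genes helps tell apart reference genes that actually received reads from those merely carried over from the '-G' annotation.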

Best,
Rupesh Kesharwani