Why are so many circRNAs missing from the output?
gnilihzeux opened this issue · 8 comments
Dear author,
I provided a BED file containing 135 circRNAs from DCC as input, but got only 8 in the final GTF, all of them on Chr1.
In addition, there were 127 circRNAs in {sample_nm}_index.fa but only 47 in {sample_nm}_denovo.sorted.bam.
So why are so many circRNAs lost at each step?
Thanks a lot.
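One way to see where circRNAs drop out is to count records at each stage. The following is a minimal sketch and not part of CIRIquant: the function names are made up for illustration, and it assumes a tab-separated BED file with the chromosome in column 1, one FASTA record per circRNA in the index, and `samtools idxstats` output (reference name, length, mapped reads, unmapped reads) for the BAM.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing circRNA counts between pipeline stages.

# Count records per chromosome in a BED-like file (column 1 = chromosome).
count_per_chrom() {
    awk -F'\t' '
        /^(#|track|browser)/ || NF == 0 { next }
        { n[$1]++ }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | LC_ALL=C sort
}

# Count sequences in a FASTA file by counting header lines.
count_fasta_records() {
    awk '/^>/ { n++ } END { print n + 0 }' "$1"
}

# Read `samtools idxstats` output on stdin and count reference
# sequences that have at least one mapped read (column 3 > 0).
count_refs_with_reads() {
    awk -F'\t' '$3 + 0 > 0 { n++ } END { print n + 0 }'
}
```

For example, `count_per_chrom circ.bed`, `count_fasta_records {sample_nm}_index.fa`, and `samtools idxstats {sample_nm}_denovo.sorted.bam | count_refs_with_reads` give numbers that can be compared directly.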
That's weird. Could you try running CIRIquant without the --bed option, so that it uses the embedded version of CIRI2 for circRNA prediction?
OK, I'll try & return later.
Hi, I got the same results with '--tool' and '--circ'.
My commands are listed below.
By the way, '--circ' expects a BED-4 file, but this is not mentioned in the manual; it would be good to update it.
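A quick sanity check of the BED file before passing it on can rule out formatting problems. This is only a sketch: `check_bed4` is a hypothetical helper, and the thread does not say what CIRIquant expects in the fourth column, so the check is limited to generic BED properties (at least four tab-separated columns, integer coordinates, start smaller than end).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines that do not look like BED-4.
# Returns 0 when every data line passes, 1 otherwise.
check_bed4() {
    awk -F'\t' '
        /^(#|track|browser)/ || NF == 0 { next }
        NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d: %s\n", NR, $0
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}
```

Running `check_bed4 circ.bed` prints each offending line with its line number; no output and a zero exit status mean the file passed these checks.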
my yml
name: hg19
tools:
  bwa: /root/miniconda2/bin/bwa
  hisat2: /root/miniconda2/bin/hisat2
  stringtie: /root/miniconda2/bin/stringtie
  samtools: /root/miniconda2/bin/samtools
reference:
  fasta: /root/Database/ref_genome/hg19/gencode_grch37p13.fa
  gtf: /root/Database/ref_genome/hg19/gencode_grch37p13.gtf
  bwa_index: /root/Database/ref_genome/hg19/bwa_index/gencode_grch37p13.fa
  hisat_index: /root/Database/ref_genome/hg19/hisat2_index/gencode_grch37p13
1st run
docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
/root/miniconda2/bin/CIRIquant \
-t 8 \
-1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
-2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
--config ${PRJ_DIR}/ciriquant.hg19.yml \
-o ${CIRIQ_DIR} \
-p ${sm_nm} \
-l 2 \
--bed ${CIRIQ_DIR}/circ.bed \
--log ${CIRIQ_DIR}/${sm_nm}.log
2nd run
docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
/root/miniconda2/bin/CIRIquant \
-t 16 \
-1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
-2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
--config ${PRJ_DIR}/ciriquant.hg19.yml \
-o ${CIRIQ_DIR} \
-p ${sm_nm} \
-l 2 \
--circ ${CIRIQ_DIR}/circ.bed \
--tool DCC \
--bam ${CIRIQ_DIR}/align/${sm_nm}.sorted.bam \
--log ${CIRIQ_DIR}/${sm_nm}.log
Hi, could you run CIRIquant using the embedded CIRI2 rather than DCC for circRNA identification? For example:
docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
/root/miniconda2/bin/CIRIquant \
-t 16 \
-1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
-2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
--config ${PRJ_DIR}/ciriquant.hg19.yml \
-o ${CIRIQ_DIR} \
-p ${sm_nm} \
-l 2 \
--log ${CIRIQ_DIR}/${sm_nm}.log
By the way, I noticed that you are using a stranded library where read1 matches the antisense strand of circRNAs. I've only tested CIRIquant on ScriptSeq data, which uses a different stranded protocol. You might want to run CIRIquant with -l 0 to check whether the strand determination of circRNAs is causing the problem.
Hmm, only 3 circRNAs were left using CIRIquant, and they are still all on Chr1.
But CIRI2 found 47 circRNAs, spread across multiple chromosomes.
3rd run
docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
/root/miniconda2/bin/CIRIquant \
-t 16 \
-1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
-2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
--config ${PRJ_DIR}/ciriquant.hg19.yml \
-o ${CIRIQ_DIR} \
-p ${sm_nm} \
-l 2 \
--log ${CIRIQ_DIR}/${sm_nm}.log
My file tree
.
├── align
│ ├── Lung-1.sorted.bam
│ └── Lung-1.sorted.bam.bai
├── circ
│ ├── Lung-1.ciri
│ ├── Lung-1_denovo.sorted.bam
│ ├── Lung-1_denovo.sorted.bam.bai
│ ├── Lung-1_index.1.ht2
│ ├── Lung-1_index.2.ht2
│ ├── Lung-1_index.3.ht2
│ ├── Lung-1_index.4.ht2
│ ├── Lung-1_index.5.ht2
│ ├── Lung-1_index.6.ht2
│ ├── Lung-1_index.7.ht2
│ ├── Lung-1_index.8.ht2
│ └── Lung-1_index.fa
├── gene
│ ├── Lung-1_cov.gtf
│ ├── Lung-1_genes.list
│ └── Lung-1_out.gtf
├── Lung-1.bed
├── Lung-1.gtf
└── Lung-1.log
Well, then I have to presume that your data is not suitable for circRNA analysis. Are you using stranded libraries that were constructed using oligo-dT primers? If so, most circRNAs will be filtered out, and it will explain the results.
P.S. You might also want to check the expression levels of circRNAs reported by DCC. Some lowly expressed circRNAs might be reverse transcription artifacts.
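Checking expression levels could look roughly like the sketch below, which keeps only rows whose read count reaches a minimum. Everything here is an assumption for illustration: `filter_by_min_count` is a hypothetical helper, the input is taken to be a tab-separated table with junction read counts in a column you specify, and the threshold is arbitrary rather than a recommended cutoff.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep rows of a tab-separated table whose count
# column is at least a given minimum.
#   $1 = table file, $2 = 1-based index of the count column, $3 = minimum count
filter_by_min_count() {
    awk -F'\t' -v col="$2" -v min="$3" '
        # Keep a header row if the first line has a non-numeric count field.
        NR == 1 && $col !~ /^[0-9]+$/ { print; next }
        $col + 0 >= min + 0
    ' "$1"
}
```

For instance, `filter_by_min_count counts.tsv 5 2` would keep the header plus every circRNA with at least 2 reads in column 5, where `counts.tsv` and the column index stand in for whatever table holds the DCC counts.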
Well, I'll try other samples.
I've tested two samples, one RNase R-treated and the other regular RNA-seq.
In both, DCC reports circRNAs across different chromosomes, and some of them have considerable read counts.
But CIRIquant returns only a few circRNAs, all on Chr1, for both samples.
I think there are some thresholds in your program, such as:
- the circRNA sequences used to build the reference: the number of circRNAs in my input is not consistent with the number in the reference fasta, and ${sample}_denovo.sorted.bam contains fewer circRNAs than the reference. But I found that the BAM does contain unmapped reads, so this cannot be explained by only mapped reads being retained.
- the final output circRNAs, which are only on Chr1.
By the way, we could discuss this further on WeChat if necessary.
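To make the "only on Chr1" observation concrete, the chromosomes present in the input BED can be compared with those in the final GTF. This is a minimal sketch with a hypothetical helper name; it assumes both files are tab-separated with the chromosome in column 1 and that they use identical chromosome naming, since the comparison is an exact string match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list chromosomes that occur in the input BED
# but never in the output GTF.
#   $1 = input BED, $2 = output GTF
missing_chroms() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }
        FILENAME == ARGV[1] { in_gtf[$1] = 1; next }    # first file: the GTF
        !($1 in in_gtf) && !seen[$1]++ { print $1 }     # second file: the BED
    ' "$2" "$1" | LC_ALL=C sort
}
```

With the file tree above, `missing_chroms circ.bed Lung-1.gtf` would list every chromosome that had input circRNAs but none in the final output.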