Why are so many circRNAs missing from the output?

Question


gnilihzeux opened this issue 4 years ago · 8 comments

Dear author,
I input a BED file containing 135 circRNAs from DCC, but got only 8 in the final GTF, all of them on Chr1.
In addition, there were 127 circRNAs in {sample_nm}_index.fa but only 47 in {sample_nm}_denovo.sorted.bam.

So why is there such a large gap?

Thanks a lot.
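To locate where circRNAs drop out, it can help to count them at each stage. A minimal sketch, assuming the BED input, the `_index.fa` pseudo-reference and the `_denovo.sorted.bam` alignment mentioned above; the function names are hypothetical helpers, and only `samtools idxstats` (columns: reference name, length, mapped reads, unmapped reads) is an existing command.

```shell
# Number of records in a BED file (skips blank, comment, track and browser lines).
count_bed() {
    awk 'NF > 0 && !/^(#|track|browser)/' "$1" | wc -l
}

# Number of sequences (FASTA headers) in the pseudo-reference.
count_fasta() {
    grep -c '^>' "$1"
}

# Number of reference sequences with at least one mapped read.
# Reads `samtools idxstats` output on stdin; the final "*" row holds
# reads without a reference and is ignored.
count_hit_refs() {
    awk '$1 != "*" && $3 > 0' | wc -l
}

# Usage with the file names from this thread (sample Lung-1):
#   count_bed circ.bed
#   count_fasta circ/Lung-1_index.fa
#   samtools idxstats circ/Lung-1_denovo.sorted.bam | count_hit_refs
```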

Answer 1 · 2020-11-16T05:36:14.000Z

That's weird. Could you try running CIRIquant without the --bed option? It will then use the embedded version of CIRI2 for circRNA prediction.

Answer 2 · 2020-11-16T07:11:39.000Z

OK, I'll try it and report back later.

Answer 3 · 2020-11-17T01:41:16.000Z

Hi, I got the same results with '--tool' and '--circ'.

My commands are listed below.

By the way, the file passed to '--circ' must be in BED4 format, which is not mentioned in the manual; you may want to update it.

My YAML config:

name: hg19
tools:
  bwa: /root/miniconda2/bin/bwa
  hisat2: /root/miniconda2/bin/hisat2
  stringtie: /root/miniconda2/bin/stringtie
  samtools: /root/miniconda2/bin/samtools
  
reference:
  fasta: /root/Database/ref_genome/hg19/gencode_grch37p13.fa
  gtf: /root/Database/ref_genome/hg19/gencode_grch37p13.gtf
  bwa_index: /root/Database/ref_genome/hg19/bwa_index/gencode_grch37p13.fa
  hisat_index: /root/Database/ref_genome/hg19/hisat2_index/gencode_grch37p13
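A quick sanity check on the config above: a minimal sketch (the function `check_config_paths` is hypothetical) that assumes flat `key: /path` lines, as in this file, rather than parsing YAML in general. Index entries are prefixes, not files, so it looks for `<prefix>.bwt` (BWA) and `<prefix>.1.ht2` or `<prefix>.1.ht2l` (HISAT2) instead. Run it where the pipeline itself runs, so the paths resolve the same way.

```shell
# Print every absolute path in a flat "key: /path" config that does not exist.
# Empty output means all paths were found. Not a general YAML parser.
check_config_paths() {
    # Emit "key path" pairs for values that start with "/", indentation removed.
    awk -F': *' '$2 ~ /^\// { gsub(/[[:space:]]/, "", $1); print $1, $2 }' "$1" |
    while read -r key path; do
        case "$key" in
            bwa_index)   target="$path.bwt" ;;          # BWA index prefix
            hisat_index) target="$path.1.ht2"            # HISAT2 index prefix
                         [ -e "$target" ] || target="$path.1.ht2l" ;;
            *)           target="$path" ;;               # plain file or binary
        esac
        [ -e "$target" ] || echo "missing ($key): $target"
    done
}

# Usage:
#   check_config_paths "${PRJ_DIR}/ciriquant.hg19.yml"
```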

1st run

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 8 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --bed ${CIRIQ_DIR}/circ.bed \
            --log ${CIRIQ_DIR}/${sm_nm}.log

2nd run

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 16 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --circ ${CIRIQ_DIR}/circ.bed \
            --tool DCC \
            --bam ${CIRIQ_DIR}/align/${sm_nm}.sorted.bam \
            --log ${CIRIQ_DIR}/${sm_nm}.log
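The comment above states that the file given to '--circ' must be in BED4 format. A minimal sketch (the function `check_bed4` is hypothetical) that checks only the shape of each record: at least four tab-separated columns (chrom, start, end, name) with integer start < end. It does not check whether the coordinates follow the convention CIRIquant expects.

```shell
# Report malformed BED4 records on stderr; return non-zero if any are found.
check_bed4() {
    awk -F'\t' '
        NF == 0 || /^(#|track|browser)/ { next }    # skip headers and blanks
        NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
            printf "line %d: %s\n", NR, $0 > "/dev/stderr"
            bad++
        }
        END { exit (bad > 0) }
    ' "$1"
}

# Usage:
#   check_bed4 "${CIRIQ_DIR}/circ.bed" && echo "shape looks OK"
```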

Answer 4 · 2020-11-17T02:57:24.000Z

Hi, could you run CIRIquant using the embedded CIRI2 rather than DCC for circRNA identification? For example:

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 16 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --log ${CIRIQ_DIR}/${sm_nm}.log

By the way, I noticed that you are using a stranded library in which read 1 matches the antisense strand of circRNAs (-l 2). I've only tested CIRIquant on ScriptSeq data, which uses a different stranded protocol. You might want to run CIRIquant with -l 0 (unstranded) to check whether the strand determination of circRNAs is causing the problem.
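One way to follow this suggestion without overwriting earlier results is to give each library type its own output directory. A minimal sketch (the function `ciriquant_lib_cmds` is hypothetical) that only prints one `docker run` command per `-l` value, reusing the image, paths and variables from the commands above. `--rm` and the per-run container name are additions: Docker refuses to start a container whose `--name` is already taken, so reusing `ciriq_${sm_nm}` across runs fails unless the old container is removed.

```shell
# Print (not run) one CIRIquant command per library type given as arguments.
# Uses sm_nm, TRIM_DIR, PRJ_DIR and CIRIQ_DIR from the surrounding shell.
ciriquant_lib_cmds() {
    for lib in "$@"; do
        printf '%s ' docker run --rm -v /root:/root \
            --name "ciriq_${sm_nm}_l${lib}" ciriquant:v1.1 \
            /root/miniconda2/bin/CIRIquant -t 16 \
            -1 "${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz" \
            -2 "${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz" \
            --config "${PRJ_DIR}/ciriquant.hg19.yml" \
            -o "${CIRIQ_DIR}/lib${lib}" -p "$sm_nm" -l "$lib" \
            --log "${CIRIQ_DIR}/${sm_nm}_l${lib}.log"
        printf '\n'
    done
}

# Usage: review the commands first, then execute them.
#   ciriquant_lib_cmds 0 2
#   ciriquant_lib_cmds 0 2 | sh
```

Printing first makes it easy to check the expanded paths before anything runs; if CIRIquant does not create the `lib0`/`lib2` output directories itself, create them with `mkdir -p` beforehand.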

Answer 5 · 2020-11-17T09:24:45.000Z

Ehhh, only 3 circRNAs were left using CIRIquant, and they are still all on Chr1.
CIRI2 itself, however, did find 47 circRNAs across multiple chromosomes.

3rd run

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 16 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --log ${CIRIQ_DIR}/${sm_nm}.log

My file tree:

.
├── align
│   ├── Lung-1.sorted.bam
│   └── Lung-1.sorted.bam.bai
├── circ
│   ├── Lung-1.ciri
│   ├── Lung-1_denovo.sorted.bam
│   ├── Lung-1_denovo.sorted.bam.bai
│   ├── Lung-1_index.1.ht2
│   ├── Lung-1_index.2.ht2
│   ├── Lung-1_index.3.ht2
│   ├── Lung-1_index.4.ht2
│   ├── Lung-1_index.5.ht2
│   ├── Lung-1_index.6.ht2
│   ├── Lung-1_index.7.ht2
│   ├── Lung-1_index.8.ht2
│   └── Lung-1_index.fa
├── gene
│   ├── Lung-1_cov.gtf
│   ├── Lung-1_genes.list
│   └── Lung-1_out.gtf
├── Lung-1.bed
├── Lung-1.gtf
└── Lung-1.log
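To compare the final GTF with the CIRI2 calls chromosome by chromosome, a minimal sketch (the function `count_per_chrom` is hypothetical). The column numbers are assumptions: chromosome in column 1 of the GTF, and in column 2 of the CIRI2 `.ciri` table, whose header line starts with `circRNA_ID`. If the GTF holds more than one line per circRNA, the counts will be inflated accordingly.

```shell
# Count records per chromosome in a tab-separated file.
# $1: file, $2: column holding the chromosome name.
# Lines starting with "#" and a "circRNA_ID" header line are skipped.
count_per_chrom() {
    awk -F'\t' -v col="$2" '
        $1 ~ /^#/ || $1 == "circRNA_ID" || NF < col { next }
        { n[$col]++ }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort -k1,1
}

# Usage with the files in the tree above:
#   count_per_chrom Lung-1.gtf 1
#   count_per_chrom circ/Lung-1.ciri 2
```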

Answer 6 · 2020-11-17T10:41:35.000Z

Well, then I have to presume that your data is not suitable for circRNA analysis. Are you using stranded libraries constructed with oligo-dT primers? If so, most circRNAs will be filtered out, since they lack poly(A) tails, and that would explain the results.

P.S. You might also want to check the expression levels of circRNAs reported by DCC. Some lowly expressed circRNAs might be reverse transcription artifacts.
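To act on the P.S., low-count circRNAs can be set aside before passing DCC's calls to CIRIquant. A minimal sketch (the function `filter_by_count` is hypothetical), assuming a tab-separated count table with one header line and per-sample counts from a given column onward. The default start column of 4 is an assumption; check the header of your DCC count file. The threshold in the usage line is only an example, not a recommendation.

```shell
# Keep rows whose highest count across samples reaches a minimum.
# $1: count table (first line is a header)
# $2: minimum count
# $3: first column holding counts (assumed default: 4)
filter_by_count() {
    awk -F'\t' -v min="$2" -v first="${3:-4}" '
        NR == 1 { print; next }                 # keep the header line
        {
            max = 0
            for (i = first; i <= NF; i++)
                if ($i + 0 > max) max = $i + 0
            if (max >= min) print
        }
    ' "$1"
}

# Usage:
#   filter_by_count CircRNACount 2 > CircRNACount.min2
```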

Answer 7 · 2020-11-18T00:23:03.000Z

Well, I'll try other samples.

Answer 8 · 2020-11-18T06:48:14.000Z

I've tested two samples, one treated with RNase R and one from regular RNA-seq.
In both, DCC reports circRNAs on many different chromosomes, some of them with considerable read counts.

But CIRIquant returns only a few circRNAs, all on Chr1, for both samples.

I think there are some thresholds in your program at these steps:

1. The circRNA sequences used to build the reference: the number of circRNAs I input is not consistent with the number in the reference FASTA.
2. ${sample}_denovo.sorted.bam contains fewer circRNAs than the reference. But I found that it actually contains unmapped reads, so this cannot be explained by only mapped reads being kept.
3. The final output, in which circRNAs are only on Chr1.
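To see which input circRNAs never made it into the reference, a minimal sketch (the function `missing_from_fasta` is hypothetical) that lists BED names absent from the FASTA headers. It assumes the first word of each header equals the BED name column (column 4); if CIRIquant names the sequences differently, the ID extraction must be adapted. The unmapped-read count in the usage comment uses `samtools view -c -f 4`, where SAM flag 4 marks unmapped reads.

```shell
# Print BED names (column 4) that do not appear as FASTA header IDs.
# $1: input BED, $2: reference FASTA. Output keeps the BED order.
missing_from_fasta() {
    awk -F'\t' '
        FNR == NR {                             # first file: the FASTA
            if (/^>/) {
                id = substr($0, 2)
                sub(/[ \t].*/, "", id)          # keep the first word only
                seen[id] = 1
            }
            next
        }
        NF >= 4 && $1 !~ /^#/ && !($4 in seen) { print $4 }
    ' "$2" "$1"
}

# Usage:
#   missing_from_fasta circ.bed circ/Lung-1_index.fa
#   samtools view -c -f 4 circ/Lung-1_denovo.sorted.bam   # unmapped reads
```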

By the way, we could discuss this further over WeChat if necessary.