
Pipeline for circRNA-derived pseudogenes


CIRCpseudo

CIRCpseudo is a pipeline that maps back-splicing junction sequences to a genome in order to detect circRNA-derived pseudogenes. The pipeline produces a preliminary list of candidate circRNA-derived pseudogenes in the genome; after a manual check, the candidates can be characterized further.
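To make the idea of a back-splicing junction sequence concrete, here is a minimal Python sketch. It is not taken from CIRCpseudo.pl: the function names are hypothetical, and it assumes that the junction sequence is the end of the spliced circRNA sequence joined to its start, with the total length (40 by default, matching the --fusionlen default) split evenly between the two sides.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement, e.g. for circRNAs on the minus strand."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def junction_sequence(spliced_seq: str, fusion_len: int = 40) -> str:
    """Build a back-splice junction sequence (illustrative assumption).

    spliced_seq is the exonic sequence of the circRNA in transcript
    orientation (5' to 3'). In a circle, the 3' end of the last exon is
    joined to the 5' end of the first exon, so the junction is the tail
    of the sequence followed by its head.
    """
    half = fusion_len // 2
    if half == 0 or len(spliced_seq) < half:
        raise ValueError("sequence is shorter than half of fusion_len")
    return spliced_seq[-half:] + spliced_seq[:half]


if __name__ == "__main__":
    # Toy sequence: 30 A's (first exon) followed by 30 C's (last exon).
    toy = "A" * 30 + "C" * 30
    print(junction_sequence(toy, 40))
```

A sequence like this does not occur in the linear transcript or in the host gene locus, so a genomic hit to it elsewhere is what suggests a circRNA-derived pseudogene.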

A schematic flow shows the pipeline

Features

  • Not specific to certain species
  • Aimed at detecting circRNA-derived pseudogenes

Usage: perl CIRCpseudo.pl [options]

Required:
        --circ          CircRNA file (CIRCexplorer format file)
        --ref           Reference annotation file (refFlat format file)
        --genome        Reference genome file (Fasta format file)
        --bwaidx        Bwa index of reference genome
        --output        Output file
Optional:
        --mismatch      Max mismatches between fusion sequences and genome, default 4
        --fusionlen     Fusion length of back-splice exon-exon junctions, default 40

*Please add the CIRCpseudo directory to your $PATH first.

Example

CIRCpseudo.pl -circ circRNA.bed -ref mm10_ref.txt -genome mm10.fa -bwaidx index/mm10.fa.idx -output mouse_pseudo.txt
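When the pipeline is run for several samples or genomes, it can help to assemble the command line programmatically. The sketch below is only a convenience wrapper: `build_command` and `run_circpseudo` are hypothetical helper names, the option spellings follow the Usage section, and it assumes CIRCpseudo.pl is executable and on $PATH.

```python
import subprocess
from typing import List, Optional


def build_command(circ: str, ref: str, genome: str, bwaidx: str, output: str,
                  mismatch: Optional[int] = None,
                  fusionlen: Optional[int] = None,
                  script: str = "CIRCpseudo.pl") -> List[str]:
    """Assemble the argument list from the options listed under Usage."""
    cmd = [script,
           "--circ", circ,
           "--ref", ref,
           "--genome", genome,
           "--bwaidx", bwaidx,
           "--output", output]
    # Optional settings are only passed when given, so the pipeline's
    # own defaults apply otherwise.
    if mismatch is not None:
        cmd += ["--mismatch", str(mismatch)]
    if fusionlen is not None:
        cmd += ["--fusionlen", str(fusionlen)]
    return cmd


def run_circpseudo(**kwargs) -> None:
    """Run the pipeline and raise if it exits with a non-zero status."""
    subprocess.run(build_command(**kwargs), check=True)


if __name__ == "__main__":
    # Same inputs as the example above; this only prints the command.
    print(" ".join(build_command("circRNA.bed", "mm10_ref.txt", "mm10.fa",
                                 "index/mm10.fa.idx", "mouse_pseudo.txt")))
```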

Note

  • circRNA.bed is the circRNA file in CIRCexplorer format.
  • mm10_ref.txt is the reference annotation file in refFlat format, with the following fields:

        Field           Description
        geneName        Name of gene
        isoformName     Name of isoform
        chrom           Reference sequence
        strand          + or - for strand
        txStart         Transcription start position
        txEnd           Transcription end position
        cdsStart        Coding region start
        cdsEnd          Coding region end
        exonCount       Number of exons
        exonStarts      Exon start positions
        exonEnds        Exon end positions

  • mm10.fa is genome sequence in FASTA format.
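The annotation fields listed above can be read into a typed record before the file is passed to the pipeline, which is a quick way to catch malformed lines. This is a sketch, not part of CIRCpseudo: it assumes the usual refFlat layout of tab-separated columns with comma-separated exon coordinates, and the gene in the usage example is made up.

```python
from typing import List, NamedTuple


class RefFlatRecord(NamedTuple):
    gene_name: str
    isoform_name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exon_starts: List[int]
    exon_ends: List[int]


def _coords(text: str) -> List[int]:
    # Exon lists are comma-separated and may end with a trailing comma.
    return [int(x) for x in text.split(",") if x]


def parse_refflat_line(line: str) -> RefFlatRecord:
    """Parse one annotation line into a RefFlatRecord."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(f)}")
    rec = RefFlatRecord(f[0], f[1], f[2], f[3],
                        int(f[4]), int(f[5]), int(f[6]), int(f[7]),
                        int(f[8]), _coords(f[9]), _coords(f[10]))
    if rec.strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {rec.strand!r}")
    if not (len(rec.exon_starts) == len(rec.exon_ends) == rec.exon_count):
        raise ValueError("exonCount does not match the exon coordinate lists")
    return rec


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = "GeneA\tNM_000000\tchr1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,"
    print(parse_refflat_line(example))
```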

Results

Results are written to the file given by --output. The output file reports the host gene location, host gene name, fusion sequence, pseudogene location and number of mismatches.
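For the manual check, candidates can be loaded and filtered by mismatch count. The exact file layout is not documented here, so this sketch makes explicit assumptions: the file is tab-separated, the five columns appear in the order listed above, and any line whose last column is not an integer (such as a header) is skipped. The column keys and the sample rows are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Assumed column order, following the description of the result file.
COLUMNS = ["host_gene_location", "host_gene_name", "fusion_sequence",
           "pseudogene_location", "mismatches"]


def read_candidates(lines: Iterable[str],
                    max_mismatches: int = 4) -> List[Dict[str, str]]:
    """Return rows with at most max_mismatches mismatches."""
    kept = []
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < len(COLUMNS):
            continue  # blank or truncated line
        row = dict(zip(COLUMNS, fields))
        try:
            n = int(row["mismatches"])
        except ValueError:
            continue  # header or malformed line
        if n <= max_mismatches:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = [
        "chr1:100-900\tGeneA\tACGTACGT\tchr5:2000-2040\t0",
        "chr2:500-700\tGeneB\tTTGACCAA\tchr9:100-140\t3",
    ]
    for row in read_candidates(sample, max_mismatches=1):
        print(row["host_gene_name"], row["pseudogene_location"])
```

With a real result file, pass the open file handle instead of the list, for example `read_candidates(open("mouse_pseudo.txt"))`.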

Requirements

Citation

Dong R, Zhang XO, Zhang Y, Ma XK, Chen LL and Yang L. CircRNA-derived pseudogenes. Cell Res, 2016

License

Copyright (C) 2016 YangLab. See the LICENSE file for license rights and limitations (MIT).