CIRCprimerXL

  • Collaborators: Marieke Vromman, Pieter-Jan Volders. Questions concerning the GitHub structure/scripts can be addressed to any of the collaborators.
  • Primer design pipeline for circRNAs based on primerXL (Lefever, S., Pattyn, F., De Wilde, B. et al. High-throughput PCR assay design for targeted resequencing using primerXL. BMC Bioinformatics 18, 400 (2017). https://doi.org/10.1186/s12859-017-1809-3).
  • This pipeline runs entirely in the [oncornalab/primerxl_circ](https://hub.docker.com/repository/docker/oncornalab/primerxl_circ) Docker image, which is available on DockerHub. It is not necessary to download this image locally, as Nextflow pulls the latest version automatically from DockerHub.
  • If you're a member of the OncoRNALab looking to edit the webtool version of this pipeline, please see the internal GitHub repo: https://github.ugent.be/vandesompelelab/primerXL_circ_web.

Installation

Required files and reference genomes

Two references are required to run the pipeline: a fastahack index and a Bowtie index.

Both indexes below are examples and can be replaced by your chosen index (--index_fasta, --index_bowtie) and species (see below).

fastahack

Note: the genome FASTA file needs to be unzipped before it is indexed with fastahack.

wget http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
fastahack -i Homo_sapiens.GRCh38.dna.primary_assembly.fa

Bowtie

In general, a combination of the cDNA and ncRNA sequences is used to build the index that tests the specificity of the primers.

wget ftp://ftp.ensembl.org/pub/release-101/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz
gunzip Homo_sapiens.GRCh38.cdna.all.fa.gz
wget ftp://ftp.ensembl.org/pub/release-101/fasta/homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz
gunzip Homo_sapiens.GRCh38.ncrna.fa.gz
cat Homo_sapiens.GRCh38.cdna.all.fa Homo_sapiens.GRCh38.ncrna.fa > hg38_cdna.fa
rm Homo_sapiens.GRCh38.cdna.all.fa
rm Homo_sapiens.GRCh38.ncrna.fa
bowtie-build hg38_cdna.fa hg38_cdna

You do not need to install fastahack and Bowtie locally to create the required indexes. Instead, you can use the Docker container, which requires Docker to be installed on your computer.

docker run -v "$PWD/assets":/assets oncornalab/primerxl_circ:v0.11 /bin/bowtie-1.3.0-linux-x86_64/bowtie-build /assets/index_bowtie/hg38_cdna.fa /assets/index_bowtie/hg38_cdna
docker run -v "$PWD/assets":/assets oncornalab/primerxl_circ:v0.11 /bin/fastahack-1.0.0/fastahack -i /assets/index_fastahack/GRCh38_latest_genomic.fna
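
After building the Bowtie index, it can be useful to confirm that all index files are in place before starting the pipeline. The following is a minimal sketch, not part of the pipeline itself. It assumes a small Bowtie 1 index, which consists of six files ending in .1.ebwt, .2.ebwt, .3.ebwt, .4.ebwt, .rev.1.ebwt and .rev.2.ebwt. The function name missing_bowtie_files is hypothetical.

```python
from pathlib import Path

# File suffixes of a small Bowtie 1 index (large indexes use ".ebwtl" instead).
BOWTIE_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_bowtie_files(index_dir, basename):
    """Return the expected Bowtie index file names that are absent from index_dir."""
    index_dir = Path(index_dir)
    return [
        basename + suffix
        for suffix in BOWTIE_SUFFIXES
        if not (index_dir / (basename + suffix)).is_file()
    ]
```

For example, missing_bowtie_files("assets/index_bowtie", "hg38_cdna") returns an empty list when the index built above is complete.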

Running on your computer

Nextflow and Docker should be installed locally. Make sure Docker Desktop is running when you want to run the pipeline.

Running on the HPC (UGent)

Nextflow version 20.10.0 is available on all clusters (swalot, skitty, victini, joltik, kirlia, doduo). The pipeline can be run through an interactive session. The pipeline can only run from the $VSC_SCRATCH_VO_USER directory.

qsub -I -l nodes=1:ppn=16 -l walltime=04:00:00
cd $VSC_SCRATCH_VO_USER/CIRCprimerXL/
module load Nextflow/20.10.0
nextflow run CIRCprimerXL.nf --help

General usage

$ nextflow run CIRCprimerXL.nf --help

Usage:
	
	The typical command for running the pipeline is as follows:
	nextflow run CIRCprimerXL.nf -profile singularity
	
	Mandatory nextflow arguments:
	-profile            set to 'local' when running locally, set to 'singularity' when running on the HPC

	Mandatory pipeline arguments:
	--input_bed         path to input file with circRNAs in bed format (0-based annotation)
	--index_bowtie      path to bowtie genome index directory
	--index_bowtie_name the basename of the Bowtie index to be searched (the name of any of the index files up to but not including the final .1.ebwt / .rev.1.ebwt / ...)
	--index_fasta       path to fastahack index
	--index_fasta_name  the name of the fastahack genome file



	Optional pipeline arguments:
	--splice            when set to 'yes' the input sequence will be spliced, when set to 'no' the input sequence will be unspliced
	--primer_settings   path to file with primer3plus settings (see primer3 manual)
	--chrom_file        file containing all chromosome sizes (to validate bed file)
	--known_exons       bed file containing exon annotation
	--list_ENST         file containing ENST numbers of canonical transcripts or transcripts of interest (this file can also be left empty)
	--primer3_diff      the minimum number of base pairs between the 3' ends of any two left primers (see also primer3 PRIMER_MIN_LEFT_THREE_PRIME_DISTANCE)
	--primer3_nr        the number of primers designed by primer3; caution: setting this parameter to a large value will increase running time
	--min_tm            minimum melt temperature of the primers (default: 58)
	--max_tm            maximum melt temperature of the primers (default: 60)
	--opt_tm            optimal melt temperature of the primers (default: 59)
	--diff_tm           maximum difference in melt temperature between the primers (default: 2)
	--min_gc            minimum GC content of the primers (default: 30)
	--max_gc            maximum GC content of the primers (default: 80)
	--opt_gc            optimal GC content of the primers (default: 50)
	--amp_min           minimum amplicon length (default: 60)
	--amp_max           maximum amplicon length (default: 0)
	--temp_l            the number of nucleotides on each side of the circRNA BSJ that will be used for the template (example 150 => template of 300 nts in total)
	--spec_filter       when set to 'strict', only 2MM + 3MM are allowed; when set to 'loose', 2MM + 2MM and 2MM + 1MM are also allowed
	--snp_filter        when set to 'strict', no common SNPs are allowed in primer sequence; when set to 'loose', common SNPs are allowed in 5' first half of primer; when set to 'off', no filter is applied
	--snp_url           when using a different species than human, the correct SNP database url should be provided; alternatively, this parameter can be set to 'off' if no SNP database is available
	--upfront_filter    when set to 'yes', SNPs and secondary structures are avoided before primer design; when set to 'str', secondary structures are avoided before primer design; when set to 'snp', SNPs are avoided before primer design; when set to 'no', no filtering before primer design is performed
	--output_dir        path to directory where the output files will be saved
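
The input BED file uses 0-based coordinates, and --chrom_file is used to validate it against the chromosome sizes. The checks involved can be sketched as follows. This is an illustration only, not the pipeline's own validation code. It assumes tab-separated BED lines with chromosome, start and end in the first three columns, and a dictionary of chromosome sizes. The function name validate_bed_line is hypothetical.

```python
def validate_bed_line(line, chrom_sizes):
    """Check one BED line (0-based start, exclusive end) against chromosome sizes.

    Returns a list of problems; an empty list means the line looks valid.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["expected at least 3 tab-separated columns"]
    chrom, start_text, end_text = fields[:3]
    if not (start_text.isdecimal() and end_text.isdecimal()):
        return ["start and end must be non-negative integers"]
    start, end = int(start_text), int(end_text)

    problems = []
    if chrom not in chrom_sizes:
        problems.append(f"unknown chromosome {chrom!r}")
    elif end > chrom_sizes[chrom]:
        problems.append("end exceeds chromosome size")
    if start >= end:
        problems.append("start must be smaller than end")
    return problems
```

With chrom_sizes = {"chr1": 1000}, the line "chr1\t10\t200" yields no problems, while "chr1\t10\t2000" is reported as exceeding the chromosome size.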
	

This repository contains an example run with 3 circRNAs. For this, a small subset of the indexes is also present in the example folder. The example can be run with

nextflow run CIRCprimerXL.nf -profile example

This is equivalent to

nextflow run CIRCprimerXL.nf -profile local --output_dir example/output --input_bed example/input_circRNAs.bed --index_fasta example/index_fastahack --index_bowtie example/index_bowtie --index_bowtie_name hg38_cdna_small
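
When the pipeline is launched from a script, for instance for several input files, the command line can be assembled programmatically. The sketch below is a hypothetical helper (build_command is not part of CIRCprimerXL) that only builds the argument list from the parameters documented above. Running it, for example with subprocess.run, is left to the caller.

```python
import shlex


def build_command(profile, params):
    """Assemble a 'nextflow run' argument list from a profile and pipeline parameters."""
    cmd = ["nextflow", "run", "CIRCprimerXL.nf", "-profile", profile]
    for name, value in params.items():
        # Pipeline parameters take a double dash, Nextflow options a single dash.
        cmd += [f"--{name}", str(value)]
    return cmd


example_cmd = build_command(
    "local",
    {
        "output_dir": "example/output",
        "input_bed": "example/input_circRNAs.bed",
        "index_fasta": "example/index_fastahack",
        "index_bowtie": "example/index_bowtie",
        "index_bowtie_name": "hg38_cdna_small",
    },
)
print(shlex.join(example_cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.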

You can easily create your own profiles by modifying the nextflow.config file.

Nextflow keeps track of all the processes executed in your pipeline. If you want to restart the pipeline after fixing a problem, add '-resume' to the command. Processes that have not changed are skipped and their cached results are used instead.

Note: The pipeline results are cached by default in the directory $PWD/work. This folder can take up a lot of disk space. If you are sure you won’t resume your pipeline execution, clean this folder periodically.
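
To decide when cleaning is worthwhile, the size of the work directory can be measured first. The following is a minimal sketch using only the Python standard library. The function name directory_size_bytes is hypothetical, and symbolic links are skipped on the assumption that they point to files stored elsewhere.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files below path, skipping symbolic links."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

For example, directory_size_bytes("work") / 1e9 gives the approximate size in gigabytes. Nextflow also provides the nextflow clean command for removing cached work files of previous runs.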

Note: If a circRNA is smaller than the requested template size, the template size is reduced to the circRNA size. Of note, if the template sequence (for example 300 nucleotides) includes an exon-intron boundary, the intronic region (which may not be part of the circRNA) is included. Some circRNAs do in fact include intronic sequences, and some BSJs join an exonic and an intronic sequence.
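
The way a template spans the BSJ, and how it shrinks for short circRNAs, can be illustrated with a small sketch. This is not the pipeline's implementation. It assumes that the circRNA sequence is given as a plain string from start to end, so that the BSJ joins the last base to the first base, and that strand handling and splicing have already been taken care of. The function name bsj_template is hypothetical.

```python
def bsj_template(circ_seq, temp_l):
    """Return the sequence spanning the back-splice junction (BSJ).

    temp_l is the number of nucleotides taken on each side of the BSJ.
    If the circRNA is shorter than 2 * temp_l, the whole circRNA is used,
    rotated so that the BSJ lies in the middle of the template.
    """
    if temp_l <= 0:
        raise ValueError("temp_l must be positive")
    if len(circ_seq) < 2 * temp_l:
        mid = len(circ_seq) // 2
        return circ_seq[mid:] + circ_seq[:mid]
    # The end of the circRNA comes first, followed by its start.
    return circ_seq[-temp_l:] + circ_seq[:temp_l]
```

With temp_l set to 150, a circRNA of 1000 nucleotides gives a template of 300 nucleotides, while a circRNA of 200 nucleotides gives a template of 200 nucleotides.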

Output

In the output folder, you will find

  • filtered_primers.txt, a file containing one selected primer pair per circRNA (see below for column details)
  • log_file.txt, a file
  • summary_run.txt,
  • all_primers directory
  • primer3_details directory

filtered_primers.txt output file column names:

| column name | description |
| --- | --- |
| circ_ID | circRNA ID assigned to each circRNA (unique within one run) |
| chr | circRNA chromosome |
| start | circRNA start position |
| end | circRNA end position |
| primer_ID | primer ID generated by primer3 |
| FWD_primer | forward primer |
| REV_primer | reverse primer |
| FWD_pos | relative position of forward primer |
| FWD_length | length of forward primer |
| REV_pos | relative position of reverse primer |
| REV_length | length of reverse primer |
| FWD_Tm | melt temperature of forward primer |
| REV_Tm | melt temperature of reverse primer |
| FWD_GC | GC content of forward primer |
| REV_GC | GC content of reverse primer |
| amplicon | amplicon sequence amplified by the primer pair |
| PASS | result of filtering (PASS if the primer pair passed all filters, FAIL if the primer pair failed one or more filters) |
| start_annotation | exons used for the template sequence on the right side of the BSJ |
| end_annotation | exons used for the template sequence on the left side of the BSJ |
| splicing | spliced or unspliced template sequence was used |
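
For downstream processing, the primer pairs that passed all filters can be selected from this file. The sketch below assumes that filtered_primers.txt is tab-separated and starts with a header row containing the column names listed above, which should be checked against an actual output file. The function name passing_primers is hypothetical and the two example rows are invented for illustration.

```python
import csv
import io


def passing_primers(handle):
    """Return the rows (as dictionaries) whose PASS column equals 'PASS'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["PASS"] == "PASS"]


# Invented example content with a subset of the columns, for illustration only.
example = io.StringIO(
    "circ_ID\tFWD_primer\tREV_primer\tPASS\n"
    "circ0\tACGTACGTACGTACGTACGT\tTGCATGCATGCATGCATGCA\tPASS\n"
    "circ1\tGGGTTTAAACCCGGGTTTAA\tCCCAAATTTGGGCCCAAATT\tFAIL\n"
)
selected = passing_primers(example)
print([row["circ_ID"] for row in selected])
```

For a real run, the in-memory example can be replaced by a file opened with open("filtered_primers.txt", newline="").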

Other species

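By default, CIRCprimerXL designs primers for human. To design primers for other species, species-specific files have to be provided and passed through the corresponding parameters, such as the fastahack and Bowtie indexes (--index_fasta, --index_bowtie) and, if available, a SNP database URL (--snp_url).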

Nextflow tower

Nextflow tower can be used to monitor the pipeline while it's running.

nextflow run CIRCprimerXL.nf -with-tower

When Nextflow tower is used in combination with the HPC, the Nextflow version and the tower access token should be set as environment variables.

export NXF_VER="20.10.0"
export TOWER_ACCESS_TOKEN=your_token_here

Cite

M. Vromman, J. Anckaert, J. Vandesompele, P.J. Volders, CIRCprimerXL: convenient and high-throughput PCR primer design for circular RNA quantification. Frontiers in Bioinformatics (2022), DOI:10.3389/fbinf.2022.834655