SpeciesPrimer

License: GPL v3

New in SpeciesPrimer v2.1

  • Configfile option for pipeline setup (v2.1.1)
  • Custom Blast DB support
  • Email option for command line
  • Increased speed
  • Species synonyms are added to exceptions
  • Bugfixes and KeyboardInterrupt rollback
  • Simpler directory structure

Minimum system requirements

  • Quad core processor
  • 16 GB RAM
  • SSD / fast hard disk (recommended)
  • 60 GB free space for nt database
  • 4.5 GB for the docker image
  • 5 - 20 GB for each analysis
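The disk figures above add up quickly when several targets are analysed. As a rough planning aid, the following sketch sums them for a given number of analyses. The function name `required_disk_gb` is a hypothetical helper, not part of SpeciesPrimer, and the numbers are only the ones listed above.

```python
# Figures taken from the requirements list above (all in GB).
NT_DB_GB = 60.0            # nt BLAST database
DOCKER_IMAGE_GB = 4.5      # docker image
ANALYSIS_GB = (5.0, 20.0)  # per-analysis range (min, max)


def required_disk_gb(n_analyses, db_gb=NT_DB_GB):
    """Return (low, high) disk-space estimates in GB for n_analyses runs.

    Hypothetical helper for illustration; it only sums the listed figures.
    """
    if n_analyses < 0:
        raise ValueError("n_analyses must be non-negative")
    base = db_gb + DOCKER_IMAGE_GB
    low = base + n_analyses * ANALYSIS_GB[0]
    high = base + n_analyses * ANALYSIS_GB[1]
    return low, high


if __name__ == "__main__":
    low, high = required_disk_gb(3)
    print(f"3 analyses with the nt database: {low:.1f} - {high:.1f} GB")
```

Passing a smaller `db_gb` gives an estimate for a setup that uses a smaller BLAST database instead of nt.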

Quick start (Ubuntu 16.04)

  • Download and install Docker, then pull the image and start the container:

      $ sudo docker pull biologger/speciesprimer
      $ mkdir $HOME/primerdesign
      $ mkdir $HOME/blastdb
      $ sudo docker run \
      -v $HOME/blastdb:/blastdb \
      -v $HOME/primerdesign:/primerdesign \
      -p 5000:5000 -p 9001:9001 \
      --name speciesprimer biologger/speciesprimer
    
  • Open the address http://localhost:5000 or http://127.0.0.1:5000 in your favorite web browser

  • Enter your E-mail address (required for the biopython NCBI Entrez module)

  • Download the nt BLAST DB (>60 GB) or the ref_prok_rep_genomes DB (~6.5 GB) (see BLAST DB)

  • Customize the species list and other parameters if required (see SpeciesPrimer settings)

  • Navigate to Primer design and start primer design for new targets (see Primer design)

  • If you want to use the ref_prok_rep_genomes DB, provide the path (/blastdb/ref_prok_rep_genomes) in the customdb settings field

  • The results can be found in the Summary directory e.g. /primerdesign/Summary (container) or $HOME/primerdesign/Summary (host)
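The pipeline writes its primer candidates as comma-separated files, so the Summary directory can be inspected with a few lines of Python. The sketch below is an illustration only: `read_primer_tables` is a hypothetical helper, and it assumes nothing about file names or column headers beyond the `.csv` extension.

```python
import csv
from pathlib import Path


def read_primer_tables(summary_dir):
    """Yield (csv_path, rows) for every .csv file below summary_dir.

    rows is a list of dicts keyed by whatever header the file contains;
    no particular column names are assumed.
    """
    for csv_path in sorted(Path(summary_dir).rglob("*.csv")):
        with open(csv_path, newline="") as handle:
            rows = list(csv.DictReader(handle))
        yield csv_path, rows


if __name__ == "__main__":
    # Host-side path from the quick start; adjust if you mounted another directory.
    summary = Path.home() / "primerdesign" / "Summary"
    for path, rows in read_primer_tables(summary):
        print(f"{path}: {len(rows)} rows")
```

Searching recursively avoids guessing how the results are grouped inside the Summary directory.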

Use the pipeline with the command line

  • After the docker run command has started the container, open a new terminal:

      # open an interactive terminal in the docker container
      $ sudo docker exec -it speciesprimer bash
    
  • Download the nt BLAST DB (>60 GB):

      $ getblastdb.py -dbpath /blastdb --delete
    
  • or download the ref_prok_rep_genomes DB (~6.5 GB):

      $ getblastdb.py -db ref_prok_rep_genomes -dbpath /blastdb --delete
    
  • or, alternatively, call update_blastdb.pl directly:

      $ cd /blastdb
    
      $ update_blastdb.pl --passive --decompress nt
      # or
      $ update_blastdb.pl --passive --decompress ref_prok_rep_genomes
    
      $ cd /primerdesign
    
  • Customize the species list and other parameters if required (see docs/pipelinesetup.md for more info):

      $ nano /pipeline/dictionaries/species_list.txt
      $ nano /pipeline/dictionaries/p3parameters
      $ nano /pipeline/dictionaries/no_blast.gi
    
  • Start primer design

      $ speciesprimer.py
    
  • Starting the script launches an assistant that guides you through the configuration of a new run

For more information and advanced settings, see Advanced command line usage.

If you want to use the ref_prok_rep_genomes DB, use the customdb option with the path:

	/blastdb/ref_prok_rep_genomes

Introduction

The SpeciesPrimer pipeline is intended to help researchers find specific primer pairs for the detection and quantification of bacterial species in complex ecosystems. The pipeline uses genome assemblies of the target species to identify core genes (genes that are present in all assemblies) and checks their specificity for the target species using BLAST. Primer design is performed by primer3, followed by a stringent primer quality control. To make the evaluation of primer specificity faster and simpler, not all sequences of all bacterial species in the BLAST database are considered. Instead, the user provides a list of organisms that are expected to be present in the investigated ecosystem and should not be detected by the primer pair. The output of the pipeline is a comma-separated file with possible primer pairs for the target species, which can be further tested and evaluated by the user.
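The two central ideas of this paragraph, core genes and specificity against a user-supplied species list, can be illustrated with a toy model. This is a simplified sketch, not the pipeline's implementation: the real work is delegated to Roary and BLAST+, and the function names, gene names and species names below are hypothetical examples.

```python
def core_genes(assemblies):
    """Return the genes present in every assembly.

    assemblies maps an assembly name to the gene names annotated in it.
    """
    gene_sets = list(assemblies.values())
    if not gene_sets:
        return set()
    return set.intersection(*map(set, gene_sets))


def is_specific(hit_species, nontarget_species, exceptions=()):
    """Return True if no BLAST hit belongs to a listed non-target species.

    Species not on the list are ignored; hits to species named in
    exceptions are tolerated. Comparison is case-insensitive.
    """
    tolerated = {name.lower() for name in exceptions}
    listed = {name.lower() for name in nontarget_species} - tolerated
    return not any(hit.lower() in listed for hit in hit_species)


if __name__ == "__main__":
    # Toy data: three assemblies of one target species.
    assemblies = {
        "assembly_1": {"recA", "dnaK", "tuf"},
        "assembly_2": {"recA", "tuf", "pheS"},
        "assembly_3": {"tuf", "recA"},
    }
    print(sorted(core_genes(assemblies)))

    species_list = ["Lactobacillus sakei", "Escherichia coli"]
    print(is_specific(["Lactobacillus curvatus"], species_list))
    print(is_specific(["Lactobacillus sakei"], species_list))
```

Restricting the check to the listed species is what keeps the specificity evaluation small: a hit in an organism that is not expected in the ecosystem does not disqualify a sequence.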

Pipeline workflow and tools

| Pipeline workflow | Tools | Reference |
| --- | --- | --- |
| Input genome assemblies | | |
| - download | NCBI Entrez (Biopython) | Cock et al. 2009; Sayers 2009 |
| - annotation | Prokka | Seemann 2014 |
| - quality control | BLAST+ | Altschul et al. 1990 |
| Core gene sequences | | |
| - identification | Roary | Page et al. 2015 |
| - phylogeny | FastTree 2 | Price et al. 2010 |
| - selection of conserved sequences | Prank, consambig (EMBOSS), GNU parallel | Löytynoja 2014; Rice et al. 2000; Tange 2011 |
| - evaluation of specificity | BLAST+ | Altschul et al. 1990 |
| Primer | | |
| - design | Primer3 | Untergasser et al. 2012 |
| - quality control | BLAST+, Mfold, MFEPrimer 2.0, MPprimer | Altschul et al. 1990; Zuker et al. 1999; Qu et al. 2012; Shen et al. 2010 |

The DBGenerator.py script from the Microbial Genomics Lab at CBIB and SQLite3 were used in an earlier version to create an SQL database from the Roary output.

Python modules and software used for the GUI:

  • flask
  • flask-wtf
  • gunicorn
  • MyDaemon

Run settings

| Section | Command line option [Input] | Description | Default |
| --- | --- | --- | --- |
| General | target [str] | Name of the target species | None (required) |
| | exception [str] | Name of a non-target bacterial species for which primer binding is tolerated | None |
| | path [str] | Absolute path of the working directory | Current working directory |
| | offline | Work offline with local genome assemblies | False |
| | skip_download | Skips download of genome assemblies from the NCBI RefSeq FTP server | False |
| | assemblylevel [all, complete, chromosome, scaffold, contig] | Only genome assemblies with the selected assembly status will be downloaded from the NCBI RefSeq FTP server | ['all'] |
| | customdb [str] | Use the NCBI ref_prok_rep_genomes database or any other BLAST DB | None |
| | blastseqs [100, 500, 1000, 2000, 5000] | Set the number of sequences per BLAST search. Decreasing the number of sequences requires less memory | 1000 |
| | blastdbv5 | Limits all BLAST searches to taxid:2 (bacteria). Works only with version 5 BLAST databases. May increase speed. | False |
| | email [str] | Provide your email in the command line to access NCBI. No input required during the run. | None |
| | intermediate | Select this option to keep intermediate files. | False |
| | nolist | Do not use the (non-target) species list; only sequences without BLAST hits are selected for primer design. May be used with a custom BLAST DB | False |
| | configfile [str] | Path to a configuration file (json) to use custom species_list.txt, p3parameters, genus_abbrev.csv and no_blast.gi files | None |
| Quality control | qc_gene [rRNA, recA, dnaK, pheS, tuf] | Selection of housekeeping genes for BLAST search to determine the species of input genome assemblies | ['rRNA'] |
| | ignore_qc | Keep genome assemblies that fail to meet the criteria of the quality control step | False |
| Pan-genome analysis | skip_tree | Skips core gene alignment (Roary) and core gene phylogeny (FastTree) | False |
| Primer design | minsize [int] | Minimal accepted amplicon size of PCR primer pairs | 70 |
| | maxsize [int] | Maximal accepted amplicon size of PCR primer pairs | 200 |
| Primer quality control | mfold [float] | Set the deltaG threshold (max. deltaG) for the secondary structures at 60 °C in the PCR product, calculated by Mfold | -3.5 |
| | mpprimer [float] | Set the deltaG threshold (max. deltaG) for the primer-primer 3'-end binding, calculated by MPprimer | -3.0 |
| | mfethreshold [int] | Threshold for MFEprimer primer pair coverage (PPC) score. Higher values select for better coverage of target sequences and lower coverage of non-target sequences (recommended range 80 - 100). | 90 |
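To make the defaults and allowed values above easier to check before a run, they can be mirrored in a small settings helper. The following is a minimal sketch covering only a subset of the options. `build_settings` is a hypothetical function for illustration and is not the pipeline's own configuration code; the target name in the example is likewise only an example.

```python
import copy

# Defaults copied from the run settings above (subset only).
DEFAULTS = {
    "assemblylevel": ["all"],
    "blastseqs": 1000,
    "qc_gene": ["rRNA"],
    "minsize": 70,
    "maxsize": 200,
    "mfold": -3.5,
    "mpprimer": -3.0,
    "mfethreshold": 90,
}

# Allowed values for the options that accept a fixed set of inputs.
CHOICES = {
    "assemblylevel": {"all", "complete", "chromosome", "scaffold", "contig"},
    "blastseqs": {100, 500, 1000, 2000, 5000},
    "qc_gene": {"rRNA", "recA", "dnaK", "pheS", "tuf"},
}


def build_settings(target, **overrides):
    """Merge overrides into the defaults and validate them (hypothetical helper)."""
    if not target:
        raise ValueError("target is required")
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown option(s): {', '.join(unknown)}")
    settings = {"target": target, **copy.deepcopy(DEFAULTS), **overrides}
    for name, allowed in CHOICES.items():
        value = settings[name]
        values = value if isinstance(value, (list, tuple)) else [value]
        rejected = [v for v in values if v not in allowed]
        if rejected:
            raise ValueError(f"{name}: {rejected} not in {sorted(allowed)}")
    if settings["minsize"] > settings["maxsize"]:
        raise ValueError("minsize must not exceed maxsize")
    return settings


if __name__ == "__main__":
    run = build_settings("Lactobacillus_curvatus", blastseqs=500,
                         assemblylevel=["complete", "chromosome"])
    print(run)
```

Validating up front catches a mistyped assembly level or an unsupported blastseqs value before any download or BLAST search starts.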

Citation

If you use this software please cite:

Dreier M, Berthoud H, Shani N, Wechsler D, Junier P. 2020.
SpeciesPrimer: a bioinformatics pipeline dedicated to the design
of qPCR primers for the quantification of bacterial species.
PeerJ 8:e8544 https://doi.org/10.7717/peerj.8544