SpeciesPrimer

New in SpeciesPrimer v2.1

Configfile option for pipeline setup (v2.1.1)
Custom Blast DB support
Email option for command line
Increased speed
Species synonyms are added to exceptions
Bugfixes and KeyboardInterrupt rollback
Simpler directory structure

Docs

Minimum system requirements

Quad core processor
16 GB RAM
SSD / fast hard disk (recommended)
60 GB free space for nt database
4.5 GB for the docker image
5 - 20 GB for each analysis
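
The disk space figures above can be added up before starting the setup. The following is a minimal sketch, not part of SpeciesPrimer: the function names are hypothetical, and it assumes that the BLAST database, the docker image and the analysis data all live on the same filesystem, which may not be true on your machine.

```python
import shutil

GB = 1000 ** 3

# Figures taken from the requirements list above (in GB).
NT_DB_GB = 60
DOCKER_IMAGE_GB = 4.5
ANALYSIS_GB_MAX = 20  # upper end of the 5 - 20 GB range per analysis


def required_gb(n_analyses=1):
    """Worst-case disk space estimate (hypothetical helper)."""
    return NT_DB_GB + DOCKER_IMAGE_GB + n_analyses * ANALYSIS_GB_MAX


def has_enough_space(path=".", n_analyses=1):
    """Compare free space on the filesystem holding `path` with the estimate."""
    free_gb = shutil.disk_usage(path).free / GB
    return free_gb >= required_gb(n_analyses)


if __name__ == "__main__":
    print(f"Estimated need: {required_gb():.1f} GB")
    print("Enough free space:", has_enough_space())
```

With the smaller ref_prok_rep_genomes DB, the estimate would be lower; adjust `NT_DB_GB` accordingly.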

Quick start (Ubuntu 16.04)

  $ sudo docker pull biologger/speciesprimer
  $ mkdir $HOME/primerdesign
  $ mkdir $HOME/blastdb
  $ sudo docker run \
  -v $HOME/blastdb:/blastdb \
  -v $HOME/primerdesign:/primerdesign \
  -p 5000:5000 -p 9001:9001 \
  --name speciesprimer biologger/speciesprimer

Open http://localhost:5000 or http://127.0.0.1:5000 in your favorite web browser.
Enter your email address (required for the Biopython NCBI Entrez module).
Download the nt BLAST DB (>60 GB) or the ref_prok_rep_genomes DB (~6.5 GB). See "BLAST DB" in the docs.
Customize the species list and other parameters if required. See "SpeciesPrimer settings" in the docs.
Navigate to Primer design and start primer design for new targets. See "Primer design" in the docs.
If you want to use the ref_prok_rep_genomes DB, provide the path (/blastdb/ref_prok_rep_genomes) in the customdb settings field.
The results can be found in the Summary directory, e.g. /primerdesign/Summary (container) or $HOME/primerdesign/Summary (host).
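
The container and host locations correspond through the two `-v` volume mappings of the docker run command above. As an illustration only, the lookup can be sketched as follows; `VOLUMES` and `host_path` are hypothetical names, and `$HOME` is kept as a literal string rather than expanded.

```python
from pathlib import PurePosixPath

# Volume mappings from the docker run command above (container -> host).
# "$HOME" is kept symbolic; expand it with os.path.expandvars if needed.
VOLUMES = {
    "/blastdb": "$HOME/blastdb",
    "/primerdesign": "$HOME/primerdesign",
}


def host_path(container_path):
    """Return the host path for a path inside a mounted volume, else None."""
    path = PurePosixPath(container_path)
    for mount, host_dir in VOLUMES.items():
        mount_path = PurePosixPath(mount)
        if path == mount_path or mount_path in path.parents:
            return str(PurePosixPath(host_dir) / path.relative_to(mount_path))
    return None  # not inside a mounted volume, so only visible in the container


print(host_path("/primerdesign/Summary"))
```

Paths outside the two volumes, such as /pipeline/dictionaries, exist only inside the container.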

Use the pipeline with the command line

After the docker run command, open a new terminal:

  # open an interactive terminal in the docker container
  $ sudo docker exec -it speciesprimer bash

Download the nt BLAST DB (>60 GB):

  $ getblastdb.py -dbpath /blastdb --delete

or download the ref_prok_rep_genomes DB (~6.5 GB):

  $ getblastdb.py -db ref_prok_rep_genomes -dbpath /blastdb --delete

or alternatively

  $ cd /blastdb

  $ update_blastdb.pl --passive --decompress nt
  # or
  $ update_blastdb.pl --passive --decompress ref_prok_rep_genomes

  $ cd /primerdesign

Customize the species list and other parameters if required (see docs/pipelinesetup.md for more info):

  $ nano /pipeline/dictionaries/species_list.txt
  $ nano /pipeline/dictionaries/p3parameters
  $ nano /pipeline/dictionaries/no_blast.gi

Start primer design
```
  $ speciesprimer.py
```
Starting the script launches an assistant for the configuration of a new run.

For more information and advanced settings, see "Advanced command line usage" in the docs.

To use the ref_prok_rep_genomes DB, pass its path to the customdb option:

  /blastdb/ref_prok_rep_genomes

Introduction

The SpeciesPrimer pipeline is intended to help researchers find specific primer pairs for the detection and quantification of bacterial species in complex ecosystems. The pipeline uses genome assemblies of the target species to identify core genes (genes that are present in all assemblies) and checks their specificity for the target species using BLAST. Primer design is performed by Primer3, followed by a stringent primer quality control. To make the evaluation of primer specificity faster and simpler, not all sequences of all bacterial species in the BLAST database are considered. Instead, the user provides a list of organisms that are expected to be present in the investigated ecosystem and should not be detected by the primer pair. The output of the pipeline is a comma-separated file with possible primer pairs for the target species, which can be further tested and evaluated by the user.
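
The two selection ideas described above (core genes, then specificity against a user-provided species list) can be illustrated with plain sets. This is a toy sketch under stated assumptions, not the pipeline's implementation: in SpeciesPrimer the core genes come from Roary and the hits from BLAST, while the function names, gene sets and hits below are invented for illustration.

```python
def core_genes(assemblies):
    """Return the genes present in every assembly (set intersection)."""
    gene_sets = [set(genes) for genes in assemblies.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def specific_genes(candidates, blast_hits, non_targets):
    """Keep candidates without BLAST hits in any listed non-target species."""
    non_targets = set(non_targets)
    return {
        gene for gene in candidates
        if not non_targets & set(blast_hits.get(gene, ()))
    }


# Toy data, invented for illustration only.
assemblies = {
    "assembly_1": {"recA", "dnaK", "tuf", "geneX"},
    "assembly_2": {"recA", "dnaK", "tuf"},
    "assembly_3": {"recA", "tuf", "geneY"},
}
blast_hits = {"recA": ["Lactobacillus sakei"], "tuf": ["Escherichia coli"]}
non_targets = ["Lactobacillus sakei"]  # species expected in the ecosystem

core = core_genes(assemblies)
specific = specific_genes(core, blast_hits, non_targets)
print(sorted(core), sorted(specific))
```

Note that a hit in a species missing from the list does not exclude a sequence here, which mirrors why the species list should cover the organisms expected in the investigated ecosystem.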

Pipeline workflow and tools

| Pipeline workflow | Tools | Reference |
| --- | --- | --- |
| Input genome assemblies | | |
| - download | NCBI Entrez (Biopython) | Cock et al. 2009; Sayers 2009 |
| - annotation | Prokka | Seemann 2014 |
| - quality control | BLAST+ | Altschul et al. 1990 |
| Core gene sequences | | |
| - identification | Roary | Page et al. 2015 |
| - phylogeny | FastTree 2 | Price et al. 2010 |
| - selection of conserved sequences | Prank, consambig (EMBOSS), GNU parallel | Löytynoja 2014; Rice et al. 2000; Tange 2011 |
| - evaluation of specificity | BLAST+ | Altschul et al. 1990 |
| Primer | | |
| - design | Primer3 | Untergasser et al. 2012 |
| - quality control | BLAST+, Mfold, MFEprimer 2.0, MPprimer | Altschul et al. 1990; Zuker et al. 1999; Qu et al. 2012; Shen et al. 2010 |

The DBGenerator.py script from the Microbial Genomics Lab at CBIB and SQLite3 were used in an earlier version to create an SQL database from the Roary output.

Python modules and software used for the GUI:

Run settings

| Section | Command line option [Input] | Description | Default |
| --- | --- | --- | --- |
| General | target [str] | Name of the target species | None (required) |
| | exception [str] | Name of a non-target bacterial species for which primer binding is tolerated | None |
| | path [str] | Absolute path of the working directory | Current working directory |
| | offline | Work offline with local genome assemblies | False |
| | skip_download | Skips the download of genome assemblies from the NCBI RefSeq FTP server | False |
| | assemblylevel [all, complete, chromosome, scaffold, contig] | Only genome assemblies with the selected assembly status will be downloaded from the NCBI RefSeq FTP server | ['all'] |
| | customdb [str] | Use the NCBI ref_prok_rep_genomes database or any other BLAST DB | None |
| | blastseqs [100, 500, 1000, 2000, 5000] | Set the number of sequences per BLAST search. Decreasing the number of sequences requires less memory | 1000 |
| | blastdbv5 | Limits all BLAST searches to taxid:2 (bacteria). Works only with version 5 BLAST databases. May increase speed. | False |
| | email [str] | Provide your email in the command line to access NCBI. No input required during the run. | None |
| | intermediate | Select this option to keep intermediate files. | False |
| | nolist | Do not use the (non-target) species list; only sequences without BLAST hits are selected for primer design. May be used with a custom BLAST DB | False |
| | configfile [str] | Path to a configuration file (json) to use custom species_list.txt, p3parameters, genus_abbrev.csv and no_blast.gi files | None |
| Quality control | qc_gene [rRNA, recA, dnaK, pheS, tuf] | Selection of housekeeping genes for the BLAST search to determine the species of input genome assemblies | ['rRNA'] |
| | ignore_qc | Keep genome assemblies that fail to meet the criteria of the quality control step | False |
| Pan-genome analysis | skip_tree | Skips core gene alignment (Roary) and core gene phylogeny (FastTree) | False |
| Primer design | minsize [int] | Minimal accepted amplicon size of PCR primer pairs | 70 |
| | maxsize [int] | Maximal accepted amplicon size of PCR primer pairs | 200 |
| Primer quality control | mfold [float] | Set the deltaG threshold (max. deltaG) for the secondary structures at 60 °C in the PCR product, calculated by Mfold | -3.5 |
| | mpprimer [float] | Set the deltaG threshold (max. deltaG) for the primer-primer 3’-end binding, calculated by MPprimer | -3.0 |
| | mfethreshold [int] | Threshold for the MFEprimer primer pair coverage (PPC) score. Higher values select for better coverage of target sequences and lower coverage of non-target sequences (recommended range 80 - 100). | 90 |
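
To make the types, allowed values and defaults listed above concrete, here is a hypothetical `argparse` sketch that mirrors part of the run settings. It is not the parser used by speciesprimer.py: the `--name` spelling of the options is an assumption (the table lists the names without dashes), and the target value in the usage line is only an example.

```python
import argparse
import os


def build_parser():
    """Hypothetical parser mirroring a subset of the documented run settings."""
    p = argparse.ArgumentParser(prog="speciesprimer-sketch")
    # General
    p.add_argument("--target", required=True, help="Name of the target species")
    p.add_argument("--exception", default=None)
    p.add_argument("--path", default=os.getcwd())
    p.add_argument("--offline", action="store_true")
    p.add_argument("--assemblylevel", nargs="+", default=["all"],
                   choices=["all", "complete", "chromosome", "scaffold", "contig"])
    p.add_argument("--blastseqs", type=int, default=1000,
                   choices=[100, 500, 1000, 2000, 5000])
    # Quality control
    p.add_argument("--qc_gene", nargs="+", default=["rRNA"],
                   choices=["rRNA", "recA", "dnaK", "pheS", "tuf"])
    p.add_argument("--ignore_qc", action="store_true")
    # Primer design
    p.add_argument("--minsize", type=int, default=70)
    p.add_argument("--maxsize", type=int, default=200)
    # Primer quality control
    p.add_argument("--mfold", type=float, default=-3.5)
    p.add_argument("--mpprimer", type=float, default=-3.0)
    p.add_argument("--mfethreshold", type=int, default=90)
    return p


# Example value for the target; every other setting falls back to its default.
args = build_parser().parse_args(["--target", "Lactobacillus_curvatus"])
print(args.minsize, args.maxsize, args.qc_gene)
```

Options without an input in the table (offline, ignore_qc and similar) are modelled as boolean flags that default to False.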

Citation

If you use this software please cite:

Dreier M, Berthoud H, Shani N, Wechsler D, Junier P. 2020.
SpeciesPrimer: a bioinformatics pipeline dedicated to the design
of qPCR primers for the quantification of bacterial species.
PeerJ 8:e8544 https://doi.org/10.7717/peerj.8544
