New in SpeciesPrimer v2.1
- Configfile option for pipeline setup (v2.1.1)
- Custom Blast DB support
- Email option for command line
- Increased speed
- Species synonyms are added to exceptions
- Bugfixes and KeyboardInterrupt rollback
- Simpler directory structure
Documentation
- Pipeline and Docker tutorial
- Advanced command line usage
- Pipeline setup
- Primer design
- Troubleshooting
- Custom BLAST DB tutorial
- More troubleshooting (Docker)
- Docker and proxy settings
System requirements
- Quad core processor
- 16 GB RAM
- SSD / fast hard disk (recommended)
- 60 GB free space for nt database
- 4.5 GB for the docker image
- 5 - 20 GB for each analysis
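The disk space figures above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of SpeciesPrimer: the function names are hypothetical, the upper bound of 20 GB per analysis is used as a conservative estimate, and GB is interpreted as 10^9 bytes.

```python
import shutil

# Figures taken from the requirements list (in GB).
NT_DB_GB = 60.0          # nt BLAST database
DOCKER_IMAGE_GB = 4.5    # docker image
ANALYSIS_GB_MAX = 20.0   # upper bound of the 5 - 20 GB range per analysis


def required_space_gb(n_analyses=1, include_db=True, include_image=True):
    """Conservative estimate of the disk space needed, in GB."""
    total = n_analyses * ANALYSIS_GB_MAX
    if include_db:
        total += NT_DB_GB
    if include_image:
        total += DOCKER_IMAGE_GB
    return total


def has_enough_space(path=".", n_analyses=1):
    """Compare the free space on the disk holding `path` with the estimate."""
    free_gb = shutil.disk_usage(path).free / 1e9  # assumption: 1 GB = 10^9 bytes
    return free_gb >= required_space_gb(n_analyses)


if __name__ == "__main__":
    print("Estimated requirement (GB):", required_space_gb(1))
    print("Enough free space here:", has_enough_space("."))
```

If the blastdb and primerdesign directories live on different disks, the check would have to be run once per disk with the matching figures.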
$ sudo docker pull biologger/speciesprimer
$ mkdir $HOME/primerdesign
$ mkdir $HOME/blastdb
$ sudo docker run \
-v $HOME/blastdb:/blastdb \
-v $HOME/primerdesign:/primerdesign \
-p 5000:5000 -p 9001:9001 \
--name speciesprimer biologger/speciesprimer
- Open the address http://localhost:5000 or http://127.0.0.1:5000 in your favorite web browser
- Enter your email address (required for the Biopython NCBI Entrez module)
- Download the nt BLAST DB (>60 GB) or the ref_prok_rep_genomes DB (~6.5 GB) (see BLAST DB)
- Customize the species list and other parameters if required (see SpeciesPrimer settings)
- Navigate to Primer design and start primer design for new targets (see Primer design)
- If you want to use the ref_prok_rep_genomes DB, provide the path (/blastdb/ref_prok_rep_genomes) in the customdb settings field
- The results can be found in the Summary directory, e.g. /primerdesign/Summary (container) or $HOME/primerdesign/Summary (host)
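Since the pipeline output is a comma-separated file, the results in the Summary directory can be inspected with a few lines of Python on the host. This is only a sketch: the file names and column headers inside the Summary directory are not specified here, so the hypothetical helper `load_primer_tables` simply collects every `*.csv` file it finds below the directory.

```python
import csv
from pathlib import Path


def load_primer_tables(summary_dir):
    """Return {csv_path: list of row dicts} for every CSV file below summary_dir."""
    tables = {}
    for csv_path in sorted(Path(summary_dir).rglob("*.csv")):
        with csv_path.open(newline="") as handle:
            # DictReader uses the first line of each file as column names.
            tables[csv_path] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    # Host-side location of the results, as mounted by the docker run command.
    summary = Path.home() / "primerdesign" / "Summary"
    if summary.is_dir():
        for path, rows in load_primer_tables(summary).items():
            print(f"{path}: {len(rows)} rows")
    else:
        print(f"No Summary directory found at {summary}")
```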
- After the docker run command, open a new terminal and start an interactive terminal in the Docker container:
$ sudo docker exec -it speciesprimer bash
- Download the nt BLAST DB (>60 GB):
$ getblastdb.py -dbpath /blastdb --delete
- or download the ref_prok_rep_genomes DB (~6.5 GB):
$ getblastdb.py -db ref_prok_rep_genomes -dbpath /blastdb --delete
- or alternatively use update_blastdb.pl with nt or ref_prok_rep_genomes as the database name:
$ cd /blastdb
$ update_blastdb.pl --passive --decompress nt
$ update_blastdb.pl --passive --decompress ref_prok_rep_genomes
$ cd /primerdesign
- Customize the species list and other parameters if required (see docs/pipelinesetup.md for more info):
$ nano /pipeline/dictionaries/species_list.txt
$ nano /pipeline/dictionaries/p3parameters
$ nano /pipeline/dictionaries/no_blast.gi
- Start primer design:
$ speciesprimer.py
- Starting the script launches an assistant for the configuration of a new run. For more information and advanced settings, see Advanced command line usage.
To use the ref_prok_rep_genomes DB, provide the path /blastdb/ref_prok_rep_genomes with the customdb option.
The SpeciesPrimer pipeline is intended to help researchers find specific primer pairs for the detection and quantification of bacterial species in complex ecosystems. The pipeline uses genome assemblies of the target species to identify core genes (genes that are present in all assemblies) and checks their specificity for the target species using BLAST. Primer design is performed by Primer3, followed by stringent primer quality control. To make the evaluation of primer specificity faster and simpler, not all sequences of all bacterial species in the BLAST database are considered. Instead, the user provides a list of organisms that are expected to be present in the investigated ecosystem and should not be detected by the primer pair. The output of the pipeline is a comma-separated file with candidate primer pairs for the target species, which the user can test and evaluate further.
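Two ideas from this description can be illustrated in a few lines: core genes as the intersection of the gene sets of all assemblies, and specificity judged only against the user-provided species list. The sketch below is a simplified illustration, not the pipeline's implementation (which relies on Roary and BLAST+); the function names, gene names and species names are hypothetical examples.

```python
def core_genes(assemblies):
    """Genes present in every assembly (intersection of all gene sets)."""
    gene_sets = [set(genes) for genes in assemblies.values()]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def is_specific(hit_species, target, species_list, exceptions=()):
    """True if no BLAST hit belongs to an unwanted species from the list.

    Hits to the target, to tolerated exceptions, or to species that are
    not on the list are ignored, mirroring the idea that only organisms
    expected in the ecosystem are considered.
    """
    tolerated = {target, *exceptions}
    unwanted = set(species_list) - tolerated
    return not any(species in unwanted for species in hit_species)


if __name__ == "__main__":
    assemblies = {
        "assembly_1": ["recA", "dnaK", "tuf"],
        "assembly_2": ["recA", "tuf", "pheS"],
        "assembly_3": ["tuf", "recA"],
    }
    print(sorted(core_genes(assemblies)))

    species_list = ["Lactobacillus sakei", "Lactococcus lactis"]
    print(is_specific(["Lactobacillus sakei"], "Lactobacillus curvatus", species_list))
```

In this toy model a sequence with a hit to a listed non-target species is rejected, unless that species was passed as an exception.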
| Pipeline workflow | Tools | Reference |
|---|---|---|
| Input genome assemblies | | |
| - download | NCBI Entrez (Biopython) | Cock et al. 2009; Sayers 2009 |
| - annotation | Prokka | Seemann 2014 |
| - quality control | BLAST+ | Altschul et al. 1990 |
| Core gene sequences | | |
| - identification | Roary | Page et al. 2015 |
| - phylogeny | FastTree 2 | Price et al. 2010 |
| - selection of conserved sequences | Prank, consambig (EMBOSS), GNU parallel | Löytynoja 2014; Rice et al. 2000; Tange 2011 |
| - evaluation of specificity | BLAST+ | Altschul et al. 1990 |
| Primer | | |
| - design | Primer3 | Untergasser et al. 2012 |
| - quality control | BLAST+, Mfold, MFEPrimer 2.0, MPprimer | Altschul et al. 1990; Zuker et al. 1999; Qu et al. 2012; Shen et al. 2010 |
The DBGenerator.py script from the Microbial Genomics Lab at CBIB and SQLite3 were used in an earlier version to create an SQL database from the Roary output.
Python modules and software used for the GUI:
| Section | Command line option [Input] | Description | Default |
|---|---|---|---|
| General | target [str] | Name of the target species | None (required) |
| | exception [str] | Name of a non-target bacterial species for which primer binding is tolerated | None |
| | path [str] | Absolute path of the working directory | Current working directory |
| | offline | Work offline with local genome assemblies | False |
| | skip_download | Skips download of genome assemblies from NCBI RefSeq FTP server | False |
| | assemblylevel [all, complete, chromosome, scaffold, contig] | Only genome assemblies with the selected assembly status will be downloaded from the NCBI RefSeq FTP server | ['all'] |
| | customdb [str] | Use the NCBI ref_prok_rep_genomes database or any other BLAST DB | None |
| | blastseqs [100, 500, 1000, 2000, 5000] | Set the number of sequences per BLAST search. Decreasing the number of sequences requires less memory | 1000 |
| | blastdbv5 | Limits all BLAST searches to taxid:2 (bacteria). Works only with version 5 BLAST databases. May increase speed. | False |
| | email [str] | Provide your email in the command line to access NCBI. No input required during the run. | None |
| | intermediate | Select this option to keep intermediate files. | False |
| | nolist | Do not use the (non-target) species list; only sequences without BLAST hits are selected for primer design. May be used with a custom BLAST DB | False |
| | configfile [str] | Path to configuration file (json) to use custom species_list.txt, p3parameters, genus_abbrev.csv and no_blast.gi files | None |
| Quality control | qc_gene [rRNA, recA, dnaK, pheS, tuf] | Selection of housekeeping genes for BLAST search to determine the species of input genome assemblies | ['rRNA'] |
| | ignore_qc | Keep genome assemblies that fail to meet the criteria of the quality control step | False |
| Pan-genome analysis | skip_tree | Skips core gene alignment (Roary) and core gene phylogeny (FastTree) | False |
| Primer design | minsize [int] | Minimal accepted amplicon size of PCR primer pairs | 70 |
| | maxsize [int] | Maximal accepted amplicon size of PCR primer pairs | 200 |
| Primer quality control | mfold [float] | Set the deltaG threshold (max. deltaG) for the secondary structures at 60 °C in the PCR product, calculated by Mfold | -3.5 |
| | mpprimer [float] | Set the deltaG threshold (max. deltaG) for the primer-primer 3’-end binding, calculated by MPprimer | -3.0 |
| | mfethreshold [int] | Threshold for MFEprimer primer pair coverage (PPC) score. Higher values select for better coverage of target sequences and lower coverage of non-target sequences (recommended range 80 - 100). | 90 |
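The option table can be read as a parser specification. The following sketch mirrors the option names, allowed values and defaults from the table with `argparse`. It is an illustration only and not the parser used by speciesprimer.py: the `--name` flag spelling, the `build_parser` function and the size check are assumptions.

```python
import argparse
import os


def build_parser():
    """Illustrative parser mirroring the option table (not the pipeline's own)."""
    p = argparse.ArgumentParser(description="Sketch of the SpeciesPrimer options")
    # General
    p.add_argument("--target", required=True)
    p.add_argument("--exception", default=None)
    p.add_argument("--path", default=os.getcwd())
    p.add_argument("--assemblylevel", nargs="+", default=["all"],
                   choices=["all", "complete", "chromosome", "scaffold", "contig"])
    p.add_argument("--customdb", default=None)
    p.add_argument("--blastseqs", type=int, default=1000,
                   choices=[100, 500, 1000, 2000, 5000])
    p.add_argument("--email", default=None)
    p.add_argument("--configfile", default=None)
    # Quality control
    p.add_argument("--qc_gene", nargs="+", default=["rRNA"],
                   choices=["rRNA", "recA", "dnaK", "pheS", "tuf"])
    # Primer design and primer quality control
    p.add_argument("--minsize", type=int, default=70)
    p.add_argument("--maxsize", type=int, default=200)
    p.add_argument("--mfold", type=float, default=-3.5)
    p.add_argument("--mpprimer", type=float, default=-3.0)
    p.add_argument("--mfethreshold", type=int, default=90)
    # Switches that default to False
    for flag in ("offline", "skip_download", "blastdbv5", "intermediate",
                 "nolist", "ignore_qc", "skip_tree"):
        p.add_argument(f"--{flag}", action="store_true")
    return p


def parse_options(argv):
    """Parse argv and apply a basic sanity check on the amplicon size range."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.minsize > args.maxsize:
        parser.error("minsize must not exceed maxsize")
    return args


if __name__ == "__main__":
    opts = parse_options(["--target", "Lactobacillus_curvatus",
                          "--customdb", "/blastdb/ref_prok_rep_genomes"])
    print(opts.target, opts.customdb, opts.blastseqs, opts.minsize, opts.maxsize)
```

Restricting `blastseqs`, `assemblylevel` and `qc_gene` with `choices` rejects values outside the sets listed in the table before any long-running step starts.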
If you use this software please cite:
Dreier M, Berthoud H, Shani N, Wechsler D, Junier P. 2020.
SpeciesPrimer: a bioinformatics pipeline dedicated to the design
of qPCR primers for the quantification of bacterial species.
PeerJ 8:e8544 https://doi.org/10.7717/peerj.8544