This is a Snakemake pipeline for analyzing unpaired fungal internal transcribed spacer (ITS) sequences.
To install, we assume you have already installed Miniconda3
(https://docs.conda.io/en/latest/miniconda.html)
- Clone the repository:
git clone https://github.com/PennChopMicrobiomeProgram/PCMP_ITS_pipeline.git
- Create a conda environment and install the required packages:
cd PCMP_ITS_pipeline
conda create -n PCMP_ITS_pipeline --channel bioconda --channel conda-forge --channel defaults python=3.9
conda install --name PCMP_ITS_pipeline --file requirements.txt
/anaconda/envs/venv_name/bin/pip install brocc #brocc needs to be installed through your environment's pip
- The following software also needs to be installed:
- dnabc (https://github.com/PennChopMicrobiomeProgram/dnabc)
- primertrim (https://github.com/PennChopMicrobiomeProgram/primertrim)
- brocc (https://github.com/kylebittinger/brocc)
- To install (dnabc as example):
git clone https://github.com/PennChopMicrobiomeProgram/dnabc
cd dnabc
conda activate PCMP_ITS_pipeline
pip install -e ./
To run the pipeline, we need:
- Multiplexed or demultiplexed Illumina reads
- Create a project directory, e.g.
/scr1/users/tuv/ITS_Run1
- Copy the files from this repository into that directory
- Edit config.yml so that it suits your project. In particular:
- all: project_dir: Path to the project directory, e.g. "/scr1/users/tuv/ITS_Run1"
- all: mux_dir: Directory containing multiplexed Illumina sequencing reads, which does not have to be in the project directory, e.g. "/path/to/mux_files"; if samples are already demultiplexed, just fill in demux_dir
- all: demux_dir: Leave blank if you want to demultiplex using this pipeline; otherwise, the directory containing demultiplexed R1/R2 read pairs, which does not have to be in the project directory
- all: threads: Number of threads to use
- all: mapping_file: Mapping file of samples with barcode information for demultiplexing
- all: forward_direction: TRUE to use the forward read in this pipeline, FALSE to use the reverse read
- demux: mismatch: Number of allowable basepair mismatches on barcode sequence for demultiplexing
- demux: revcomp: If TRUE, reverse complement barcode sequence before demultiplexing
- trim: f_primer: Sequence of forward primer used for ITS PCR
- trim: r_primer: Sequence of reverse primer used for ITS PCR
- trim: mismatch: Number of allowable basepair mismatches on ITS PCR primers for trimming
- trim: min_length: Minimum length of match during the partial matching stage
- trim: align_id: Minimum percent identity to consider a primer match in vsearch alignment
- otu: expected_error: Threshold for truncating reads
- otu: otu_id: Percent sequence identity for clustering reads into OTUs
- otu: threads: Number of threads to use
- otu: chimera_db: Path to UCHIME reference dataset for chimera detection (see https://unite.ut.ee/repository.php); leave blank if using mock DNA amplified with chimera primers
- blastn: ncbi_db: Path to a local ncbi nt database
- To run the pipeline, activate the environment by entering conda activate PCMP_ITS_pipeline, cd into the project directory, and execute:
snakemake \
--configfile path/to/config.yml \
--keep-going \
--latency-wait 90 \
--notemp
- To submit jobs through Slurm, you may run
sbatch run_snakemake.bash config.yml
- You can use skeleton.Rmd to create a basic bioinformatics report from the results
create_local_taxonomy_db.py may be used to install a local taxonomy database for faster processing
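Before launching a run, it can help to check that config.yml defines every setting described in the list above. The following is a minimal sketch and not part of the pipeline. It assumes the file parses into nested sections (all, demux, trim, otu, blastn), as the "section: key" notation suggests; the actual layout of config.yml may differ. REQUIRED_KEYS, missing_keys and has_read_source are hypothetical names introduced here for illustration.

```python
# Hypothetical pre-flight check for config.yml (not shipped with the pipeline).
# Assumption: the YAML parses into a dict of sections, each a dict of settings.

REQUIRED_KEYS = {
    "all": ["project_dir", "mux_dir", "demux_dir", "threads",
            "mapping_file", "forward_direction"],
    "demux": ["mismatch", "revcomp"],
    "trim": ["f_primer", "r_primer", "mismatch", "min_length", "align_id"],
    "otu": ["expected_error", "otu_id", "threads", "chimera_db"],
    "blastn": ["ncbi_db"],
}


def missing_keys(config):
    """Return 'section: key' labels for every setting absent from config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        present = config.get(section) or {}
        for key in keys:
            if key not in present:
                missing.append(f"{section}: {key}")
    return missing


def has_read_source(config):
    """True when mux_dir or demux_dir is filled in (one of them is needed)."""
    general = config.get("all") or {}
    return bool(general.get("mux_dir")) or bool(general.get("demux_dir"))


# Example with placeholder values; only the two paths come from the README.
example_config = {
    "all": {
        "project_dir": "/scr1/users/tuv/ITS_Run1",
        "mux_dir": "/path/to/mux_files",
        "demux_dir": "",  # blank: demultiplex with this pipeline
    },
}
print(missing_keys(example_config))
print(has_read_source(example_config))
```

The dictionary could come from PyYAML's yaml.safe_load applied to config.yml; a plain dict is used here to keep the sketch free of dependencies.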
Input: Multiplexed Illumina sequencing files
Output: manifest.csv, total_read_counts.tsv, demultiplexed fastq files
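The demux: mismatch and demux: revcomp settings can be illustrated with a short sketch. This is not dnabc's implementation; it only shows the general idea of comparing an index read against each sample barcode, optionally reverse complemented, and accepting a sample when the number of mismatched bases is within the allowed limit. The function names and the example barcodes are hypothetical.

```python
# Illustrative barcode matching; dnabc's actual logic may differ.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, mismatch=0, revcomp=False):
    """Return the single sample whose barcode matches, else None.

    barcodes maps sample name -> barcode sequence. A read matching no
    barcode, or more than one, is left unassigned.
    """
    hits = []
    for sample, barcode in barcodes.items():
        target = reverse_complement(barcode) if revcomp else barcode
        if len(target) != len(index_read):
            continue
        if count_mismatches(index_read, target) <= mismatch:
            hits.append(sample)
    return hits[0] if len(hits) == 1 else None


# Made-up barcodes for two samples.
barcodes = {"S1": "ACGTAC", "S2": "GGATCA"}
print(assign_sample("ACGTAA", barcodes, mismatch=1))
print(assign_sample("GTACGT", barcodes, revcomp=True))
```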
Removes ITS forward and reverse primer sequences from reads
Output: reads/(reads.log, top_{rf}_seqs_trimmed.txt, {rf}_trimmed_removed_counts.txt)
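As a rough illustration of what trim: mismatch controls, the sketch below searches a read for a primer while tolerating a fixed number of mismatched bases. It is not primertrim's algorithm, which, judging by the min_length and align_id settings, also includes partial-matching and vsearch alignment stages. The sketch further assumes that the forward primer sits at the start of the read, that the reverse primer appears reverse complemented at the far end, and that primers contain no ambiguity codes. All names and sequences are hypothetical.

```python
# Simplified primer trimming with mismatch tolerance (not primertrim itself).

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer(read, primer, max_mismatch=0):
    """Index of the first window matching primer within max_mismatch, or None."""
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        if sum(a != b for a, b in zip(window, primer)) <= max_mismatch:
            return start
    return None


def trim_read(read, f_primer, r_primer, max_mismatch=0):
    """Drop the forward primer and everything before it, then the
    reverse-complemented reverse primer and everything after it."""
    f_start = find_primer(read, f_primer, max_mismatch)
    if f_start is not None:
        read = read[f_start + len(f_primer):]
    r_start = find_primer(read, reverse_complement(r_primer), max_mismatch)
    if r_start is not None:
        read = read[:r_start]
    return read


# Made-up primers and read, for illustration only.
read = "GGAAC" + "ATATAT" + "CTGAA" + "GG"
print(trim_read(read, f_primer="GGAAC", r_primer="TTCAG"))
```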
Create OTUs from amplicons using vsearch. Singletons are discarded when creating the OTUs but are still used for the counts.
Rules are based on this wiki: https://github.com/torognes/vsearch/wiki/Alternative-VSEARCH-pipeline
Output: otu/otu_sorted.tsv
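The otu: expected_error setting refers to the expected number of errors in a read, which is the sum of the per-base error probabilities 10^(-Q/10) derived from the Phred quality scores. The sketch below shows that calculation and one way to truncate a read against a threshold. It assumes Phred+33 encoded quality strings and assumes that truncation keeps the longest prefix whose cumulative expected error does not exceed the threshold; how the pipeline's vsearch rules apply the threshold is not restated here. The function names are hypothetical.

```python
# Expected-error calculation from a FASTQ quality string (Phred+33 assumed).


def error_probabilities(quality, offset=33):
    """Per-base error probabilities, p = 10^(-Q/10), for a quality string."""
    return [10 ** (-(ord(ch) - offset) / 10) for ch in quality]


def expected_errors(quality, offset=33):
    """Expected number of errors in the whole read."""
    return sum(error_probabilities(quality, offset))


def truncation_length(quality, max_ee, offset=33):
    """Length of the longest prefix whose cumulative expected error
    stays at or below max_ee."""
    total = 0.0
    length = 0
    for prob in error_probabilities(quality, offset):
        total += prob
        if total > max_ee:
            break
        length += 1
    return length


# 'I' is Q40, '5' is Q20, '+' is Q10 under Phred+33.
quality = "II5555+"
print(expected_errors(quality))
print(truncation_length(quality, max_ee=0.05))
```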
Determine the taxonomic assignments of the OTUs through a consensus-based BLAST result (https://github.com/kylebittinger/brocc)
Output: BLAST_BROCC_output/out_brocc/brocc.log