Git clone:
git clone https://github.com/liaoherui/WideVariant-DL.git
cd WideVariant-DL/snake_pipeline
Build the conda environment:
conda env create -n widevariant --file widevariant.yaml
or, using mamba:
mamba env create -n widevariant --file widevariant.yaml
Activate the conda environment:
conda activate widevariant
Build other conda environments required by snakemake:
sh script/install_subenv.sh
Change the permission of the file:
chmod 777 slurm_status_script.py
This pipeline and toolkit is used to detect and analyze single nucleotide differences between closely related bacterial isolates.
Notable features
- Avoids false-negative mutations due to low coverage; if a mutation is found in at least one isolate in a set, the evidence at that position will be investigated to make a best-guess call.
- Avoids false-positive mutations by facilitating visualization of raw data across samples (whereas pileup formats must be investigated sample by sample) and by letting you change thresholds to best fit your use case.
- Enables easy evolutionary analysis, including phylogenetic construction, nonsynonymous vs. synonymous mutation counting, and parallel evolution.
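The first feature above can be illustrated with a minimal sketch. It is not WideVariant's actual code: the data layout, the function names, and the `MIN_COVERAGE` threshold are all assumptions made for illustration. The idea is that a position becomes a candidate once any isolate shows a confident variant, and every isolate then gets a best-guess call at that position, even where coverage is low.

```python
from collections import Counter

# Hypothetical per-isolate base counts at each genome position.
pileups = {
    "isolate_A": {101: Counter({"A": 30}), 250: Counter({"G": 28, "T": 1})},
    "isolate_B": {101: Counter({"C": 25}), 250: Counter({"G": 31})},
    "isolate_C": {101: Counter({"C": 2}), 250: Counter({"G": 3})},  # low coverage
}
REFERENCE = {101: "A", 250: "G"}
MIN_COVERAGE = 10  # assumed threshold for a confident call


def confident_variant(counts, ref_base, min_cov=MIN_COVERAGE):
    """True if coverage is adequate and the major allele differs from the reference."""
    total = sum(counts.values())
    if total < min_cov:
        return False
    major, _ = counts.most_common(1)[0]
    return major != ref_base


def candidate_positions(pileups, reference):
    """Positions where at least one isolate carries a confident variant."""
    return sorted(
        pos
        for pos, ref in reference.items()
        if any(confident_variant(p.get(pos, Counter()), ref) for p in pileups.values())
    )


def best_guess_calls(pileups, positions):
    """Report every isolate's major allele at each candidate position ('N' if no reads)."""
    calls = {}
    for pos in positions:
        calls[pos] = {}
        for name, p in pileups.items():
            counts = p.get(pos, Counter())
            calls[pos][name] = counts.most_common(1)[0][0] if counts else "N"
    return calls
```

In this toy data, `isolate_C` would be missed at position 101 if it were called on its own (only two reads), but because `isolate_B` has a confident variant there, the position is examined in every isolate and `isolate_C` receives a best-guess call of `C`.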
Inputs (to Snakemake cluster step):
- short-read sequencing data of closely related bacterial isolates
- an annotated reference genome
Outputs (of local analysis step):
- table of high-quality SNVs that differentiate isolates from each other
- parsimony tree of how the isolates are related to each other
The pipeline is split into two main components, as described below. A complete tutorial can be found at the bottom of this page.
The first portion of WideVariant aligns raw sequencing data from bacterial isolates to a reference genome, identifies candidate SNV positions, and creates data structures for supervised local data filtering. This step is implemented in a workflow management system called Snakemake and is executed on a SLURM cluster. More information is available here.
1.1 Update - 2024-09-22: A user-friendly Python script is now available to help users run the pipeline more easily. Instructions are provided below:
Make sure to configure your config.yaml file and script/run_snakemake.slurm before starting the steps below.
Step-1: run the Python script:
python widevariant.py -i <input_sample_info_csv> -r <ref_dir> -o <output_dir>
Step-2: check the pipeline using "dry-run":
sh script/dry-run.sh
Step-3: submit your SLURM job:
sbatch script/run_snakemake.slurm
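For convenience, the three steps above could be chained from a small Python wrapper. The sketch below is hypothetical and not part of WideVariant: `build_steps` and `run_steps` are illustrative names, and the file and directory names in the usage note are placeholders. By default it only prints the commands, so nothing is submitted until `execute=True` is passed.

```python
import shlex
import subprocess


def build_steps(sample_csv, ref_dir, out_dir):
    """Return the three commands from the steps above as argument lists."""
    return [
        ["python", "widevariant.py", "-i", sample_csv, "-r", ref_dir, "-o", out_dir],
        ["sh", "script/dry-run.sh"],
        ["sbatch", "script/run_snakemake.slurm"],
    ]


def run_steps(steps, execute=False):
    """Print each command; run it only if execute=True. Stops at the first failure."""
    lines = []
    for argv in steps:
        line = shlex.join(argv)
        print(line)
        lines.append(line)
        if execute:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so the SLURM job is not submitted if an earlier step fails.
            subprocess.run(argv, check=True)
    return lines
```

For example, `run_steps(build_steps("samples.csv", "reference_genomes", "results"))` prints the three commands for review. It should be run from the `snake_pipeline` directory with the `widevariant` environment active.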
The second portion of WideVariant filters candidate SNVs based on data arrays generated in the first portion and generates a high-quality SNV table and a parsimony tree. This step is implemented with a custom Python script. More information can be found here.
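As a rough illustration of this kind of filtering, the sketch below masks low-quality calls and keeps only positions that still differ between isolates. It is a simplified stand-in, not the actual analysis script: the array layout, the function names, and both thresholds (`MIN_COVERAGE`, `MIN_MAJOR_FREQ`) are assumptions chosen for the example.

```python
MIN_COVERAGE = 8       # assumed minimum read depth for a trusted call
MIN_MAJOR_FREQ = 0.85  # assumed minimum major-allele frequency

# Hypothetical candidate-position arrays: one row per position, one column per isolate.
positions = [101, 250, 777]
calls = [["A", "C", "C"], ["G", "G", "T"], ["T", "T", "A"]]
coverage = [[30, 25, 12], [28, 31, 3], [20, 22, 19]]
major_freq = [[1.0, 0.98, 0.95], [0.97, 1.0, 1.0], [0.99, 0.96, 0.60]]


def mask_calls(calls, coverage, major_freq):
    """Replace calls that fail either threshold with 'N' (unknown)."""
    return [
        [
            base if cov >= MIN_COVERAGE and freq >= MIN_MAJOR_FREQ else "N"
            for base, cov, freq in zip(call_row, cov_row, freq_row)
        ]
        for call_row, cov_row, freq_row in zip(calls, coverage, major_freq)
    ]


def snv_table(positions, masked):
    """Keep positions where at least two distinct known alleles remain."""
    table = []
    for pos, row in zip(positions, masked):
        alleles = {base for base in row if base != "N"}
        if len(alleles) >= 2:
            table.append((pos, row))
    return table
```

With the toy arrays above, only position 101 survives: the apparent differences at 250 and 777 each rest on a single call that fails a threshold. Loosening or tightening the thresholds changes which positions are retained, which is why inspecting the raw evidence alongside the filters is useful.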
Main WideVariant pipeline README
Previous iterations of this pipeline have been used to study: