Snakemake for SNPs is a flexible and user-friendly SNP analysis workflow.
Snakemake for SNPs can be applied to both model and non-model organisms. It maps raw RNA-Seq reads to a reference genome (downloaded from a public database or built by the user), performs Allele Specific Expression analysis for SNPs, and identifies Differentially Expressed Genes (DEGs); the two sets of results can then be crossed with each other. Using it requires basic Python programming skills. If you are a beginner at programming, just open the config file and adapt it to your experiments!
If you use our pipeline, please cite us:
WARNING: adapt the citation to our link:
NOTE: This pipeline was developed on Linux and may not work correctly on other platforms.
The usage of this workflow is described in the Snakemake Workflow Catalog.
Clone the repository:
git clone https://github.com/AylaScientist/Snakemake_for_SNPs.git
Create the environment:
conda create -n pipeline python=3.7
Activate the environment:
conda activate pipeline
Install the packages including the bio tools:
pip install git+https://github.com/snakemake/snakemake
conda install -c bioconda snakemake-wrapper-utils
conda install -c bioconda trimmomatic=0.39
conda install -c bioconda fastqc=0.11.9
conda install -c bioconda star=2.7.10a
conda install -c bioconda htseq=0.11.3
conda install -c bioconda picard=2.26
conda install -c bioconda gatk4=4.2.5.0
conda install -c bioconda samtools=1.16
conda install -c bioconda bcftools=1.16
conda install -c bioconda vcftools=0.1.16
conda install -c bioconda htslib=1.16
conda install -c anaconda perl=5.26.2
conda install -c anaconda pandas
conda install -c anaconda scipy
conda install -c anaconda statsmodels
conda install -c anaconda seaborn
conda install -c conda-forge matplotlib
conda install -c conda-forge py-bgzip
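After installing, it can help to confirm that the tools are actually reachable from the activated environment. The following is a minimal sketch, not part of the pipeline: the function name `missing_tools` is hypothetical, and the executable names in the list are assumptions based on how these bioconda packages are usually installed (for example, `STAR` for star, `htseq-count` for htseq, `gatk` for gatk4 and `bgzip` for htslib). Adjust the list if your installation differs.

```python
import shutil

# Executable names are assumptions based on the usual bioconda packaging;
# edit this list if a tool is installed under a different name.
EXPECTED_TOOLS = [
    "snakemake", "trimmomatic", "fastqc", "STAR", "htseq-count",
    "picard", "gatk", "samtools", "bcftools", "vcftools", "bgzip",
]


def missing_tools(tools=EXPECTED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

Run it with the `pipeline` environment activated; any tool it reports can be reinstalled with the corresponding `conda install` command above.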
Set the resources of your system in the config file:
gedit ~/Snakemake_for_SNPs/config/config_main.yaml
Now that the resources are adapted to your computer, perform a dry run of the pipeline with the example data to build the DAG of jobs:
cd ~/Snakemake_for_SNPs/workflow/
snakemake -n
If this step fails, please contact me: ayla.bcn@gmail.com
If the dry run succeeds, run the workflow. This is an example for 4 threads at 4 GB:
snakemake --use-conda --cores 4
Customize the workflow to your needs in the following file:
./config/config_main.yaml
In this file you should also set the species and the databases used for gene/transcript/protein/GO_function/KEGG annotation and data mining.
Modify the metafiles describing your data and the experiment:
config/Experimental_design.csv
config/Experimental_groups.csv
config/Sample_names.csv
config/Samples_MAE.csv
config/samples.csv
Please note that the columns in the file "Experimental_groups.csv" must be named "Group_1" and "Group_2" for the Chi-square test to be applied.
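A quick way to catch a misnamed column before running the pipeline is to inspect the header of the file. The sketch below is only an illustration: the function name `missing_group_columns` is hypothetical, and it assumes the file is comma-separated with the column names on its first line.

```python
import csv

# Column names required for the Chi-square test, as stated in this README
REQUIRED_COLUMNS = {"Group_1", "Group_2"}


def missing_group_columns(csv_path):
    """Return the required column names that are absent from the CSV header."""
    # Assumption: comma-separated file whose first line holds the column names
    with open(csv_path, newline="") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS - present)
```

For example, `missing_group_columns("config/Experimental_groups.csv")` returns an empty list when both columns are present, and the names of the missing columns otherwise.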
You need to choose two samples from different groups, preferably one sample from the control group and one sample from a treatment group. The SNPs from these samples will be used to construct the pseudogenomes. In the example, the codes of these two samples are GF6 and KS4. To create the pseudogenomes for your experiment, replace these codes with your own in the following files, including in the file names of the *colnames.csv files:
config/Pseudogenome_codes.csv
config/tbGF6_colnames.csv
config/tbKS4_colnames.csv
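To double-check the renaming, the expected file names can be derived from your two sample codes. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the pattern `tb<code>_colnames.csv` is inferred from the two example file names above rather than taken from the pipeline code.

```python
from pathlib import Path


def pseudogenome_files(code_1, code_2, config_dir="config"):
    """List the config files that must reflect the two chosen sample codes."""
    config_dir = Path(config_dir)
    # Assumption: the colnames files follow the pattern tb<code>_colnames.csv,
    # as in the example files tbGF6_colnames.csv and tbKS4_colnames.csv
    return [
        config_dir / "Pseudogenome_codes.csv",
        config_dir / f"tb{code_1}_colnames.csv",
        config_dir / f"tb{code_2}_colnames.csv",
    ]


def missing_pseudogenome_files(code_1, code_2, config_dir="config"):
    """Return the expected files that do not exist yet."""
    expected = pseudogenome_files(code_1, code_2, config_dir)
    return [path for path in expected if not path.is_file()]
```

Called from the root of the repository with the example codes, `missing_pseudogenome_files("GF6", "KS4")` lists any of the three files that cannot be found; substitute your own codes once the files are renamed.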
Very important: ADD the genome or transcriptome of your species! The example uses the genome of the Nile tilapia, located in the folder genome at the root of the repository.
The pipeline for SNPs has been evaluated on 4 datasets, including 2 non-model organisms (Nile and Mozambique tilapias). WARNING: Put here the link to the article