QTL-Snakemake-Workflow

https://img.shields.io/conda/dn/bioconda/snakemake.svg?label=Bioconda https://github.com/snakemake/snakemake/workflows/CI/badge.svg?branch=master

The QTL-Snakemake-Workflow consists of 3 main fuctions which helps to utilize the QTLseqr tool for Next Generation sequencing bulk segregant analysis from a VCF file.

Getting started

To run this package first we need to install the required libraries for the workflow which can be installed by running

conda env create --file envs/condaenv.yml

Here we can define any name we want instead of the QTL_Test. This command will make a conda environment to run this package.

Once we have created the environment we need to configure the config.yaml file with the required parameters so that it can take input for different rules. After this to run the workflow we can use the command

snakemake -pr

This command will run the workflow and generate the output in the QTL_Plots folder.

Work FLow Rules

The QTL workflow consists of three rules:
  1. VCF_Homozygous_Filtering
  2. QTL_VCF_to_Table_Parser
  3. QTL_Plotting
  1. VCF_Homozygous_Filtering

This rule takes any vcf file and filters the parent for selecting only the homozygous SNPS in the vcf file.

input:
    expand("{vcf_file_name}.vcf.gz", vcf_file_name= config["vcf_file"]),
output:
    expand("Homozygous_Filtered_VCF/Homozygous_{vcf_file_name}.vcf.gz", vcf_file_name=config["vcf_file"])
params:
    filter_data= config["vcffilter"]["filters"]
shell:
    "( bcftools view"
    "   {params.filter_data}"
    "   {input}"
    "   -O z"
    "   -o {output}"
    ")"

The input is a vcf file which needs to be defined in the config file under the vcf_file name and the output of this rule will be saved into the Homozygous_Filtered_VCF folder with the same vcf file name just a Homozygous title will be added in the start of the name so that we can distinguish between different files if the workflow is run for multiple vcf files. The filtering is done on the parents and the shell command takes input from the config file.

-i "GT[0]='0/0' && GT[1]='1/1' || GT[0]='1/1' && GT[1]='0/0'"

In the above script for filtering the parents which comes from the config file it is taking the parents and from our reference vcf file the parents are on GT[0] and GT[1], but it should be adjusted according to the given vcf file.

  1. QTL_VCF_to_Table_Parser

This rule is for converting a vcf file into a table format file which is required by the QTL tool if the VCF file is not generated from GATK. This rule runs a R scripts which parses the vcf file.

input:
    expand("Homozygous_Filtered_VCF/Homozygous_{vcf_file_name}.vcf.gz", vcf_file_name=config["vcf_file"]),
output:
    "QTL_VCF_to_Table/QTL_Table.csv",
script:
    "Scripts/QTL_Parser.R"

It takes an input a vcf file from the Homozygous_Filtered_VCF folder with a specified name as Homozygous_{vcf_file_name}.vcf.gz the vcf_file_name comes from the config file where the name of the vf file was given. The output is saved in QTL_VCF_to_Table folder The QTL_VCF_to_Table script takes input from the config file where the names of the High Bulk, Low Bulk are defined along with that it also takes the parameter Number_of_Chromosomes from the config file.

  1. QTL_Plotting

This rule runs the QTLseqr tool and generates the plots for Gprime Analysis and QTLseq Analysis. This rules runs an R script for generating the plots.

input:
   "QTL_VCF_to_Table/QTL_Table.csv"
output:
   "QTL_Plots/DP_Filtering Data.pdf",
   "QTL_Plots/REF Frequency for Filtering Data.pdf",
   "QTL_Plots/SNP Index for Filtering Data.pdf",
   "QTL_Plots/GPrime Distribution with Hampel Outlier Filter.pdf",
   "QTL_Plots/GPrime Distribution with deltaSNP Outlier Filter.pdf",
   "QTL_Plots/SNP Density Plot.pdf",
   "QTL_Plots/Delta SNP Index Plot with Intervals.pdf",
   "QTL_Plots/GPrime Value Plot.pdf"
script:
    "Scripts/QTL_Plotting.R"

It takes input the csv file developed by the QTL_VCF_to_Table_Parser along with the parameters defined within the config file for filtering the SNPs for better results and the output is saved into QTL_Plots folder.

.. toctree::
   :caption: Getting started
   :name: getting_started
   :hidden:
   :maxdepth: 1

   getting_started/installation
   tutorial/tutorial
   tutorial/short


.. toctree::
  :caption: Executing workflows
  :name: execution
  :hidden:
  :maxdepth: 1

  executing/cli
  executing/cluster-cloud
  executing/caching
  executing/interoperability

.. toctree::
    :caption: Defining workflows
    :name: snakefiles
    :hidden:
    :maxdepth: 1

    snakefiles/writing_snakefiles
    snakefiles/rules
    snakefiles/configuration
    snakefiles/modularization
    snakefiles/remote_files
    snakefiles/utils
    snakefiles/deployment
    snakefiles/reporting


.. toctree::
    :caption: API Reference
    :name: api-reference
    :hidden:
    :maxdepth: 1

    api_reference/snakemake
    api_reference/snakemake_utils
    api_reference/internal/modules


.. toctree::
    :caption: Project Info
    :name: project-info
    :hidden:
    :maxdepth: 1

    project_info/citations
    project_info/more_resources
    project_info/faq
    project_info/contributing
    project_info/authors
    project_info/history
    project_info/license