/impute-Ag

Attempting to impute and phase low coverage sequencing data in Anopheles

Primary LanguagePythonMIT LicenseMIT

impute-Ag

Snakemake Build Status

Impute-Ag logo

This repo is for initial investigations into the use of imputation and phasing for low coverage sequecning data, in Anopheles gambiae, using Ag1000g and VObs data as haplotype reference panels, and the program GLIMPSE, which is from the authors of SHAPEIT and uses Richard Durbins PBWT to rapidly impute and phase SNP markers.

Authors

  • Sanjay Curtis Nagi (@sanjaynagi)

Usage

If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).

Step 1: Obtain a copy of this workflow

  1. Create a new github repository using this workflow as a template.
  2. Clone the newly created repository to your local system, into the place where you want to perform the data analysis.

Step 2: Configure workflow

Configure the workflow according to your needs via editing the files in the config/ folder. Adjust config.yaml to configure the workflow execution, and samples.tsv to specify your sample setup.

Step 3: Execute workflow

Activate the conda environment:

conda activate snakemake

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100

or

snakemake --use-conda --drmaa --jobs 100

If you not only want to fix the software stack but also the underlying OS, use

snakemake --use-conda --use-singularity

in combination with any of the modes above. See the Snakemake documentation for further details.