
Snakemake ProtoWorkflow for DNA Analysis.


dna-proto workflow (Snakemake)


This Snakemake workflow is for analysing genome re-sequencing experiments. It features two modes. The de-novo mode confirms sample relationships from the raw sequencing reads with kwip and mash. The varcall mode aligns reads to one or several reference genomes and then detects variants. Read alignment can be performed with bwa and/or NextGenMap, and variant calling with Freebayes and/or bcftools mpileup. These are currently the best-performing tools for re-sequencing large plant genomes. Between read alignment and variant calling, PCR duplicates are flagged with samtools markdup and indels are realigned with abra2. If a genome annotation is available, variants are annotated with snpEff.

Authors

  • Norman Warthmann
  • Marcos Conde
  • Kevin Murray*

*Core functionality of this workflow is based on PaneucalyptShortReads

Usage

  1. Create a new GitHub repository in your GitHub account using this workflow as a template.
  2. Clone your newly created repository to the local system where you want to perform the analysis.
  3. Set up the software dependencies.
  4. Configure the workflow for your needs and input files.
  5. Run the workflow.
  6. Archive your workflow to document your work and make it easy to reproduce.

Some pointers for setting up, configuring, and running the workflow are below; for details, please consult the technical documentation.


An easy way to set up the dependencies is conda.

Get the Miniconda Python 3 distribution:

$ wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
$ conda install mamba

Create an environment with the required software:

NOTE: the conda environment name in these examples is dna-proto.

$ mamba env create --file envs/all-dependencies.yml

Activate the environment:

$ conda activate dna-proto
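
Once the environment is active, a quick check that the expected tools are on the PATH can catch an incomplete install early. The following is a minimal sketch and not part of the workflow. The executable names (in particular ngm for NextGenMap, abra2, and snpEff) are assumptions and may differ in your installation, and the helper name missing_tools is hypothetical.

```python
import shutil

# Executable names are assumptions based on the tools named in this README;
# adjust them to match what envs/all-dependencies.yml actually installs.
EXPECTED_TOOLS = [
    "snakemake", "bwa", "ngm", "samtools", "bcftools",
    "freebayes", "abra2", "snpEff", "kwip", "mash",
]


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(EXPECTED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

An empty result only means the executables were found; it does not verify their versions.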

Additional useful conda commands are here.



Check config and metadata

We provide scripts to list metadata and configuration parameters in utils/.

python utils/check_metadata.py
python utils/check_config.py
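
To illustrate what listing configuration parameters can look like, here is a minimal sketch that flattens a nested configuration into dotted key paths. It is not the code in utils/. The function name flatten_config and the example keys are hypothetical, and the dictionary stands in for an already parsed config.yml.

```python
def flatten_config(config, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf in a nested dict."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            yield from flatten_config(value, path)
        else:
            yield path, value


# Hypothetical example; the keys are illustrative, not the workflow's schema.
example = {
    "varcall": {"aligners": ["bwa", "ngm"], "callers": ["freebayes", "mpileup"]},
    "denovo": {"kmer_size": 21},
}

for key, value in flatten_config(example):
    print(f"{key} = {value}")
```

A flat listing like this makes it easier to spot a misspelled or missing parameter before starting a run.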

Visualising the workflow

You can inspect the workflow in graphical form by rendering its DAG (the directed acyclic graph of jobs).

snakemake --dag -npr -j -1 | dot -Tsvg > dag.svg
eog dag.svg

Dry run of the workflow

Prior to running the workflow, perform a dry run and confirm it will do what is intended.

snakemake  -npr
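
If you launch the workflow from a script, the dry run can act as a gate before the real run. This is a minimal sketch, assuming snakemake is on the PATH and that a non-zero exit code signals a problem. The helper name dry_run_ok is hypothetical.

```python
import subprocess


def dry_run_ok(command=("snakemake", "-npr")):
    """Return True if the dry-run command exits with status 0."""
    try:
        result = subprocess.run(list(command), capture_output=True, text=True)
    except FileNotFoundError:
        # The executable is not installed or the environment is not active.
        return False
    if result.returncode != 0:
        # Show what went wrong so the configuration can be fixed.
        print(result.stdout + result.stderr)
    return result.returncode == 0
```

Call dry_run_ok() from the workflow directory and start the real run only if it returns True.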

Data

Main directory content:

.
├── envs
├── genomes_and_annotations
├── metadata
├── output
├── rules
├── scripts
├── utils
├── config.yml
├── Snakefile
├── snpEff.config

NOTE: the output directory and some files in the metadata directory are/will be generated by the workflow.
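
Before configuring, it can help to confirm that a fresh clone contains the layout shown in the directory listing above. This is a minimal sketch, not part of the workflow. The entry names are taken from that listing, and the helper name missing_entries is hypothetical.

```python
from pathlib import Path

# Names taken from the directory listing above; "output" is omitted because
# the workflow generates it.
EXPECTED_DIRS = ["envs", "genomes_and_annotations", "metadata", "rules", "scripts", "utils"]
EXPECTED_FILES = ["config.yml", "Snakefile", "snpEff.config"]


def missing_entries(root="."):
    """Return expected directories and files that are absent under `root`."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    print(missing_entries() or "Layout looks complete.")
```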

You will need to configure the workflow for your specific project. For details, see the technical documentation. The following files and directories will need editing:

  • Snakefile
  • genomes_and_annotations/
  • metadata/
  • config.yml
  • snpEff.config

You can download example data for testing the workflow.

--

How to contribute


Fork this repository

Fork this repository by clicking the fork button at the top of this page. This will create a copy of this repository in your GitHub account (not on your computer).


Clone the repository


Now clone the forked repository to your machine. Go to your GitHub account, open the forked repository, click the clone button, and then click the copy-to-clipboard icon. The URL will look like https://github.com/your-username/dna-proto-workflow.git, where your-username is your GitHub username.

Open a terminal and run the following git command:

git clone https://github.com/your-username/dna-proto-workflow.git


Once you've cloned your fork, you can edit your local copy. However, if you want to contribute, you'll need to create a new branch.

Create a branch

Change to the repository directory on your computer (if you are not already there):

NOTE: Don't change the name of this directory!

cd dna-proto-workflow

You can list your branches and see which one is active with the git branch command.

git branch -a

Now create a branch using the git checkout command:

git checkout -b new-branch-name

For example:

git checkout -b development

From this point, you are on the new branch and edits only affect that branch. If things go wrong, you can delete the branch (git will not delete the branch you currently have checked out, so switch to another branch first) using

git branch -d name-of-the-branch

Or switch back to the master branch using

git checkout master

Make changes and commit

Once you've modified something, you can confirm that there are changes with git status (called from the top-level directory). Add those changes to your branch with git add:

git status
git add .
or
git add name_of_the_file_you_modified

Commit those changes with git commit:

git commit -m "write a message"

Push changes to GitHub

Push the changes in your local copy (on your machine) to your remote repository on GitHub with git push:

git push origin your-branch-name

replacing your-branch-name with the name of the branch you created earlier (e.g., development).

Contributing to the project: Submit your changes for review

In your repository on GitHub, click the Compare & pull request button.

Now submit the pull request.

We'll get notified and can check your changes and merge them into this project (in general, into the master branch).