Cookiecutter template for snakemake workflows and workflowr documentation. This project template is inspired by this cookiecutter snakemake project template and the suggested project structure for workflowr R package for creating a static site from Rmarkdown files. I use the project template as follows:
Install cookiecutter:
pip install cookiecutter
Start creating a snakemake-workflow from cookiecutter:
cookiecutter https://github.com/daiuc/cookiecutter-wflowR-smk.git
After filling the prompts, this will create a project template with the following directory structure:
.
├── analysis
│ ├── about.Rmd
│ ├── index.Rmd
│ ├── license.Rmd
│ └── _site.yml
├── code
│ ├── cluster-config.json
│ ├── config.yaml
│ ├── envs
│ │ ├── {{\ cookiecutter.repo_name\ }}.yaml
│ │ └── myenv.yaml
│ ├── LICENSE.md
│ ├── README.md
│ ├── workflow
│ │ ├── Snakefile
│ │ ├── rules
│ │ ├── scripts
│ │ ├── notebooks
│ │ ├── reports
│ │ └── envs
│ │ ├── {{\ cookiecutter.repo_name\ }}.yaml
│ │ └── myenv.yaml
│ ├── snakemake_profiles
│ │ └── slurm
│ │ ├── cluster-config.yaml
│ │ ├── config.yaml
│ │ ├── __pycache__
│ │ │ └── slurm_utils.cpython-36.pyc
│ │ ├── slurm-jobscript.sh
│ │ ├── slurm-status.py
│ │ ├── slurm-submit.py
│ │ └── slurm_utils.py
│ └── snakemake.sbatch
├── {{\ cookiecutter.repo_name\ }}.Rproj
├── data
│ └── README.md
├── docs
│ └── assets
├── output
│ └── README.md
├── README.md
└── _workflowr.yml
13 directories, 28 files
Start scripting and documenting your project.
- Optionally track the entire project with
git init
in the newly created project root. - use the
code
directory to create a reproducible snakemake pipeline which does heavy lifting analysis to be run on a cluster environemt fromcode
as the working directory. Thecode/.gitignore
makes it easy to git track all the code, but ignore tracking the potentially large files in thecode
directory. So as you write the Snakemake pipeline, it is ok to create large untracked files which are too big to push to github. Use the snakemake to do heavy lifting (eg Download NGS data, align, etc) and process the data to small files that can be easily tracked and pushed to github. The cookiecutter will optionally create a conda environment for the snakemake with snakemake and some basic NGS processing softwarescode/envs/{{ cookiecutter.repo_name }}.yaml
and if you need to create additional rule-specific conda environments for the snakemake, they should also be saved incode/envs/
. See thecode/README.md
for more on my snakemake project template. - Write the snakemake pipeline to output smaller processed files (eg gene x sample count tables for RNA-seq) to
output
where they will be tracked by git. - Raw data (that should never be directly edited) that is small enough to track with git should go in
data
. - Use the
analysis
directory to write Rmarkdown files which you want to use to document your thoughts and analysis of processed data. If these Rmd files read in the data files inoutput
ordata
it should be easy for anyone to edit or re-run your Rmd files by cloning this repo (without needing to run the snakemake or do the computationally intensive things) - Use the workflowr R package to render all the Rmd files in
analysis
into html in thedocs
folder (workflowr::wflow_build()
function). Use github's free hosting of static sites to share thedocs
website. Thedocs/assets
might be useful to save images that can reference in the site. - Occasionally I write Rmd files that need access to the large untracked files output by the Snakemake. In this case, it is nice to follow a naming convention to specify which Rmd files need access to these large files.