Example of a pipeline for processing ChIP-seq data, going from fastq files to binding sites (peaks).
Although this code is fully functional, it is only meant to showcase the use of Python and Snakemake.
Set up the environment and run the pipeline, using conda and mamba configured for working with bioconda:
mamba create -n test-chipseq --yes
mamba activate test-chipseq
mamba install -n test-chipseq --file requirements.txt --yes
snakemake -C sample_sheet=$PWD/test/data/sample_sheet.tsv \
genome=$PWD/test/data/genome.fa \
-p --dry-run -j 4 --directory output
Where:
- sample_sheet: Full path of tab-separated file of library characteristics with columns:
  - library_id: Unique library ID
  - type: Type of library: chip or input
  - control_id: ID of the control library for this library, or NA if library_id is an input library
  - fastq_r1: Path to fastq file; path relative to output directory
- genome: Full path to fasta file of reference genome
- --directory: Output directory
- -p -j ...: For this and other options see snakemake -h. Remove --dry-run for actual execution.
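The sample sheet format described above can be checked before launching the pipeline. The following is a minimal sketch, not the code used by this pipeline: the function name read_sample_sheet, the example library IDs and the fastq paths are hypothetical. It also assumes that every chip library must name an existing input library as its control, which is a stricter reading of the column descriptions than the text strictly states.

```python
import csv
import io

# Columns listed in the sample sheet description above.
REQUIRED_COLUMNS = ["library_id", "type", "control_id", "fastq_r1"]


def read_sample_sheet(handle):
    """Parse a tab-separated sample sheet and return {library_id: row}.

    Hypothetical helper: the validation rules are assumptions based on
    the column descriptions, not the pipeline's own checks.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("Missing columns: " + ", ".join(missing))

    libraries = {}
    for row in reader:
        library_id = row["library_id"]
        if library_id in libraries:
            raise ValueError(f"Duplicate library_id: {library_id}")
        if row["type"] not in ("chip", "input"):
            raise ValueError(f"Invalid type for {library_id}: {row['type']}")
        libraries[library_id] = row

    # Second pass: control_id can only be checked once all rows are known.
    for library_id, row in libraries.items():
        control_id = row["control_id"]
        if row["type"] == "input":
            if control_id != "NA":
                raise ValueError(f"Input library {library_id} must have control_id NA")
        elif control_id not in libraries or libraries[control_id]["type"] != "input":
            # Assumption: a chip library's control is an input library in the same sheet.
            raise ValueError(f"Control {control_id} of {library_id} is not an input library")
    return libraries


# Hypothetical two-library sheet; IDs and paths are made up for illustration.
example = io.StringIO(
    "library_id\ttype\tcontrol_id\tfastq_r1\n"
    "chip_1\tchip\tinput_1\tfastq/chip_1.fastq.gz\n"
    "input_1\tinput\tNA\tfastq/input_1.fastq.gz\n"
)
libraries = read_sample_sheet(example)
print(sorted(libraries))
```

To check a real file, open it in text mode and pass the handle, for example `with open("test/data/sample_sheet.tsv") as fh: read_sample_sheet(fh)`.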
Run tests:
./test/test.py
Format code:
snakefmt Snakefile
black lib/utils.py test/test.py