
Example of a pipeline for processing ChIPseq data



Example of a pipeline for processing ChIPseq data going from fastq files to binding sites (peaks).

Although this code is fully functional, it is only meant to showcase the use of Python and Snakemake.


Set up the environment using conda and mamba, configured for use with bioconda:

mamba create -n test-chipseq --yes
mamba activate test-chipseq
mamba install -n test-chipseq --file requirements.txt --yes


snakemake -C sample_sheet=$PWD/test/data/sample_sheet.tsv \
             genome=$PWD/test/data/genome.fa \
    -p --dry-run -j 4 --directory output


  • sample_sheet: Full path of tab-separated file of library characteristics with columns:

    • library_id: Unique library ID
    • type: Type of library: chip or input
    • control_id: ID of the control library for this library, or NA if library_id is an input library
    • fastq_r1: Path to the fastq file, relative to the output directory
  • genome: Full path to fasta file of reference genome

  • --directory: Output directory

  • -p -j ...: For these and other options, see snakemake -h. Remove --dry-run for actual execution.
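
The sample sheet layout described above can be checked before launching the pipeline. The following is a minimal sketch and not part of this repository: the function name `validate_sample_sheet`, the example rows, and the rule that every chip library must point to an existing input library are assumptions made for illustration.

```python
import csv
import io

# Columns listed in the sample_sheet description above
REQUIRED_COLUMNS = ["library_id", "type", "control_id", "fastq_r1"]


def validate_sample_sheet(handle):
    """Return a list of error messages for a tab-separated sample sheet.

    Hypothetical helper: the checks are assumptions based on the column
    descriptions, not the pipeline's own validation.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    rows = list(reader)
    errors = []
    seen = set()
    types = {}
    # First pass: unique IDs and valid library types
    for row in rows:
        lib = row["library_id"]
        if lib in seen:
            errors.append(f"duplicate library_id: {lib}")
        seen.add(lib)
        if row["type"] not in ("chip", "input"):
            errors.append(f"{lib}: type must be 'chip' or 'input'")
        types[lib] = row["type"]

    # Second pass: control_id must be NA for inputs and, by assumption,
    # must name an input library for chip libraries
    for row in rows:
        lib, ctrl = row["library_id"], row["control_id"]
        if row["type"] == "input":
            if ctrl != "NA":
                errors.append(f"{lib}: input library must have control_id NA")
        elif row["type"] == "chip" and types.get(ctrl) != "input":
            errors.append(f"{lib}: control_id '{ctrl}' is not an input library")
    return errors


# Made-up example rows; file names are placeholders
EXAMPLE = (
    "library_id\ttype\tcontrol_id\tfastq_r1\n"
    "chip_1\tchip\tinput_1\tfastq/chip_1.fastq.gz\n"
    "input_1\tinput\tNA\tfastq/input_1.fastq.gz\n"
)

if __name__ == "__main__":
    print(validate_sample_sheet(io.StringIO(EXAMPLE)))
```

To check a real file, pass an open file handle instead of the in-memory example, e.g. `validate_sample_sheet(open("sample_sheet.tsv"))`. An empty list means no problems were found under the assumptions stated above.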


Run tests:


Format code:

snakefmt Snakefile
black lib/utils.py test/test.py