Accessibility Mask Pipeline

This repository provides a Nextflow pipeline that identifies high-coverage regions in CRAM files and reports them as an accessibility mask for use in downstream analyses.

Introduction

The Accessibility Mask pipeline uses Python scripts and Nextflow to process CRAM files and identify high-coverage regions. The steps below generate an accessibility mask for your genomic data.
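To make the idea of a high-coverage region concrete, the sketch below groups consecutive positions whose depth meets a threshold into intervals. It is only an illustration: the function name `high_coverage_regions`, the `min_depth` threshold, and the use of a plain list of per-base depths are assumptions made for this example, and the criteria the pipeline actually applies may differ.

```python
from typing import Iterable, List, Tuple


def high_coverage_regions(
    depths: Iterable[int], min_depth: int, start: int = 0
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth >= min_depth.

    `depths` holds one depth value per consecutive position, beginning at
    coordinate `start`. Both names are hypothetical, chosen for this sketch.
    """
    regions: List[Tuple[int, int]] = []
    region_start = None  # start of the interval currently being extended
    pos = start
    for pos, depth in enumerate(depths, start=start):
        if depth >= min_depth:
            if region_start is None:
                region_start = pos  # open a new interval
        elif region_start is not None:
            regions.append((region_start, pos))  # close the interval
            region_start = None
    if region_start is not None:
        # The last interval runs to the end of the input.
        regions.append((region_start, pos + 1))
    return regions


# Example with made-up depths and an arbitrary threshold of 10.
example = high_coverage_regions([5, 12, 15, 3, 20, 25, 30], min_depth=10)
print(example)  # [(1, 3), (4, 7)]
```

Half-open intervals are used here because they match the BED convention, which is a common format for masks; whether the pipeline writes BED output is not stated in this README.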

Pipeline Setup

To set up and run the pipeline, please follow these instructions:

Python Virtual Environment: Activate your Python virtual environment so that the required Python modules are installed into it rather than system-wide.
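Install Dependencies: Install the following Python modules using the package manager of your choice:
- pandas
- pysam
- numpy
The pipeline also uses the statistics module, which is part of the Python 3 standard library and does not need to be installed.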
Install Nextflow, Samtools, and Tabix: Make sure Nextflow, Samtools, and Tabix are installed on your system. Installation instructions are available in each tool's documentation.
Nextflow Configuration: Copy the provided nextflow.config file into the directory from which you will run the pipeline, then edit it to match your data and cluster settings.
Load Dependencies: Load Nextflow, Samtools, and Tabix in your environment (on clusters that use environment modules, with module load) so that they are available when the pipeline runs.
Job Submission: Submit the job to the Compute Canada cluster using the following command, replacing the account name and the path to Coverage.nf with your own values:

sbatch --account="name of the account" --time=168:00:00 --mem=4G -J coverage --wrap="nextflow run /path/to/AccessibilityMask/Coverage.nf" -o coverage.slurm.log

Deactivate Virtual Environment: After submitting the job, deactivate your Python virtual environment to restore your original shell environment.
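Before submitting the job, it can help to confirm that the dependencies listed in the steps above are visible from the activated environment. The following is a minimal sketch of such a check, not part of the pipeline itself; the helper names `missing_modules` and `missing_tools` are hypothetical. It relies only on the standard library (`importlib.util.find_spec` and `shutil.which`).

```python
import importlib.util
import shutil
from typing import Iterable, List

# Names taken from the setup steps in this README.
REQUIRED_MODULES = ["pandas", "pysam", "numpy", "statistics"]
REQUIRED_TOOLS = ["nextflow", "samtools", "tabix"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables that are not on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    modules = missing_modules(REQUIRED_MODULES)
    tools = missing_tools(REQUIRED_TOOLS)
    if modules:
        print("Missing Python modules:", ", ".join(modules))
    if tools:
        print("Tools not found on PATH:", ", ".join(tools))
    if not modules and not tools:
        print("All listed dependencies were found.")
```

Run the script with the virtual environment active and the tools loaded, since both affect what the interpreter and the PATH lookup can see.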

CERC-Genomic-Medicine/AccessibilityMask