
Implementation of a peak detection pipeline in Python using machine learning models and a sliding window on the H3K9me3_TDH_BP ChIP-seq dataset.

Primary language: Jupyter Notebook | License: MIT

ML-Peaks: Chip-seq peak detection pipeline using machine learning techniques


Experiments results | Data | Decoded Data | 3D plot of data

Abstract

We propose a data preprocessing approach that combines a sliding window with feature reduction techniques; the resulting features can then be passed to machine learning methods. The methodology can accurately identify peaks from a small training set, which is a distinct advantage over commonly used statistical approaches because it has a greater capacity for learning from data.

We tested our methodology on the H3K9me3_TDH_BP ChIP-seq dataset, exploring a range of machine learning methods, sliding window settings, and feature reduction techniques to detect peaks without human intervention. The pipeline achieved an F1-score of 0.9644 and a false positive rate of 0.1030.
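For readers unfamiliar with the two reported metrics, the sketch below shows how an F1-score and a false positive rate are conventionally computed from binary predictions. It assumes that label 1 marks a peak and label 0 marks background; the function name `f1_and_fpr` and the toy labels are illustrative and are not taken from the notebook.

```python
def f1_and_fpr(y_true, y_pred):
    """Return (F1-score, false positive rate) for binary labels, 1 = peak (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # peaks correctly found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # background called a peak
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # peaks that were missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # background correctly rejected

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Toy labels for illustration only (not data from the experiments).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
f1, fpr = f1_and_fpr(y_true, y_pred)
```

A high F1-score together with a low false positive rate indicates that most peaks are recovered while few background regions are flagged.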

[Figure: pipeline]

Requirements

  • Python version 3.8
  • scikit-learn
  • numpy
  • matplotlib
  • seaborn
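Before running the notebook locally, it may help to confirm that the listed packages are installed. The following is a small sketch using only the standard library; the helper name `check_environment` is hypothetical, and the README does not pin specific package versions, so the script only reports what is present.

```python
import sys
from importlib import metadata

# Distribution names as listed in the Requirements section.
REQUIRED = ["scikit-learn", "numpy", "matplotlib", "seaborn"]


def check_environment(packages=REQUIRED):
    """Return {package: installed version, or None if it is not installed}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # package is missing from this environment
    return report


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, version in check_environment().items():
        print(f"{name}: {version or 'MISSING'}")
```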

Steps to Reproduce Results

To reproduce the peak detection experiments on the H3K9me3_TDH_BP dataset, follow these steps:

Step 1: Run the code cells in the "Load decoded data and Pre-processing" section. This downloads the decoded dataset and imports the necessary libraries. In this section, use the form sliders to set the "win_size", "shift_size", and "pick_more" parameters of the proposed sliding window approach.
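To make the roles of "win_size" and "shift_size" in Step 1 concrete, here is a minimal sketch of a generic sliding window over a 1-D signal. It is an assumption about how such a window typically works, not the notebook's implementation: the function name `sliding_windows` is hypothetical, and "pick_more" is left out because this README does not describe its behaviour.

```python
import numpy as np


def sliding_windows(signal, win_size, shift_size):
    """Cut a 1-D signal into windows of length win_size, stepping by shift_size.

    Returns an array of shape (n_windows, win_size); each row can serve as
    one feature vector. Trailing samples that do not fill a window are dropped.
    """
    signal = np.asarray(signal, dtype=float)
    if win_size < 1 or shift_size < 1:
        raise ValueError("win_size and shift_size must be positive integers")
    if len(signal) < win_size:
        return np.empty((0, win_size))
    starts = np.arange(0, len(signal) - win_size + 1, shift_size)
    return np.stack([signal[s:s + win_size] for s in starts])


# Toy signal for illustration only.
windows = sliding_windows(np.arange(10), win_size=4, shift_size=3)
```

A smaller "shift_size" produces more, heavily overlapping windows; a larger "win_size" gives each window more context at the cost of more features per sample.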

Step 2: Choose your desired feature reduction method from the drop-down menu in the "Select a Feature Reduction method" section.
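This README does not list which feature reduction methods the drop-down menu in Step 2 offers. As one common example, the sketch below applies principal component analysis (PCA) to a window matrix using NumPy's SVD; the choice of PCA, the function name `pca_reduce`, and the random input are assumptions made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto their first n_components principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)  # PCA requires zero-mean features
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Random stand-in for a (n_windows, win_size) feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Z = pca_reduce(X, n_components=2)
```

Reducing each window to a few components shortens training time and can limit overfitting when the training set is small.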

Step 3: In the "ML algorithms and evaluations" section, select a machine learning model from the drop-down form to detect the peaks. After the model runs, the results are reported for each evaluation criterion.
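The train-and-evaluate flow of Step 3 can be sketched with scikit-learn as follows. The random forest classifier, the 70/30 split, the helper name `train_and_evaluate`, and the synthetic data are all assumptions chosen for illustration; the notebook's own model list and evaluation setup may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, seed=0):
    """Fit a classifier on a stratified split and report F1 and false positive rate."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "f1": float(f1_score(y_test, y_pred)),
        "fpr": float(fp / (fp + tn)),
    }


# Synthetic stand-in for window features and peak labels (1 = peak, assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
scores = train_and_evaluate(X, y)
```

Swapping `RandomForestClassifier` for another scikit-learn estimator leaves the rest of the function unchanged, which mirrors selecting a different model from the form.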

Citation

If this paper helps your research, please consider citing it:

@inproceedings{sheshkal2023ml,
  title={ML-Peaks: Chip-seq peak detection pipeline using machine learning techniques},
  author={Sheshkal, Sajad Amouei and Riegler, Michael Alexander and Hammer, Hugo Lewi},
  booktitle={2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS)},
  pages={335--340},
  year={2023},
  organization={IEEE}
}