Translocating Event Analysis for Nanopores

1. Description

This repository facilitates the analysis of raw data in the .abf format, with a focus on studying translocating events through nanopores. It provides comprehensive signal analysis capabilities, including:

  1. Identification of Translocating Events: Automatic detection of events within raw data.
  2. Deep Analysis of Signals: Application of various smoothing techniques for detailed signal analysis.
  3. Clustering: Employment of k-means or hierarchical clustering to group events and determine the optimal number of clusters. Future updates will introduce additional clustering methods.
  4. Similar Signal Retrieval: Identification of signals that share similarities within the dataset.
  5. ML-Based Classification: An upcoming feature for classifying signals using machine learning methodologies.

2. Installation

2.1 Requirements

Ensure Python (version 3.8 or newer) is installed on your system. This project's dependencies can be installed via:

pip install -r requirements.txt

2.2 Setting Up

Clone this repository to begin:

git clone https://github.com/yourusername/translocating-event-analysis.git
cd translocating-event-analysis

3. Usage

The repository's features can be accessed as described below:

3.1 Identifying and Analyzing Events

To identify translocating events, perform deep signal analysis, and generate plots, execute the following script:

bash find_events_and_plot.sh
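Under the hood, the script wraps a Python analysis pipeline. The sketch below illustrates the general idea only, assuming the pyabf and SciPy packages, a placeholder file path, and an illustrative dip threshold; it is not the repository's exact implementation.

import numpy as np
import pyabf
from scipy.ndimage import gaussian_filter1d

# Hypothetical path; point this at your own .abf recording.
abf = pyabf.ABF("data/recording.abf")
abf.setSweep(0)
current = abf.sweepY   # current trace
time = abf.sweepX      # time axis in seconds

# Smooth the trace and estimate a slowly varying baseline.
smoothed = gaussian_filter1d(current, sigma=5)
baseline = gaussian_filter1d(current, sigma=500)

# Flag samples that dip well below the baseline (illustrative threshold).
threshold = baseline - 4 * np.std(current - baseline)
in_dip = smoothed < threshold

# Pair up the rising and falling edges into approximate (start, end) times.
edges = np.flatnonzero(np.diff(in_dip.astype(int)))
events = list(zip(time[edges[::2]], time[edges[1::2]]))
print(f"Detected {len(events)} candidate events")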

The analysis behavior can be customized by modifying the configs/event-analysis.yaml configuration file. This file contains the parameters that control the analysis, including paths to data files, smoothing parameters, and versioning for output directories. Update the configuration file according to your specific requirements before running the analysis scripts.

Example configuration parameters include (a loading sketch follows this list):

  • version: Specifies the version of the analysis, affecting output directory naming.
  • data_file_path: The path to the raw .abf data file for analysis.
  • sampling_rate, base_sigma, gaussian_sigma, etc.: Parameters that control the data processing and analysis techniques.
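As a rough illustration, the configuration might be read with PyYAML as in the sketch below; the key names mirror the parameters listed above, and the variable handling is an assumption, not the repository's exact code.

import yaml

with open("configs/event-analysis.yaml") as f:
    cfg = yaml.safe_load(f)

version = cfg["version"]                    # used when naming output directories
data_file_path = cfg["data_file_path"]      # raw .abf recording to analyze
sampling_rate = cfg.get("sampling_rate")    # samples per second of the recording
base_sigma = cfg.get("base_sigma")          # width of the baseline smoothing filter
gaussian_sigma = cfg.get("gaussian_sigma")  # width of the deep-analysis Gaussian filter

print(f"Analyzing {data_file_path} (output version {version})")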

3.2 Sample Outputs

The following are examples of plots generated by the analysis scripts.

  • Static Plot Examples:

3.3 Interactive HTML Visualizations

For an interactive analysis experience, the bash script also generates HTML files that can be viewed in any modern web browser. Example files are available at the following paths within the repository:

  • plots/dips_plots_07_0s_300s_soft/0s-300s/dip_100_start_72.254200s_end_72.261580s.html
  • plots/dips_plots_07_0s_300s_soft/0s-300s/dip_112_start_80.829836s_end_80.830548s.html

Click a link to navigate to the file on GitHub, use the "Download" or "Raw" option to save it to your computer, and then open it in your browser to explore the data interactively.
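For reference, a standalone interactive dip plot of this kind can be produced with a plotting library such as Plotly. The sketch below uses a placeholder trace and an assumed library choice; it is not the repository's actual plotting code.

import numpy as np
import plotly.graph_objects as go

t = np.linspace(72.2542, 72.26158, 500)           # time window of one dip (s)
current = -1.0 + 0.05 * np.random.randn(t.size)   # placeholder current trace (nA)

fig = go.Figure(go.Scatter(x=t, y=current, mode="lines", name="dip 100"))
fig.update_layout(xaxis_title="Time (s)", yaxis_title="Current (nA)")
fig.write_html("dip_100_start_72.254200s_end_72.261580s.html")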

3.4 Clustering

3.4.1 Finding the Optimal Number of Clusters

To determine the optimal number of clusters for your data, run:

python optimal_cluster_number.py
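One common way to choose the number of clusters is to score a range of candidate values, for example with the silhouette score as sketched below. The feature matrix is a random placeholder, and the actual script may use a different criterion (such as the elbow method).

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

features = np.random.rand(200, 7)  # placeholder for per-event features

scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

best_k = max(scores, key=scores.get)
print(f"Best k by silhouette score: {best_k}")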

3.4.2 Clustering with K-Means or Hierarchical Methods

To apply k-means or hierarchical clustering, run:

python cluster_kmeans.py
python hierarchical_cluster.py

The following are examples of outputs generated by the k-means and hierarchical clustering scripts.

The clustering behavior can be customized by modifying the configs/clustering-analysis.yaml configuration file. This file contains the parameters that control the clustering, such as the features used, the number of clusters, and the tree-cut distance. Update it according to your specific requirements before running the clustering scripts.

Example configuration parameters include (see the sketch after this list):

  • feature_labels: ['Depth', 'Width', 'Area', 'Std Dev', 'Skewness', 'Kurtosis', 'Dwelling Time']: The features used for clustering.
  • k: The number of clusters for k-means.
  • max_distance: The maximum distance at which to cut the dendrogram in hierarchical clustering.
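A minimal sketch of how k and max_distance could be applied with scikit-learn and SciPy follows; the feature matrix is a random placeholder standing in for the extracted event features, and the Ward linkage is an assumption.

import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

features = np.random.rand(200, 7)   # columns ~ the feature_labels listed above
k = 4                               # n_clusters for k-means (from the config)
max_distance = 10.0                 # tree-cut threshold for hierarchical clustering

kmeans_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)

Z = linkage(features, method="ward")                             # assumed linkage method
hier_labels = fcluster(Z, t=max_distance, criterion="distance")  # cut the tree at max_distance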

3.5 Retrieving Similar Events

To find similar translocating events within the dataset, run the following command:

python similar_signals.py
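The general idea behind similar-event retrieval is sketched below: resample each dip onto a common length and rank candidates by their distance to a query dip. The resampling and Euclidean distance are illustrative assumptions; similar_signals.py may use a different similarity measure.

import numpy as np

def resample(signal, n=256):
    # Linearly interpolate a dip onto a fixed-length grid so dips of
    # different durations can be compared point by point.
    x_old = np.linspace(0.0, 1.0, len(signal))
    x_new = np.linspace(0.0, 1.0, n)
    return np.interp(x_new, x_old, np.asarray(signal, dtype=float))

def most_similar(query, candidates, top_k=5):
    # Return the indices of the top_k candidate dips closest to the query.
    q = resample(query)
    distances = [np.linalg.norm(q - resample(c)) for c in candidates]
    return np.argsort(distances)[:top_k]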

The following are example outputs generated for two query dip signals.

3.6 Running the API for Dashboard

To run the API for the dashboard, first ensure your configs/clustering-analysis.yaml file is set up with the dip directory you want to analyze. Then start the API with the following command:

uvicorn api:app --reload

This will start the FastAPI server, and you can access the dashboard by navigating to http://127.0.0.1:8000 in your web browser.
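For orientation, a minimal api.py built with FastAPI might look like the sketch below. The endpoint, the dip_directory key, and the way the configuration is read are illustrative assumptions, not the repository's actual API.

from pathlib import Path

import yaml
from fastapi import FastAPI

app = FastAPI(title="Translocating Event Dashboard")

with open("configs/clustering-analysis.yaml") as f:
    cfg = yaml.safe_load(f)
dip_dir = Path(cfg.get("dip_directory", "plots"))  # hypothetical config key

@app.get("/dips")
def list_dips():
    # List the dip HTML files available under the configured directory.
    return {"dips": sorted(p.name for p in dip_dir.glob("**/*.html"))}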

4. Contributing

Contributions are welcome! If you have suggestions for improvements or new features, feel free to fork this repository, make your changes, and submit a pull request.

5. License

MIT License