This repository contains the code used to perform the modelling and analysis for the paper:
Identification of Lysosomotropism using Explainable Machine Learning and Morphological Profiling Cell Painting Data
Aishvarya Tandon, Anna Santura, Dr. Axel Pahl, Prof. Dr. Herbert Waldmann, Prof. Dr. Paul Czodrowski
Example Jupyter Notebooks are provided to demonstrate the use of the code.
This code was developed and tested on Linux (Ubuntu 22.04.3) and macOS (Big Sur 11.7.10).
The repository can be cloned using the following command:
git clone https://github.com/czodrowskilab/lyso_project_open
cd lyso_project_open
Then, create a new mamba or conda environment using the provided environment file environment_xml.yml
(for XML part):
mamba env create -f environment_xml.yml
mamba activate lyso_xml-env
Replace mamba
with conda
if you don't have mamba installed.
To use lyso_project
as a package in a Jupyter notebook or a Python file, you need to append the path of the lyso_project
directory to your system path. Here's how:
import sys
# Prevent Python from writing .pyc files to the disk
sys.dont_write_bytecode = True
# Append the path of the lyso_project directory to your system path
sys.path.append("/path/to/lyso_project")
Replace /path/to/lyso_project
with the actual path of the lyso_project
directory on your system.
In addition to the packages listed in environment_xml.yml
, X-FP is also used in this project. Please follow the instructions on the X-FP repository for installation and usage.
Contains functions to preprocess the internal CP dataset.
Contains various useful functions to generate various molecular fingerprints, and calculate RDKit descriptors. Also contains minor utility functions.
Module to create model objects, perform cross-validation, and calculate metrics.
Key components include:
- ModelRunner class: This class is used to create model objects for each input type. It has methods for setting the model's input data, which can include training data, training labels, test data, test labels. Additionally, it has methods for saving models, and writing logs.
Contains functions to read the log files generated by the ModelRunner
class, and plot the results.
Please refer to the inline comments in the code for more detailed information about the functionality of each method.