/luad-cell-painting

Repository for analysis of CellPainting data in the LUAD dataset

LUAD analysis using DeepProfiler

This repository contains the source code to run the cmVIP analysis on the LUAD dataset. Our internal repository is here.

Profiling

1. Install requirements

This folder is a DeepProfiler project. The experiments reported in the paper used commit c91b9d8.

To install the dependencies, including the DeepProfiler version we used, run:

$ pip install -r requirements.txt

2. Download the data

Be aware that this script overwrites any previously downloaded data. To download the data, run:

$ utils/download_all.sh

3. Prepare the data

  1. Run the extract_locations.py script to generate the location files.

  2. Use DeepProfiler to prepare the dataset:

$ python3 -m deepprofiler --root=./ --config luad.json --gpu 0 prepare

The --gpu option sets the ID of the GPU to use.

4. Extract features

Use DeepProfiler to extract features:

$ python3 -m deepprofiler --gpu 0 --exp efn_pretrained --root ./ --config luad.json profile
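The prepare and profile commands differ only in the experiment name and the final subcommand, so they can be scripted from one place. The following is a minimal sketch: the helper name `deepprofiler_cmd` is hypothetical, and it uses only the flags that appear in the commands above.

```python
import shlex
import sys


def deepprofiler_cmd(task, gpu=0, root="./", config="luad.json", exp=None):
    """Build the argument list for one DeepProfiler run (hypothetical helper)."""
    cmd = [sys.executable, "-m", "deepprofiler",
           f"--root={root}", "--config", config, "--gpu", str(gpu)]
    if exp is not None:
        # Only the profile step in this README passes an experiment name.
        cmd += ["--exp", exp]
    cmd.append(task)
    return cmd


prepare = deepprofiler_cmd("prepare")
profile = deepprofiler_cmd("profile", exp="efn_pretrained")

# Print the commands (without the interpreter path) for inspection.
print(shlex.join(prepare[1:]))
print(shlex.join(profile[1:]))

# To actually run a step: import subprocess; subprocess.run(prepare, check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues if the root path contains spaces.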

5. Create well profiles

To create the well-based profiles, run:

$ python3 utils/create_profiles.py

The script writes the profiles to a Parquet file as a pandas DataFrame.
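For readers unfamiliar with well-based profiles, the sketch below illustrates the general idea of collapsing per-cell features into one row per well. It is not the code in utils/create_profiles.py: the column names, the median aggregation, and the output filename are all assumptions made for illustration.

```python
import pandas as pd

# Toy single-cell features; the column names are hypothetical.
cells = pd.DataFrame({
    "Plate": ["P1", "P1", "P1", "P1", "P1"],
    "Well": ["A01", "A01", "A01", "A02", "A02"],
    "feature_0": [0.1, 0.3, 0.5, 1.0, 2.0],
    "feature_1": [1.0, 2.0, 6.0, 4.0, 8.0],
})

# Assumption: aggregate with the median, giving one row per (plate, well).
well_profiles = cells.groupby(["Plate", "Well"], as_index=False).median()
print(well_profiles)

# Writing Parquet requires a Parquet engine such as pyarrow:
# well_profiles.to_parquet("well_profiles.parquet")  # hypothetical filename
```

The median is a common choice for this kind of aggregation because it is less sensitive to outlier cells than the mean, but the script in this repository may use a different statistic.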

VIP analysis

The analysis is split into three notebooks.

Notes about the dataset

From the paper:

An additional 88 constructs are included in the dataset, representing TP53 alleles that inadvertently had double mutations. A comprehensive description of the process for selecting the constructs that were analyzed is presented in Supplementary Figure 2.

We filter out these constructs in the "Filter quality control status" section of the 2-Cell-Morphology-VIP.ipynb notebook.
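As an illustration of this kind of filter, the following sketch keeps only rows whose quality-control status is a passing value. The column names `Allele` and `QC_status`, the status values, and the allele labels are all placeholders; the notebook defines the actual columns and criteria.

```python
import pandas as pd

# Toy profile table; every name and value here is a placeholder.
profiles = pd.DataFrame({
    "Allele": ["TP53_a", "TP53_b", "KRAS_c"],
    "QC_status": ["pass", "double_mutation", "pass"],
    "feature_0": [0.2, 0.9, 0.4],
})

# Keep only the constructs that passed quality control.
keep = profiles["QC_status"] == "pass"
filtered = profiles[keep].reset_index(drop=True)

print(f"Kept {len(filtered)} of {len(profiles)} profiles")
```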