This repository contains the source code to run the cmVIP analysis in the LUAD dataset. Our internal repository is here.
This folder is a DeepProfiler project. Experiments reported in the paper used the c91b9d8 commit.
To install the dependencies, including the DeepProfiler version we used, run:
$ pip install -r requirements.txt
To download the data, run the command below. Be aware that this script will overwrite any previously downloaded data:
$ utils/download_all.sh
- Run the extract_locations.py script to generate location files.
- Use DeepProfiler to prepare the dataset:
$ python3 -m deepprofiler --root=./ --config luad.json --gpu 0 prepare
The --gpu option sets the ID of the GPU to use.
Use DeepProfiler to extract features:
$ python3 -m deepprofiler --gpu 0 --exp efn_pretrained --root ./ --config luad.json profile
To create the well-based profiles run:
$ python3 utils/create_profiles.py
It writes the profiles to disk as a pandas DataFrame (pd.DataFrame) in Parquet format.
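As a rough illustration of what a well-based profile is, the sketch below collapses single-cell feature rows into one row per well. It is not the code in utils/create_profiles.py: the column names (Plate, Well, f0, f1), the helper name aggregate_to_wells, and the choice of the mean as the aggregation function are all assumptions made for the example.

```python
import pandas as pd


def aggregate_to_wells(cells, well_keys=("Plate", "Well"), how="mean"):
    """Collapse single-cell feature rows into one profile per well.

    `well_keys` and `how` are illustrative defaults, not the settings
    used by utils/create_profiles.py.
    """
    keys = list(well_keys)
    # Every column that is not a grouping key is treated as a feature.
    feature_cols = [c for c in cells.columns if c not in keys]
    return cells.groupby(keys, sort=True)[feature_cols].agg(how).reset_index()


# Toy single-cell table with hypothetical column names and values.
cells = pd.DataFrame(
    {
        "Plate": ["P1", "P1", "P1", "P2"],
        "Well": ["A01", "A01", "A02", "A01"],
        "f0": [1.0, 3.0, 5.0, 7.0],
        "f1": [0.0, 2.0, 4.0, 6.0],
    }
)

profiles = aggregate_to_wells(cells)
print(profiles)
```

A table like this can be saved with profiles.to_parquet(...) and loaded back with pd.read_parquet(...); both calls need a Parquet engine such as pyarrow or fastparquet to be installed.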
The analysis is split into three notebooks:
- 1-Expression-VIP.ipynb: Run the baseline analysis using L1000 profiling.
- 2-Cell-Morphology-VIP.ipynb: Run the Cell Morphology VIP method.
- 3-Aggregation-plots.ipynb: Create the plots summarizing results.
From the paper:
An additional 88 constructs are included in the dataset, representing TP53 alleles that inadvertently had double mutations. A comprehensive description of the process for selecting the constructs that were analyzed is presented in Supplementary Figure 2.
We filter out these constructs in the "Filter quality control status" section of the 2-Cell-Morphology-VIP.ipynb notebook.
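The general shape of such a filter can be sketched as follows. This does not reproduce the notebook's logic: the Allele column, the placeholder allele names, and the drop_flagged_constructs helper are hypothetical, and the real notebook may filter on a quality control status column instead of an explicit list.

```python
import pandas as pd


def drop_flagged_constructs(profiles, flagged, allele_col="Allele"):
    """Return the rows whose construct is not in the `flagged` collection.

    `allele_col` is a hypothetical column name used for illustration.
    """
    keep = ~profiles[allele_col].isin(list(flagged))
    return profiles.loc[keep].reset_index(drop=True)


# Toy profile table; the allele names are placeholders, not real constructs.
profiles = pd.DataFrame(
    {
        "Allele": ["allele_a", "allele_b", "allele_c"],
        "f0": [0.1, 0.2, 0.3],
    }
)
flagged = {"allele_b"}  # stands in for the constructs to exclude

filtered = drop_flagged_constructs(profiles, flagged)
print(filtered)
```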