A Developmental Deconvolution Multilayer Perceptron for Classification of Cancer Origin
Project Overview
Cancer is a disease manifesting in abrogation of developmental programs, and malignancies are named based on their cell or tissue of origin. However, a systematic atlas of tumor origins is lacking.
Here we map the single cell organogenesis of 56 developmental trajectories to the transcriptomes of over 10,000 tumors across 33 cancer types. We use this map to deconvolute individual tumors into their constituent developmental trajectories. Based on these deconvoluted developmental programs, we construct a Developmental Multilayer Perceptron (D-MLP) classifier that outputs cancer origin.
The D-MLP classifier (ROC-AUC: 0.974 for top prediction) outperforms classification based on expression of either oncogenes or highly variable genes. We analyze tumors from patients with cancer of unknown primary (CUP), selecting the most difficult cases where extensive multimodal workup yielded no definitive tumor type. D-MLP revealed insights into developmental origins and diagnosis for most patient tumors.
Our results provide a map of tumor developmental origins and a tool for diagnostic pathology, and suggest that developmental classification may be a useful approach for otherwise unclassified patient tumors.
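To make the two-stage idea above concrete (deconvolute a tumor into developmental trajectory scores, then classify those scores with a multilayer perceptron), here is a minimal toy sketch in plain Python. It is not the implementation used in the paper. Only the sizes 56 (trajectories) and 33 (cancer types) come from the text. Using cosine similarity as the deconvolution score, the single 64-unit hidden layer, the 200-gene toy profiles, the random untrained weights, and all function names are assumptions made for illustration.

```python
import math
import random

N_TRAJECTORIES = 56  # developmental trajectories (from the overview)
N_CANCER_TYPES = 33  # cancer types (from the overview)
N_GENES = 200        # ASSUMPTION: toy number of genes
N_HIDDEN = 64        # ASSUMPTION: hidden layer width


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def deconvolute(tumor, trajectory_profiles):
    """ASSUMPTION: score a tumor against each trajectory by cosine similarity."""
    return [cosine(tumor, profile) for profile in trajectory_profiles]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, hidden, output):
    """One ReLU hidden layer followed by a softmax output layer."""
    h = [max(0.0, v) for v in dense(x, *hidden)]
    return softmax(dense(h, *output))


def random_layer(rng, n_in, n_out):
    """Untrained layer with small random weights and zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
# Toy data standing in for trajectory expression profiles and one tumor.
profiles = [[rng.random() for _ in range(N_GENES)] for _ in range(N_TRAJECTORIES)]
tumor = [rng.random() for _ in range(N_GENES)]

scores = deconvolute(tumor, profiles)  # 56 developmental scores
hidden = random_layer(rng, N_TRAJECTORIES, N_HIDDEN)
output = random_layer(rng, N_HIDDEN, N_CANCER_TYPES)
probabilities = mlp_forward(scores, hidden, output)  # 33 class probabilities
top_class = max(range(N_CANCER_TYPES), key=probabilities.__getitem__)
print(len(scores), len(probabilities), top_class)
```

With trained weights in place of the random ones, the index in `top_class` would correspond to the top predicted cancer type.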
Code Overview
The code folder contains the scripts used to generate the analysis and figures shown in Moiso et al. (add url here). The scripts are written in R, shell and Python and require the following packages:
R
R version 3.5.1 or older is required with the following libraries:
- data.table
- Matrix
- ggplot2
- ggpubr
- pheatmap
- ggalluvial
- RColorBrewer
- reshape2
- viridis
- scales
- alluvial
- lsa
- umap
- parallel
- grDevices
And the following Bioconductor package
Shell
Shell scripts are used for FASTQ read analysis and require the following software:
Python
Python version 3.6.4 is used to generate and evaluate the MLP models. The following modules and libraries are required:
- keras 2.2.0
- numpy 1.19.5
- scikit-learn 0.19.1
- sys
- tensorflow 1.5.0
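If you are not using the Docker image, a quick check of the local Python environment against the versions pinned above may save some debugging. The sketch below is a hypothetical helper, not part of the repository. The names `PINNED`, `installed_version` and `find_mismatches` are made up for illustration. It relies on each package exposing a `__version__` attribute, and on scikit-learn being imported as `sklearn`. `sys` is omitted because it ships with Python.

```python
import importlib

# Versions copied from the list above; scikit-learn is imported as "sklearn".
PINNED = {
    "keras": "2.2.0",
    "numpy": "1.19.5",
    "sklearn": "0.19.1",
    "tensorflow": "1.5.0",
}


def installed_version(module_name):
    """Return the module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A missing or broken install can raise more than ImportError.
        return None
    return getattr(module, "__version__", None)


def find_mismatches(pinned, lookup=installed_version):
    """Return {module: (wanted, found)} for each module whose version differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print("%s: expected %s, found %s" % (name, wanted, found))
```

The `lookup` parameter exists only so that the comparison can be exercised without installing anything, for example by passing a dictionary's `get` method.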
Docker
To make it easy to reproduce the analysis and figures of our work, we dockerized the environment used in the paper.
This requires Docker to be installed on your system. If it is not, don't panic: installation is simple, just follow the instructions for Linux, Mac or Windows.
When Docker is up and running, you can clone this git repository with:
git clone https://github.com/emoiso/DevTum.git
After cloning the repository, you can build the image with the following commands:
cd DevTum
sudo docker build -t devtum .
After the devtum image has been successfully built, you can test it by running:
sudo docker run --entrypoint code/figs7b.R -v $PWD:/home devtum
The previous command will generate the UMAP shown in figure S7 and save it in figs/paper on your system.
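If you prefer to drive the container from Python, for example to run it from a larger pipeline, the command above can be assembled as an argument list. This is a sketch under stated assumptions: `docker_run_command` is a hypothetical helper, not part of the repository, and `/path/to/DevTum` is a placeholder for your local clone. Only `code/figs7b.R`, the `devtum` image name and the `/home` mount point come from the command shown above.

```python
import shlex


def docker_run_command(script, repo_dir, image="devtum", use_sudo=True):
    """Build the docker run argument list for one script in the repository.

    script   : path of the entrypoint script relative to the repository root
    repo_dir : absolute path of the local clone, mounted at /home in the container
    """
    command = ["sudo"] if use_sudo else []
    command += [
        "docker", "run",
        "--entrypoint", script,
        "-v", "%s:/home" % repo_dir,
        image,
    ]
    return command


# Placeholder path: replace with the location of your clone.
command = docker_run_command("code/figs7b.R", "/path/to/DevTum")
print(" ".join(shlex.quote(part) for part in command))
# prints: sudo docker run --entrypoint code/figs7b.R -v /path/to/DevTum:/home devtum
```

To execute the command, pass the list to `subprocess.run(command, check=True)`. Building a list instead of a shell string avoids quoting problems when the repository path contains spaces.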
If you encounter any problems or bugs, or have any questions, please contact Enrico Moiso (em.metaminer@gmail.com).
Created and maintained by Enrico Moiso. Last update 07/11/2022.