/DGCyTOF

New cell populations identification to CyTOF single cell data

Primary LanguageJupyter Notebook

DGCyTOF

Framework

API Link here

What is DGCyTOF

A new tool Deep learning with Graphical clustering for CyTOF, called DGCyTOF, is developed to identify cell populations. The method novelty is to distinguish the calibration cell population and the new cell population by a sequential four steps with an iteration calibration system.

First, deep classification learning is constructed to identify the calibration cell population by the SoftMax classification assignment. These unknown types of cells will be filtered by the probability threshold from all cells.

Second, an embedding graphical clustering is used to identify the new cell populations by structure clustering sequentially. The embedding graphical clustering simultaneously learns feature representations while preserving data structure in a cell-cell interaction network for cluster assignments.

Third, in the middle of layer, an iteration calibration system is used to adjust the cell populations between the new and the unknown cell populations. The iteration calibration system will automatically adjust to the classification and the clustering by a feedback loop to reduce cell type identification error rate.

Finally, a 3-dimension (3D) visualization platform is developed to display these cells clusters by projecting all labels on independent 3D space, which keeps cell types easily identity and cell type’s differential in vivid.

Features

DGCyTOF provides a deep learning model with graphical cluster visualization for CyTOF to identify cell populations. The original data with all validations are located under Paper-Study/ on GitHub, and package is located under Python-Package/.

The package allows for the following functions:

  • Data Preprocessing
  • Training
  • Validation
  • Calibration
  • Dimensionality Reduction and Clustering
  • Cell Type Classification, and Identification of New Cell Types
  • Visualization of Cell Types on 2D and 3D Plots.

Raw Demo Files

Levine32dim.txt can be found from https://github.com/lmweber/benchmark-data-Levine-32-dim Levine13dim.txt can be found from https://github.com/lmweber/benchmark-data-Levine-13-dim

Contact

Feel free to contact Dr. Lijun Cheng at Lijun.Cheng@osumc.edu for questions regarding the study.

Installation

Creating from source

After creating your desired environment, run the following under package folder

python setup.py sdist
pip install dist/DGCyTOF-1.0.0.tar.gz --user

Requirements

The project requires several packages to work appropriately. These packages and their versions can be found within requirements.txt or alternatively down below.

scipy==1.5.2
numpy==1.19.1
seaborn==0.10.1
matplotlib==3.3.0
pandas==1.1.0
umap_learn==0.4.6
hdbscan==0.8.26
scikit_learn==0.23.2
scikit_plot==0.3.7
scikit_bio==0.5.6
umap-learn==0.4.6
torch==1.6.0
torchvision==0.7.0
tensorboard==2.3.0