- Clone the repository: `git clone https://github.com/karanwxliaa/GemVAE.git`
- Unzip all the datasets in the data directory (`/data`).
- Install the dependencies: `pip install -r requirements.txt`
- Run the `.ipynb` tutorial notebooks.
The `/data` directory contains several datasets in zipped form; extract them before running the tutorials.
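If you prefer to extract the archives from Python rather than by hand, here is a minimal sketch. It assumes the archives are `.zip` files sitting directly inside `data/` (relative to the repository root) and that the notebooks expect them unpacked in the same folder; `extract_datasets` is a hypothetical helper, not part of GemVAE.

```python
import zipfile
from pathlib import Path


def extract_datasets(data_dir="data"):
    """Extract every .zip archive found directly inside data_dir, in place.

    Assumption: archives live at the top level of data_dir and should be
    unpacked next to themselves; adjust if the repository layout differs.
    """
    data_path = Path(data_dir)
    extracted = []
    for archive in sorted(data_path.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)  # unpack into the same directory
        extracted.append(archive.name)
    return extracted
```

For example, `extract_datasets("data")` unpacks each archive and returns the names of the archives it processed.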
graph LR
init[Initialization] --> importTrain[Import train_GEMVAE]
init --> importUtils[Import Utils Functions]
init --> importClustering[Import Clustering Functions]
importUtils --> calSpatialNet[Cal_Spatial_Net]
importUtils --> statsSpatialNet[Stats_Spatial_Net]
importUtils --> mclustR[mclust_R]
importClustering --> clustering[clustering Function]
clustering --> pca[PCA]
clustering --> mclust[mclust_R]
clustering --> leiden[Leiden]
clustering --> louvain[Louvain]
importTrain --> trainGEMVAE[train_GEMVAE Function]
trainGEMVAE --> prepareData[Prepare Data]
prepareData --> calSpatialNet
prepareData --> pruneNet[Prune Spatial Net]
pruneNet --> spatialClustering[Use Clustering]
trainGEMVAE --> initializeModel[Initialize GEMVAE Model]
initializeModel --> modelDef[Define GATE Model]
modelDef --> encoder[Encoder Layers]
modelDef --> decoder[Decoder Layers]
initializeModel --> trainingLoop[Training Loop]
trainingLoop --> epoch[Run Epoch]
epoch --> updateModel[Update Model Parameters]
trainingLoop --> inferModel[Infer with Model]
inferModel --> analysis[Analyze and Visualize Results]
tutorial[Tutorial Notebook] --> followSteps[Follow Tutorial Steps]
followSteps --> prepareData
followSteps --> trainGEMVAE
followSteps --> inferModel
followSteps --> analysis
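The "Prepare Data" branch of the flowchart builds a spatial network (`Cal_Spatial_Net`), summarizes it (`Stats_Spatial_Net`), and prunes it using a clustering. The flowchart only names these steps, so the sketch below is an assumption about what they do: a radius-based neighbour graph, a mean-neighbours statistic, and pruning that drops edges between spots assigned to different pre-clusters. All function names here are hypothetical and are not GemVAE's API.

```python
import numpy as np


def radius_spatial_net(coords, radius):
    """Connect every pair of distinct spots closer than `radius`.

    coords : (n_cells, 2) array of spatial coordinates.
    Returns an (n_edges, 2) array of (i, j) index pairs; each undirected
    edge appears in both directions.
    """
    coords = np.asarray(coords, dtype=float)
    # Dense pairwise Euclidean distances: fine for a sketch, O(n^2) memory.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    mask = (dist < radius) & ~np.eye(len(coords), dtype=bool)  # no self-loops
    return np.argwhere(mask)


def mean_neighbours(edges, n_cells):
    """Average number of neighbours per spot, a basic network statistic."""
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    counts = np.bincount(edges[:, 0], minlength=n_cells)
    return float(counts.mean())


def prune_spatial_net(edges, labels):
    """Drop edges whose endpoints were assigned different pre-cluster labels."""
    edges = np.asarray(edges, dtype=int).reshape(-1, 2)
    labels = np.asarray(labels)
    keep = labels[edges[:, 0]] == labels[edges[:, 1]]
    return edges[keep]


coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
edges = radius_spatial_net(coords, radius=1.5)
print(edges.tolist())  # [[0, 1], [1, 0], [1, 2], [2, 1]]
print(mean_neighbours(edges, len(coords)))  # 1.0
pruned = prune_spatial_net(edges, labels=[0, 0, 1, 1])
print(pruned.tolist())  # [[0, 1], [1, 0]]
```

A real implementation would more likely use a k-nearest-neighbour or tree-based radius search to avoid the dense distance matrix on large sections.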
Spatial multi-omic clustering of:
- Stereo-CITE-seq dataset
- Landau Spots dataset
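The clustering step can be approximated in pure Python. The package's own clustering uses PCA followed by mclust (called from R), Leiden, or Louvain; in this sketch, scikit-learn's `GaussianMixture` is swapped in as a stand-in for mclust, with `covariance_type="tied"` (one covariance matrix shared by all components, similar to mclust's `EEE` model). `gmm_cluster` and its parameters are hypothetical, and the synthetic embedding only stands in for a learned GemVAE latent space.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture


def gmm_cluster(embedding, n_clusters, n_pcs=10, seed=0):
    """Reduce a latent embedding with PCA, then assign clusters with a Gaussian mixture.

    covariance_type="tied" shares one covariance matrix across components,
    roughly the Python counterpart of mclust's "EEE" model.
    """
    embedding = np.asarray(embedding, dtype=float)
    # PCA cannot keep more components than samples or features.
    n_pcs = min(n_pcs, *embedding.shape)
    reduced = PCA(n_components=n_pcs, random_state=seed).fit_transform(embedding)
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="tied", random_state=seed)
    return gmm.fit_predict(reduced)


rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 50 cells in a 5-dimensional latent space.
embedding = np.vstack([rng.normal(0.0, 1.0, (50, 5)), rng.normal(10.0, 1.0, (50, 5))])
labels = gmm_cluster(embedding, n_clusters=2)
print(np.bincount(labels))
```

With groups this far apart, each synthetic group should land in its own cluster; on real embeddings the number of clusters is a choice you make per dataset.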
Tutorial notebooks and the datasets they cover:
- `Tutorial_Landau_BC`: Landau Spots breast cancer
- `Tutorial_Landau_SR1`: Landau Spots spleen, replicate 1
- `Tutorial_Landau_SR2`: Landau Spots spleen, replicate 2
- `Tutorial_Spatial_SC`: Landau Spatial CITE-seq
- `Tutorial_SSC_MT`: Stereo-CITE-seq mouse thymus
- `Tutorial_Generated_data`: generated data
The main components of the code, and how they fit together (as summarized in the flowchart above), are:
- Initialization (`__init__.py`): Initializes the package by importing the main components: `train_GEMVAE`, various clustering methods, and utility functions such as `Cal_Spatial_Net`, `Stats_Spatial_Net`, etc.
- Utility Functions (`utils.py`): Contains functions to calculate spatial networks, perform clustering using mclust from R, and plot weight values. These support the preprocessing and analysis tasks.
- Clustering (`clustering.py`): Implements spatial clustering functions that use methods such as mclust, Leiden, and Louvain to identify cell clusters from spatial and expression data.
- Model Definition (`model.py`): Defines the GATE model, a core part of GEMVAE, including encoders and decoders for genes and proteins, graph attention layers, and the variational autoencoder logic.
- GEMVAE Core (`GEMVAE.py`): Implements the `GEMVAE` class, which orchestrates training: setting up the placeholders, building the model, running epochs, and inferring results.
- Training Script (`Train_GEMVAE.py`): Provides a high-level interface for training the GEMVAE model on given data, handling data preparation, model initialization, training, and output processing.
- Tutorial Notebook (`Tutorial_Landau_SR1.ipynb`): A Jupyter notebook demonstrating GemVAE on the Landau Spots spleen replicate 1 dataset, guiding the user through data loading, model training, and analysis of results.
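The two ideas named under Model Definition, graph attention over the spatial network and the variational autoencoder, can be illustrated with a small NumPy sketch. It assumes a generic GAT-style attention (LeakyReLU scores, softmax over each spot's neighbours plus itself) and the standard VAE reparameterization trick and KL term. It is not the code in `model.py`; every function name, the attention vectors `a_src`/`a_dst`, and the projections `w_mu`/`w_lv` are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def attention_weights(h, adj, a_src, a_dst):
    """GAT-style attention coefficients over each node's neighbourhood.

    h            : (n_nodes, d) node features (e.g. projected gene or protein profiles)
    adj          : (n_nodes, n_nodes) boolean adjacency from the spatial network
    a_src, a_dst : (d,) attention vectors, learned parameters in a real model
    """
    adj = np.asarray(adj, dtype=bool) | np.eye(len(h), dtype=bool)  # add self-loops
    scores = leaky_relu((h @ a_src)[:, None] + (h @ a_dst)[None, :])
    scores = np.where(adj, scores, -np.inf)  # attend to neighbours only
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (the reparameterization trick)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims and averaged over cells."""
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return float(kl.mean())


rng = np.random.default_rng(0)
h = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=bool)
alpha = attention_weights(h, adj, rng.normal(size=3), rng.normal(size=3))
h_agg = alpha @ h  # attention-weighted aggregation of neighbour features
w_mu, w_lv = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
mu, log_var = h_agg @ w_mu, h_agg @ w_lv  # encoder heads for a 2-d latent space
z = reparameterize(mu, log_var, rng)
print(alpha.sum(axis=1))  # each row sums to 1
print(z.shape, kl_to_standard_normal(mu, log_var))
```

In a trained model the attention vectors and the projections producing `mu` and `log_var` are learned; here they are random, so only the shapes and the normalization are meaningful.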