Multi-view-Deconfounding-VAE

Primary language: Python · License: GNU General Public License v3.0 (GPL-3.0)

Project overview

  • Aim: develop a Multi-view deconfounding VAE (multi-view data integration + confounder correction)
  • Data:
    • Rotterdam study
      • 500 individuals
      • Cardiovascular diseases
      • Methylation
      • 3D facial images
    • Toy data (TCGA)
      • 2547 patients
      • 6 cancers
      • 2000 most variable mRNAs
      • 2000 most variable DNAm
  • Conducted by Sonja Katz and Zuqi Li
  • Supervised by Prof. Kristel Van Steen, Dr. Gennady Roshchupkin and Prof. Vitor Martins Dos Santos
  • Google folder: https://drive.google.com/drive/folders/1GwZbMpVWW4xqdxmw_JRq9-DAR0WlnkE4

Installation

```sh
cd Multi-view-Deconfounding-VAE
conda env create -f environment.yml
source activate env_multiviewVAE
```

Overview of models:

  • all models are in the models folder

  • optimal architecture from modelOptimisation experiments: latentSize = 50; hiddenlayers = 200

  • XVAE:

    • XVAE - Simidjievski, Nikola, et al. "Variational autoencoders for cancer data integration: design principles and computational practice." Frontiers in genetics 10 (2019): 1205.


  • cXVAE: four variants, differing in where the confounders are injected:

    1. input+embed
    2. input
    3. embed
    4. fused+embed
  • Adversarial Training

    1. XVAE with one adversarial network and multiclass prediction: adversarial_XVAE_multiclass

      • XVAE_adversarial_multiclass: inspired by Dincer et al.; training over all batches
      • XVAE_adversarial_1batch_multiclass: original by Dincer et al.



    2. XVAE with multiple adversarial networks (one for each confounder): adversarial_XVAE_multipleAdvNet

      • XAE with multiple adversarial networks (possibly outdated)
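The cXVAE variants above differ in where the confounders are injected into the network. A minimal PyTorch sketch of the two basic injection points ("input": concatenated to the data before encoding; "embed": concatenated to the latent code before decoding); the class name and layer sizes here are illustrative, not the repository's actual modules:

```python
import torch
import torch.nn as nn

class ConditionalVAESketch(nn.Module):
    """Toy sketch: condition a VAE on confounders `c` by concatenation.
    Sizes are illustrative (x_dim=2000 mirrors the 2000 most variable features)."""
    def __init__(self, x_dim=2000, c_dim=3, hidden=200, latent=50):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent + c_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=1))      # "input" conditioning
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        x_hat = self.dec(torch.cat([z, c], dim=1))  # "embed" conditioning
        return x_hat, mu, logvar

model = ConditionalVAESketch()
x_hat, mu, logvar = model(torch.randn(8, 2000), torch.randn(8, 3))
print(x_hat.shape)  # torch.Size([8, 2000])
```

The "input+embed" variant would apply both concatenations, as above; "input" or "embed" alone would drop one of them.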
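The adversarial approach can be sketched as two alternating phases: an adversarial classifier tries to predict the multiclass confounder from the latent code, and the encoder is then penalised when the adversary succeeds. All tensor and variable names below are hypothetical; this is a one-step sketch, not the repository's training loop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins: latent codes from the XVAE encoder and multiclass confounder labels
latent_dim, n_classes = 50, 6
z = torch.randn(32, latent_dim, requires_grad=True)
confounder = torch.randint(0, n_classes, (32,))

adversary = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                          nn.Linear(64, n_classes))
ce = nn.CrossEntropyLoss()

# Phase 1: update only the adversary to predict the confounder from z
# (z is detached so encoder gradients are not touched)
adv_loss = ce(adversary(z.detach()), confounder)

# Phase 2: update the encoder; subtracting this term from the VAE loss
# rewards latent codes that the adversary cannot classify
deconf_penalty = -ce(adversary(z), confounder)
```

In a full loop the two phases would use separate optimizers, and the total encoder objective would combine reconstruction, KL (or MMD), and the weighted penalty.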

Workplan

  • 1. Select basic model
    • Simidjievski, Nikola, et al. "Variational autoencoders for cancer data integration: design principles and computational practice." Frontiers in genetics 10 (2019): 1205.
    • https://github.com/CancerAI-CL/IntegrativeVAEs
    • The X-shaped Variational Autoencoder (X-VAE) architecture is the overall recommendation of this comparative study
  • 2. Reform the basic model
    • Implement in Pytorch Lightning
    • Rearrange code
    • Provide two latent loss functions (KL divergence and Maximum Mean Discrepancy)
    • Implement testing metrics
  • 3. Create a clustering model
    • Strategy 1: Run K-means (or other clustering methods) on the latent space
    • Strategy 2: Add a term in the loss function to iteratively optimize the clustering quality
  • 4. Correct for confounders
    • Strategy 1: Take confounders into account during decoding and condition the loss function on the confounders; adapted from: Lawry Aguila, Ana, et al. "Conditional VAEs for Confound Removal and Normative Modelling of Neurodegenerative Diseases." Medical Image Computing and Computer Assisted Intervention–MICCAI 2022: 25th International Conference, Singapore, September 18–22, 2022, Proceedings, Part I. Cham: Springer Nature Switzerland, 2022.

      • Build a cVAE: concatenate the covariates to the input dimension and to the latent dimension
    • Strategy 2: Add a term to the loss function that minimizes the association/similarity between the latent embedding and the confounders

    • Strategy 3: Simply remove confounded latent features
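Step 2 of the workplan mentions two interchangeable latent losses. A minimal sketch of both; the RBF-kernel MMD below is one common formulation, not necessarily the repository's exact implementation:

```python
import torch

def kl_divergence(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior
    return -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))

def rbf_mmd(z, scale=1.0):
    # Biased MMD^2 estimate between latent samples and a standard-normal prior
    prior = torch.randn_like(z)
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * scale ** 2))
    return k(z, z).mean() + k(prior, prior).mean() - 2 * k(z, prior).mean()

mu, logvar = torch.zeros(16, 50), torch.zeros(16, 50)
print(kl_divergence(mu, logvar).item())  # ~0 for an exactly standard-normal posterior
```

Either term can replace the latent-regularisation part of the VAE objective; MMD only needs samples of z, whereas the closed-form KL needs the posterior's mu and logvar.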
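Clustering Strategy 1 above amounts to running an off-the-shelf clusterer on the encoded samples. A sketch with scikit-learn, using random data as a stand-in for the VAE latent embeddings (the 2547 × 50 shape mirrors the TCGA toy data and latent size):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
latent = rng.normal(size=(2547, 50))  # stand-in for the encoded samples

# Cluster the latent space into 6 groups (one per cancer type in the toy data)
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(latent)
print(np.bincount(labels))  # cluster sizes
```

Any other clusterer (e.g. hierarchical or spectral) could be swapped in on the same latent matrix; Strategy 2 would instead fold a clustering-quality term into the training loss.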