
Shi-VAE: Sequential Heterogeneous Incomplete VAE

Code for the paper "Medical data wrangling with sequential variational autoencoders". A preprint is also available on arXiv. If you use this code, please cite the paper as:

Install the dependencies with Conda:

conda env create -f environment.yml
conda activate shivae

Alternatively, create a Conda environment and install the dependencies from requirements.txt with pip:

conda create -n shivae python=3.7
conda activate shivae
pip install -r requirements.txt


HMM Data

The data/ folder contains a pre-generated synthetic dataset created with a heterogeneous HMM. To create a custom heterogeneous HMM dataset, see the README.md inside hmm_dataset.
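To give an idea of what a heterogeneous HMM produces, here is a minimal sketch that samples one sequence from a toy model with K hidden states and three attribute types (real, positive and categorical). The emission families (Gaussian, exponential, categorical) are assumptions chosen for illustration; `sample_hetero_hmm` and its parameters are hypothetical and this is not the generator shipped in hmm_dataset.

```python
import numpy as np


def sample_hetero_hmm(n_steps, trans, means, rates, cat_probs, rng):
    """Sample one sequence from a toy heterogeneous HMM (hypothetical helper).

    trans:     (K, K) transition matrix, rows sum to 1.
    means:     (K,) mean of the Gaussian (real-valued) attribute per state.
    rates:     (K,) rate of the exponential (positive) attribute per state.
    cat_probs: (K, C) class probabilities of the categorical attribute per state.
    Returns the hidden state sequence and an (n_steps, 3) observation array.
    """
    K = trans.shape[0]
    state = rng.integers(K)  # uniform initial state (assumption)
    states, real, pos, cat = [], [], [], []
    for _ in range(n_steps):
        states.append(state)
        real.append(rng.normal(means[state], 1.0))       # real-valued attribute
        pos.append(rng.exponential(1.0 / rates[state]))  # positive attribute
        cat.append(rng.choice(len(cat_probs[state]), p=cat_probs[state]))  # categorical
        state = rng.choice(K, p=trans[state])            # Markov transition
    return np.array(states), np.column_stack([real, pos, cat])
```

Each column of the returned array has a different likelihood model, which is what makes the dataset "heterogeneous".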

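
Physionet Data

In data/physionet_burst/, extract the zip archive:

unzip physionet_burst.zip

(On systems where tar is bsdtar, such as macOS, tar -xvf physionet_burst.zip also works; GNU tar cannot read zip archives.)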

The archive contains four files:

  • data_types.csv: CSV file with the type information of every attribute.

  • data_types_real.csv: CSV file with the type information of every attribute, assuming all values are real.

  • data_types_pos.csv: CSV file with the type information of every attribute, assuming all values are positive.

  • physionet_burst.npz: NPZ file containing the following arrays, where {set} denotes the data split:

    • x_{set}_full: Data for {set} with all observed values.
    • x_{set}_miss: Data for {set} with the observed values, excluding those artificially removed for evaluation.
    • m_{set}_miss: Missing mask for {set}, covering both real and artificial missing values.
    • x_{set}_artificial: Mask for {set} marking the observed values held out as artificial missing values for evaluation.
    • y_{set}: Target values (y) for {set}.
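A minimal sketch for inspecting the NPZ file with NumPy, assuming that the {set} part of each key contains no underscores. `load_by_split` is a hypothetical helper, not part of the repository; it only groups the arrays by the {set} field of their key names and makes no assumption about how many splits there are.

```python
import re

import numpy as np

# Key names follow the pattern listed above, e.g. x_{set}_full, m_{set}_miss, y_{set}.
# Assumption: {set} itself contains no underscore.
KEY_PATTERN = re.compile(r"^(?P<kind>x|m|y)_(?P<split>[^_]+)(?:_(?P<variant>\w+))?$")


def load_by_split(path):
    """Hypothetical helper: return {split: {key: array}} for an NPZ file."""
    splits = {}
    with np.load(path) as data:
        for key in data.files:
            match = KEY_PATTERN.match(key)
            if match is None:
                continue  # skip keys that do not follow the naming scheme
            splits.setdefault(match.group("split"), {})[key] = data[key]
    return splits
```

For example, calling `load_by_split("physionet_burst.npz")` after extraction returns one dictionary per split, whose values map each array name to its NumPy array, which makes it easy to print the shapes of every array in a split.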


Always run the experiments with the src folder as the working directory. Examples are provided in main_hmm.py and main_physionet.py. To run an example on the synthetic dataset:

python3 main_hmm.py --experiment hmm --train -1 --n_epochs 100 --z_dim 2 --K 3 --kl_annealing_epochs 20 --percent_miss 30
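The --percent_miss flag sets the percentage of missing values in the synthetic experiment. Assuming it works by hiding a random fraction of the observed entries (similar in spirit to the artificial-missing masks described for the Physionet data), the idea can be sketched as below. The helper name and the mask convention (True = observed) are assumptions; the repository's own masks may use the opposite convention.

```python
import numpy as np


def add_artificial_missing(x, observed, percent_miss, rng):
    """Hypothetical helper: hide percent_miss % of the observed entries of x.

    observed: boolean array, True where x is actually observed (assumed convention).
    Returns (x_miss, observed_after, artificial):
      x_miss         copy of x with the hidden entries set to NaN,
      observed_after observed mask after hiding,
      artificial     mask of hidden entries, usable to score imputations.
    """
    obs_idx = np.flatnonzero(observed)
    n_hide = int(round(len(obs_idx) * percent_miss / 100.0))
    hide = rng.choice(obs_idx, size=n_hide, replace=False)  # sample without replacement
    artificial = np.zeros(x.size, dtype=bool)
    artificial[hide] = True
    artificial = artificial.reshape(x.shape)
    # Only artificially hidden entries become NaN; originally missing ones keep their value.
    x_miss = np.where(artificial, np.nan, x)
    return x_miss, observed & ~artificial, artificial
```

Because only originally observed entries are hidden, the ground truth for every artificially missing value is known, so imputation error can be measured on exactly those entries.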

To evaluate the model on the Physionet dataset, run:

python3 main_physionet.py --experiment physionet --train -1 --n_epochs 100 --z_dim 35 --K 10 --kl_annealing_epochs 20
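Both commands set --kl_annealing_epochs 20, which indicates that the weight of the KL term in the ELBO is annealed during the first epochs of training. The sketch below shows one common choice, a linear warm-up of the KL weight from 0 to 1; the exact schedule used in this repository is an assumption, and `kl_weight` and `annealed_elbo_loss` are hypothetical names.

```python
def kl_weight(epoch, kl_annealing_epochs):
    """Linear KL warm-up (assumed schedule): 0 at epoch 0, 1 after kl_annealing_epochs."""
    if kl_annealing_epochs <= 0:
        return 1.0  # no annealing: full KL term from the start
    return min(1.0, epoch / kl_annealing_epochs)


def annealed_elbo_loss(recon_nll, kl, epoch, kl_annealing_epochs):
    """Negative ELBO with the KL term scaled by the annealing weight."""
    return recon_nll + kl_weight(epoch, kl_annealing_epochs) * kl
```

Down-weighting the KL term early lets the decoder learn useful reconstructions before the posterior is pulled towards the prior, a standard trick against posterior collapse.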

Running these examples trains the model and evaluates its performance, producing figures of the reconstructed and generated samples.

To reproduce the experiments reported in the paper, run:

# HMM results, inside src/:
python3 results_hmm.py --train -1
# Physionet results, inside src/:
python3 results_physionet.py --train -1


Run any script with the --help option to display all of its available arguments, for example:

python3 main_hmm.py --help

Authors

Daniel Barrejon
Pablo M. Olmos
Antonio Artés-Rodríguez

Contact Information

For any questions about the code or the model, please send an email to: dbarrejo@ing.uc3m.es