umls-embeddings

Adversarial Learning of Knowledge Embeddings for the Unified Medical Language System

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Code for the paper "Adversarial Learning of Knowledge Embeddings for the Unified Medical Language System", to be presented at the AMIA Informatics Summit 2019.

Requires TensorFlow version 1.9.

Data Preprocessing:

  1. First, download and extract the UMLS. This project assumes the UMLS files are laid out as follows:

    <UMLS_DIR>/
        META/
            MRCONSO.RRF
            MRSTY.RRF
        NET/
            SRSTRE1
    
  2. Create Metathesaurus triples

    python -m eukg.data.create_triples <UMLS_DIR>

    This will create the Metathesaurus train/test triples in data/metathesaurus.

  3. Create Semantic Network triples

    python -m eukg.data.create_triples <UMLS_DIR>
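
Before running the preprocessing steps, it can help to confirm that the layout from step 1 is in place. The sketch below is a hypothetical helper (`missing_umls_files` is not part of the eukg package) that checks only the three files listed in step 1 and returns the ones it cannot find under a given UMLS directory.

```python
from pathlib import Path
from typing import List

# Files this project expects under <UMLS_DIR>, taken from the layout in step 1.
EXPECTED_FILES = (
    "META/MRCONSO.RRF",
    "META/MRSTY.RRF",
    "NET/SRSTRE1",
)


def missing_umls_files(umls_dir: str) -> List[str]:
    """Return the expected UMLS files that are not present under umls_dir.

    Hypothetical helper for illustration only; not part of eukg.
    """
    root = Path(umls_dir)
    # pathlib joins the forward-slash relative paths correctly on any OS.
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, `missing_umls_files("/path/to/UMLS")` returns an empty list when all three files are where the create_triples step expects them.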

Training:

To train the Metathesaurus Discriminator:

    python -m eukg.train --mode=disc --model=transd --run_name=transd-disc --no_semantic_network
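
For reference, the standard TransD score selected by `--model=transd` can be sketched in pure Python. This sketch assumes entity and relation embeddings share one dimension, in which case the projection (r_p e_p^T + I) e reduces to e + (e_p · e) r_p; the function names are hypothetical, and eukg's TensorFlow implementation may differ in details such as the norm used or normalization constraints.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def transd_project(e: Sequence[float], e_p: Sequence[float], r_p: Sequence[float]) -> List[float]:
    """Map entity vector e into relation space.

    With equal entity/relation dimensions, (r_p e_p^T + I) e simplifies
    to e + (e_p . e) * r_p, so no explicit matrix is built.
    """
    coeff = dot(e_p, e)
    return [x + coeff * y for x, y in zip(e, r_p)]


def transd_distance(h, h_p, r, r_p, t, t_p) -> float:
    """Squared L2 distance ||h_perp + r - t_perp||^2 (lower = more plausible)."""
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))
```

The projection is what distinguishes TransD from plain TransE: with h_p = [1, 0] and r_p = [0, 1], the head h = [1, 0] is moved to [1, 1] before the translation by r, so the same entity can occupy different positions for different relations.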

To train both the Metathesaurus and Semantic Network Discriminators:

    python -m eukg.train --mode=disc --model=transd --run_name=transd-sn-disc

To train the Metathesaurus Generator:

    python -m eukg.train --mode=gen --model=distmult --run_name=dm-gen --no_semantic_network --learning_rate=1e-3
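
The generator uses `--model=distmult`, whose score for a triple (h, r, t) is the trilinear product Σ h_i r_i t_i. Assuming the generator plays the usual role in adversarial knowledge-embedding training (scoring candidate corrupted triples and sampling one as a hard negative for the discriminator), a minimal pure-Python sketch of that sampling step follows. The names are hypothetical and this is not eukg's code.

```python
import math
import random
from typing import List, Sequence, Tuple

# A candidate triple as (head, relation, tail) embedding vectors.
Triple = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def distmult_score(h: Sequence[float], r: Sequence[float], t: Sequence[float]) -> float:
    """DistMult score: sum_i h_i * r_i * t_i (higher = more plausible)."""
    return sum(a * b * c for a, b, c in zip(h, r, t))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_negative(candidates: Sequence[Triple], rng: random.Random) -> Tuple[int, List[float]]:
    """Pick one candidate negative with probability softmax(DistMult score).

    Returns the sampled index and the full probability distribution.
    """
    probs = softmax([distmult_score(*c) for c in candidates])
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return idx, probs
```

Sampling in proportion to the generator's score (rather than uniformly) favors corrupted triples that look plausible, which is what makes them harder negatives for the discriminator.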

To train the Metathesaurus and Semantic Network Generators:

    python -m eukg.train --mode=gen --model=distmult --run_name=dm-sn-gen --learning_rate=1e-3

To train the full GAN model:

    python -m eukg.train --mode=gan --model=transd --run_name=gan --dis_run_name=transd-sn-disc --gen_run_name=dm-sn-gen

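Note that the GAN model requires a pretrained discriminator and generator, which it references through --dis_run_name and --gen_run_name (transd-sn-disc and dm-sn-gen, the run names used in the commands above).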
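
Because the GAN step refers to the pretrained models only by run name, it can be convenient to drive all three steps from one script so the names stay in sync. The wrapper below is a hypothetical sketch, not part of the repository: it reuses exactly the flags from the commands above and assumes it is run from the repository root with eukg importable.

```python
import subprocess
import sys
from typing import List

# Run names from the commands above; the GAN step looks up the
# pretrained discriminator and generator by these names.
DISC_RUN = "transd-sn-disc"
GEN_RUN = "dm-sn-gen"


def pipeline_commands(python: str = sys.executable) -> List[List[str]]:
    """Build the discriminator, generator, and GAN commands, in order."""
    base = [python, "-m", "eukg.train"]
    return [
        base + ["--mode=disc", "--model=transd", f"--run_name={DISC_RUN}"],
        base + ["--mode=gen", "--model=distmult", f"--run_name={GEN_RUN}",
                "--learning_rate=1e-3"],
        base + ["--mode=gan", "--model=transd", "--run_name=gan",
                f"--dis_run_name={DISC_RUN}", f"--gen_run_name={GEN_RUN}"],
    ]


def run_pipeline() -> None:
    for cmd in pipeline_commands():
        # check=True raises on a non-zero exit, so the GAN step never
        # starts unless both pretraining runs succeeded.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` runs the three steps in order, while `pipeline_commands()` on its own only builds the argument lists, which is useful for a dry run.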