
Knowledge base creation for HII-C project


Knowledge

About

This is the Knowledge repository for the HII-C project. It contains the scripts that process the medical source data into an intelligent medical knowledge base. If you are looking for how this knowledge base is processed and served through our API, see our HIIC-API repository.

Installing/Building

To change the default environment values and configurations, edit the config files provided in the data folder before building. Modules can still be given custom configs at runtime. See the documentation for details on important environment variables and configurations.

To build, clone this repository and install using pip:

    git clone https://github.com/HII-C/Knowledge.git /path/to/repository
    pip install /path/to/repository

The package will be installed under the name knowledge-hiic, and its root module will be named knowledge.
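A quick way to confirm the install is to look the package up from Python. This sketch relies only on the two names given above (the `knowledge-hiic` distribution and the `knowledge` module) plus the standard library's `importlib` helpers; the function name `installed_version` is hypothetical.

```python
from importlib import metadata, util
from typing import Optional


def installed_version(dist_name: str = "knowledge-hiic",
                      module_name: str = "knowledge") -> Optional[str]:
    """Return the installed version of dist_name, or None if it is unavailable."""
    # find_spec returns None when no top-level module of that name is importable.
    if util.find_spec(module_name) is None:
        return None
    try:
        # Reads the version from the installed distribution's metadata.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_version()` after `pip install` should return a version string; `None` means the package is not visible to the current interpreter.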

Running

Since this repository is installed as a Python package, scripts are run by calling modules. All modules in knowledge.model and knowledge.env are runnable with python -m. Every module takes a config file; its parameters are described in the module's documentation and in its config and spec files. If no config is specified, the module's default config is used.

    python -m knowledge.[module].[submodule] --config /path/to/config.json
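To illustrate the `--config` convention, a runnable module's entry point might parse its arguments roughly as follows. This is a hypothetical sketch, not the repository's code: `DEFAULT_CONFIG`, `parse_args`, `load_config`, and `main` are illustrative names, and the default path is an assumption about where a module's default config lives.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict, List, Optional

# Hypothetical default location; the real defaults live in the data folder.
DEFAULT_CONFIG = Path("data/config/default.json")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run a knowledge module (sketch).")
    # Falls back to the default config when --config is omitted.
    parser.add_argument("--config", type=Path, default=DEFAULT_CONFIG,
                        help="path to a JSON config file")
    return parser.parse_args(argv)


def load_config(path: Path) -> Dict[str, Any]:
    with path.open() as f:
        return json.load(f)


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    args = parse_args(argv)
    config = load_config(args.config)
    # ... validate the config, then run the module's work ...
    return config
```

A real module would call `main()` under an `if __name__ == "__main__":` guard so that `python -m` executes it.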

All scripts validate their config before doing any work, to minimize runtime errors. Modules also never delete files, drop tables, or create databases without first prompting the user, unless the config specifies otherwise.
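The validate-then-prompt behavior could look roughly like this. It is a sketch under assumptions: the spec format (a mapping from key to expected Python type) and the `force` key that skips prompting are both hypothetical, since the real spec files and opt-out setting are described in each module's documentation.

```python
from typing import Any, Callable, Dict, List, Mapping

# Hypothetical spec: each config key maps to the Python type its value must have.
SPEC: Dict[str, type] = {"database": str, "tables": list, "force": bool}


def validate_config(config: Mapping[str, Any], spec: Mapping[str, type]) -> List[str]:
    """Return a list of problems; an empty list means the config is valid."""
    errors = []
    for key, expected in spec.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{key}: expected {expected.__name__}, "
                          f"got {type(config[key]).__name__}")
    for key in config:
        if key not in spec:
            errors.append(f"unknown key: {key}")
    return errors


def confirm(action: str, config: Mapping[str, Any],
            ask: Callable[[str], str] = input) -> bool:
    """Ask before a destructive action unless the (hypothetical) force flag is set."""
    if config.get("force", False):
        return True
    answer = ask(f"{action}? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Checking everything up front means a typo in the config fails immediately, rather than partway through a long run.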

Deploying

Sometimes models must be run in shared environments without full access to Python packaging. In that case, models can be run as single-file standalone scripts from the deploy folder. Deployed versions do not have all the features of the packaged versions, mostly because they lack a MySQL database connection; see each model's documentation for its deployment options.

To build the deployment files, make sure stickytape is installed, then run:

    cd /path/to/repository
    bash deploy/deploy.sh

To run a deployed model:

    cd /path/to/repository
    python deploy/[model]-standalone.py \
        --config /path/to/config.json \
        --specs /path/to/specs.json \
        --pkg false

These three parameters are normally optional, but deployed scripts require all of them.
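One detail to watch when parsing a flag like `--pkg false`: argparse's `type=bool` would turn the string `"false"` into `True`, because any non-empty string is truthy. A minimal sketch of argument parsing for a standalone script, with all three options required, could use an explicit converter instead; `str_to_bool` and `parse_standalone_args` are hypothetical names, not taken from the deploy scripts.

```python
import argparse
from typing import List, Optional


def str_to_bool(value: str) -> bool:
    """Convert 'true'/'false'-style strings; plain bool('false') would be True."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_standalone_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Standalone model run (sketch).")
    # Unlike the packaged modules, all three options are mandatory here.
    parser.add_argument("--config", required=True)
    parser.add_argument("--specs", required=True)
    parser.add_argument("--pkg", type=str_to_bool, required=True,
                        help="false when running outside the installed package")
    return parser.parse_args(argv)
```

Omitting any of the three options makes argparse print a usage error and exit, which mirrors the requirement stated above.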

Environment Creation

Once you have built the package, you may want to recreate the environment for a given medical dataset. This requires the following datasets to be installed in MySQL:

  • UMLS
  • MIMIC
  • RxNorm
  • SNOMED
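Before regenerating anything, it can help to check that the source datasets are actually present. The sketch below queries MySQL's `INFORMATION_SCHEMA.SCHEMATA` through any DB-API cursor. It assumes each dataset lives in its own schema named `umls`, `mimic`, `rxnorm`, or `snomed`, which is a hypothetical naming choice; `missing_datasets` is likewise an illustrative name.

```python
from typing import List, Sequence

# Hypothetical schema names for the four source datasets.
REQUIRED = ("umls", "mimic", "rxnorm", "snomed")


def missing_datasets(cursor, required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required schemas that do not exist, using a DB-API cursor."""
    cursor.execute("SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA")
    # Compare case-insensitively, since schema-name case can vary by platform.
    existing = {row[0].lower() for row in cursor.fetchall()}
    return [name for name in required if name.lower() not in existing]
```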

After the source data has been installed, the derived tables can be regenerated by running the knowledge.env module, which will create all derived tables in the order of dependency.
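Creating derived tables "in the order of dependency" amounts to a topological sort. A minimal sketch using the standard library's `graphlib` (Python 3.9+), with a made-up dependency map standing in for the real derived tables (every table name below is hypothetical):

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: each derived table lists the tables it is built from.
DEPENDENCIES: Dict[str, Set[str]] = {
    "concept_map": {"umls", "snomed"},
    "drug_map": {"rxnorm", "concept_map"},
    "patient_concepts": {"mimic", "concept_map"},
    "patient_drugs": {"mimic", "drug_map"},
}
# Source datasets are installed, not derived, so they are filtered out below.
SOURCES = {"umls", "mimic", "rxnorm", "snomed"}


def build_order(deps: Dict[str, Set[str]], sources: Set[str]) -> List[str]:
    """Return derived tables so that each one comes after everything it needs."""
    # static_order raises graphlib.CycleError if the dependencies form a cycle.
    order = TopologicalSorter(deps).static_order()
    return [table for table in order if table not in sources]
```

Any order it returns is valid; ties between independent tables may be broken arbitrarily.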

Documentation

knowledge.model

These are the core models used to create the knowledge base behind the HII-C API. These modules are executable and may be run multiple times under multiple configurations. Check the data/config folder for the model configuration files used in this project.
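Running one model under several configurations can be scripted by invoking it once per config file. This sketch assumes the configs are JSON files in a single folder and uses a hypothetical module name; it simply shells out to `python -m <module> --config <file>` as described under Running. `build_commands` and `run_all` are illustrative names.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(module: str, config_dir: Path) -> List[List[str]]:
    """One `python -m <module> --config <file>` command per JSON file, sorted by name."""
    return [
        [sys.executable, "-m", module, "--config", str(path)]
        for path in sorted(config_dir.glob("*.json"))
    ]


def run_all(module: str, config_dir: Path) -> None:
    for cmd in build_commands(module, config_dir):
        # check=True stops at the first failing run instead of continuing silently.
        subprocess.run(cmd, check=True)
```

Using `sys.executable` keeps each run on the same interpreter, and therefore the same installed `knowledge` package, as the calling script.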

Documentation of models:

knowledge.util

These are small helper classes that simplify tasks common to many scripts. The only runnable util is the dropbox util; the others are only used from within other scripts.

Documentation of utilities:

knowledge.env

These are a collection of scripts focused on manipulating data in the knowledge environment. Unlike the knowledge.model modules, these do not create or find new information; instead, they translate, extract, and link sets of data for easier access by both the modules and the API. These modules are executable and may be run multiple times under multiple configurations. Check the data/config folder for the environment configuration files used in this project.

Documentation of environment scripts:

knowledge.struct

These are a collection of classes that represent data structures used in models. Structures are not runnable, but are used by models to organize and manipulate data.

Documentation of structures: