Disease Pattern Miner

Disease Pattern Miner is a free, open-source mining framework for interactively discovering sequential disease patterns in medical health record datasets.

Features

Support for many state-of-the-art sequence mining algorithms.
Modular design, delivered as a single web application.
Modern, responsive UI.
Single results table with many different filtering options to explore patterns.
Interactive sequence pattern model to provide insights into disease trajectories.
Tested on Windows 10 & Ubuntu 18.04.

Documentation

The web application is designed to perform sequential mining tasks on electronic health record (EHR) datasets. The results can be viewed in a table and explored in an interactive Sankey chart.

The uploaded dataset must match the following CSV file format (example: full set):

<GENDER-AGE-GROUP>, <PATIENT-ID>, <YYYYMMDD>, <min_1 max_3 ICD-9-CM codes>

f0,EW75937189,20010120,0740,4661,
f0,EW75937189,20010107,37311,,
f0,EW75937189,20010120,V202,,
f0,BU45121182,20010103,4659,7806,
f1,KT61521480,20010109,486,94400,
...
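
As an illustration of the row layout above, here is a minimal Python sketch that parses one row into a record. The `Visit` class and `parse_visit` function are hypothetical names for this example and are not part of the application. Codes are kept as strings so that leading zeros (as in `0740`) are preserved.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Visit:
    group: str        # gender-age group, e.g. "f0"
    patient_id: str
    day: date
    codes: list       # 1 to 3 ICD-9-CM codes, kept as strings


def parse_visit(line: str) -> Visit:
    """Parse one CSV row of the upload format into a Visit (illustrative only)."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}")
    group, patient_id, raw_day = fields[0], fields[1], fields[2]
    # Unused code slots are empty strings in the sample data; drop them.
    codes = [c for c in fields[3:] if c]
    if not 1 <= len(codes) <= 3:
        raise ValueError(f"expected 1 to 3 ICD-9-CM codes, got {len(codes)}")
    day = datetime.strptime(raw_day, "%Y%m%d").date()
    return Visit(group, patient_id, day, codes)
```

For example, `parse_visit("f0,EW75937189,20010120,0740,4661,")` yields a record for group `f0` with the codes `0740` and `4661`.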

The application filters the data and splits it into one file per gender-age group (example: f0 group set):

<PATIENT-ID>, <YYYYMMDD>, <min_1 max_3 ICD-9-CM codes>

EW75937189,20010120,0740,4661,
EW75937189,20010107,37311,,
BU45121182,20010103,4659,7806,
...
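
The split step can be sketched roughly as follows in Python. This is only an assumption about how the grouping might look: `split_by_group` is a hypothetical helper, and the filtering that the application applies is not specified here, so the sketch does not reproduce it.

```python
from collections import defaultdict


def split_by_group(lines):
    """Group rows by their first column and drop that column (illustrative only).

    The application's filtering step is not described in this README,
    so every row is kept here.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group, rest = line.split(",", 1)
        groups[group.strip()].append(rest)
    return dict(groups)
```

Each key of the returned dictionary is a gender-age group such as `f0`, and each value is the list of rows for that group in the `<PATIENT-ID>, <YYYYMMDD>, <codes>` layout.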

Each gender-age-group set is then filtered and converted into a seq-file for mining, using the ICD-9-CM hierarchy. Positive integers are ordinal values of the ICD-9-CM chapters, -1 represents a TIME_GAP (2 weeks), and -2 marks the end of the sequence.

<ICD-9-CM CHAPTERS ORDINALS> <ICD-9-CM CHAPTERS ORDINALS> -1 ... <ICD-9-CM CHAPTERS ORDINALS> -1 -2

5 -1 5 -1 5 -1 5 -1 5 -1 7 -1 9 13 15 -1 9 -1 9 -1 -2
7 -1 7 -1 7 -1 7 -1 5 -1 2 7 9 -1 5 7 -1 7 -1 5 -1 5 15 -1 7 -1 -2
7 9 -1 7 9 -1 7 9 -1 7 9 -1 7 9 -1 7 9 -1 7 9 11 -1 9 -1 -2
...
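
To make the seq-file layout concrete, the following Python sketch reads one line back into groups of chapter ordinals. It assumes that each -1 closes the group of ordinals preceding it and that -2 ends the line; `parse_sequence` is a hypothetical name, not a function of the application.

```python
def parse_sequence(line: str) -> list:
    """Split one seq-file line into groups of ICD-9-CM chapter ordinals.

    Assumption for this sketch: -1 closes the current group and
    -2 terminates the sequence.
    """
    groups = []
    current = []
    for token in line.split():
        value = int(token)
        if value == -2:
            break
        if value == -1:
            groups.append(current)
            current = []
        else:
            current.append(value)
    # Tolerate a line whose last group is not closed by -1.
    if current:
        groups.append(current)
    return groups
```

For example, `parse_sequence("7 -1 9 13 15 -1 9 -1 -2")` returns `[[7], [9, 13, 15], [9]]`.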

Many different sequence mining algorithms can be used. For each mining task a result file is produced:

<FREQUENT SEQUENCE PATTERN> #SUP: <ABSOLUTE SUPPORT OF PATTERN>

5 7 -1 7 -1 7 -1 #SUP: 3635
5 7 -1 7 -1 #SUP: 3824
5 7 -1 #SUP: 4000
5 -1 5 -1 7 -1 #SUP: 3551
...
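
A result line can be read back with a few lines of Python. The sketch below is illustrative only: `parse_result_line` is a hypothetical helper, and it assumes that `#SUP:` occurs exactly once per line and that -1 closes each group of the pattern.

```python
def parse_result_line(line: str):
    """Return (pattern, support) for one result line (illustrative only).

    The pattern is a list of groups of chapter ordinals; the support
    is the absolute support given after "#SUP:".
    """
    pattern_text, support_text = line.split("#SUP:")
    pattern = []
    current = []
    for token in pattern_text.split():
        if token == "-1":
            pattern.append(current)
            current = []
        else:
            current.append(int(token))
    # Tolerate a pattern whose last group is not closed by -1.
    if current:
        pattern.append(current)
    return pattern, int(support_text)
```

For example, `parse_result_line("5 7 -1 7 -1 7 -1 #SUP: 3635")` returns `([[5, 7], [7], [7]], 3635)`.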

For more detailed examples and project insights, please see the publications or contact the author.

System Requirements & Recommendations

A machine with:

At least 4 GB of RAM; 16 GB or more is recommended. Make sure the servlet container can actually access this memory!
At least 10 GB of drive space; 40 GB or more is recommended, depending on the dataset.

The following software installed:

Java Development Kit (JDK) 11 or 12.
Apache Tomcat as servlet container.

Quick Start

Make sure you meet all system and software requirements.
Clone the repository.
Build a .war file of the project.
Deploy the .war file to the server.

Authors

Vitaliy Ostapchuk - Initial work

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

vitaliy-ostapchuk93/disease-pattern-miner