patient-viz is a tool allowing to view and explore insurance claim or other time sequence event data. The web-based tool is mostly written in d3 and uses python and shell on the back-end. Example data from medical insurance claim data can be downloaded automatically. We also have a live demo! The project is a joint product of Josua Krause, Narges Sharif Razavian, Enrico Bertini, and David Sontag.
Setting up the project can be done without prerequisites on MacOS and linux. For windows you need to install git. You can use git BASH to execute the shell commands.
In order to set up the tool please run the following command:
$ ./setup.sh --default
This downloads all necessary files for label creation of claims data events
and also example patient claims data. Note that, by downloading the data,
you agree with the respective terms and conditions of the sources
(info pages can be accessed when running the script). You will
be prompted to confirm before downloading (this can be silenced
via -s
). In total, required space will be approximately 3 GB
.
Processing of some files may take a while letting the script appear
to be frozen (which is not the case!).
If you only want to partially download the example data and descriptions
refer to the help: ./setup.sh -h
. The tool works without any of this data
but uses git submodules which need to be initialized manually
when no setup is performed.
After setting up the tool, files can be viewed with the following command:
$ ./start.sh -p json/AEF023C2029F05BC.json --start
where the argument after -p
points to one of the previously created
event sequence files which can be found in json/
. The command starts
a server which can be stopped using ./start.sh --stop
.
The list of available files can be seen using:
$ ./start.sh --list
Patient files can be created manually from the example claims data by issuing the following commands:
$ ./opd_get_patient.py -p AEF023C2029F05BC -f format.json -o json/AEF023C2029F05BC.json -- opd
$ ./build_dictionary.py -p json/AEF023C2029F05BC.json -c config.txt -o json/dictionary.json
$ ./start.sh --list-update
./opd_analyze.py -f format.json -- opd
can be used to see which patient ids for -p
are available.
Own data can be loaded by passing the event sequence file as argument to -p
and the dictionary file as argument to -d
. For further information you
can consult the help (./start.sh -h
) and the JSON specification.
The system can also handle data stored in a shelve db. However, you need to manually convert
patients stored this way and update the config.txt
file.
$ ./shelve_access.py -p AEF023C2029F05BC -c config.txt | ./opd_get_patient.py -p AEF023C2029F05BC -f format_shelve.json -o json/AEF023C2029F05BC.json -- -
$ ./build_dictionary.py -p json/AEF023C2029F05BC.json -c config.txt -o json/dictionary.json
$ ./start.sh --list-update
The list of available patient ids can be seen using:
$ ./shelve_access.py -c config.txt -l
The data can also be used for predictive modeling. In order to prepare the data for that please refer to the feature extraction documentation.
Pull requests are highly appreciated :) Also, feel free to open issues for any questions or bugs you may encounter.