Data Entrepreneur Clinical Observables Yardstick

Apache License 2.0

DECOY

synthetic data for labs, meds, etc.
a clinical equivalent to the CMS DE-SynPUF files

by Dan Connolly
under the direction of Russ Waitman, KUMC Director of Medical Informatics

Copyright (c) 2015 University of Kansas Medical Center
Share and enjoy under the terms of the Apache License, Version 2.0

Design Sketch

  • Take the fact counts on babel and sum them across the sites that report fact counts, creating a pooled set of patients and facts for each fact and ontology.
  • Then distribute the facts across the patients using different standard distributions:
    • Ugly DECOY uses a uniform distribution; it is not realistic, but it is really useful for simple unit and integration tests
    • Pink DECOY follows a Poisson distribution
    • Green DECOY uses a Gaussian
    • Blue DECOY uses Benford
      • related work by Jason Doctor looks at clinical data distributions and evaluates fraudulent upcoding of diagnoses
  • Make them dirt-simple CSV files that mimic the ResDAC files,
    • so that the OMOP people and Sentinel people can use them too
  • Provide the ETL to bring them into i2b2.
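
The first three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's code: the site names, concept codes, and counts in `SITE_COUNTS` are made up, the CSV column names are hypothetical rather than the actual ResDAC layout, and reading "distribute across patients" as "give each patient a weight drawn from the chosen distribution, then assign facts in proportion to those weights" is one possible interpretation of the sketch.

```python
import csv
import io
import math
import random
from collections import Counter

# Hypothetical per-site fact counts: {site: {concept code: fact count}}.
SITE_COUNTS = {
    "site_a": {"LAB:glucose": 120, "MED:aspirin": 40},
    "site_b": {"LAB:glucose": 80, "MED:aspirin": 60},
}
EPS = 1e-9  # keeps the total weight positive even if every draw is zero


def pool_counts(site_counts):
    """Sum fact counts per concept across all reporting sites."""
    pooled = Counter()
    for counts in site_counts.values():
        pooled.update(counts)
    return pooled


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def benford_digit(rng):
    """Draw a leading digit 1..9 with P(d) = log10(1 + 1/d)."""
    digits = list(range(1, 10))
    weights = [math.log10(1 + 1 / d) for d in digits]
    return rng.choices(digits, weights=weights)[0]


def patient_weights(flavor, n_patients, rng):
    """One weight per patient, drawn from the distribution for this flavor."""
    if flavor == "ugly":   # uniform: every patient equally likely
        return [1.0] * n_patients
    if flavor == "pink":   # Poisson (the mean of 3 is an arbitrary choice)
        return [poisson(rng, 3.0) + EPS for _ in range(n_patients)]
    if flavor == "green":  # Gaussian, clipped at zero (parameters arbitrary)
        return [max(0.0, rng.gauss(1.0, 0.3)) + EPS for _ in range(n_patients)]
    if flavor == "blue":   # Benford-distributed leading digits
        return [float(benford_digit(rng)) for _ in range(n_patients)]
    raise ValueError(f"unknown flavor: {flavor}")


def distribute(total_facts, n_patients, flavor, seed=0):
    """Assign total_facts facts to patients; returns {patient: fact count}."""
    rng = random.Random(seed)
    weights = patient_weights(flavor, n_patients, rng)
    picks = rng.choices(list(range(n_patients)), weights=weights, k=total_facts)
    return Counter(picks)


def to_csv(pooled, n_patients, flavor):
    """Render per-patient fact counts as CSV text (column names are assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["patient_num", "concept_cd", "fact_count"])
    for concept, total in sorted(pooled.items()):
        for patient, n in sorted(distribute(total, n_patients, flavor).items()):
            writer.writerow([patient, concept, n])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(pool_counts(SITE_COUNTS), n_patients=5, flavor="ugly"))
```

Because every fact is assigned to exactly one patient, each flavor preserves the pooled total per concept; only the spread across patients changes. A fixed seed keeps the output reproducible, which suits the unit and integration testing use case.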

Stretch Goal

Evaluate the real distributions and model each type of fact using the best-fitting standard distribution, or base the model on the real pooled distributions.
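
One way the stretch goal might be approached is to compare the observed per-patient fact counts against each candidate distribution and keep the closest. The sketch below is an assumption, not a settled design: the function name `best_fit`, the choice of candidates (uniform, Poisson, Gaussian), and the squared-error score between the empirical frequencies and each model's probability mass are all illustrative. Benford is left out because it describes leading digits rather than the counts themselves.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def best_fit(counts):
    """Return (name of closest model, squared error per model).

    counts is a list of non-negative integers, one fact count per patient.
    """
    n = len(counts)
    mean = sum(counts) / n
    sigma = math.sqrt(sum((c - mean) ** 2 for c in counts) / n) or 1e-9
    lam = max(mean, 1e-9)  # avoid log(0) when every count is zero
    lo, hi = min(counts), max(counts)
    observed = Counter(counts)
    models = {
        # Equal mass on every integer in the observed range.
        "uniform": lambda k: 1 / (hi - lo + 1),
        "poisson": lambda k: poisson_pmf(k, lam),
        # Gaussian mass falling in the unit-wide bin centered on k.
        "gaussian": lambda k: normal_cdf(k + 0.5, mean, sigma)
        - normal_cdf(k - 0.5, mean, sigma),
    }
    errors = {
        name: sum((observed[k] / n - pmf(k)) ** 2 for k in range(lo, hi + 1))
        for name, pmf in models.items()
    }
    return min(errors, key=errors.get), errors


if __name__ == "__main__":
    name, errors = best_fit([0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 10)
    print(name, errors)
```

A likelihood-based comparison or a goodness-of-fit test would be a more rigorous alternative; the squared-error score is used here only because it needs nothing beyond the standard library.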

Acknowledgements

This work was supported by a CTSA grant from NCRR and NCATS awarded to the University of Kansas Medical Center for Frontiers: The Heartland Institute for Clinical and Translational Research # UL1TR000001 (formerly #UL1RR033179). The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH, NCRR, or NCATS.

It's based on HERON, i2b2, and GPC: