OneFL Deduper

Intro

Welcome to the OneFlorida "De-Duper" tool.

This tool generates "Unique Identifiers" (UIDs) used for patient de-duplication (also known as "Entity Resolution" or "Record Linkage").

The current implementation consists of two separate scripts, each taking a CSV file as input, as shown in the diagram below.

Note: The hashing process ensures that the "OneFlorida Domain" WILL NOT RECEIVE any data containing PHI.

    +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    |   Partner Domain
    |
    |    (CSV file with PHI)                                (CSV file with no PHI)
    |   +--------------------------+                       +--------------------------+
    |   |   PHI_DATA.csv           | ----> hasher.py ----> |    HASHES.csv            |
    |   | patid, first, last,      |                       | patid, F_L_D_S, F_L_D_R  |
    |   | dob, sex, race           |                       |                          |
    |   +--------------------------+                       +--------------------------+
    |                                                            ||
    +- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - || - - - - - -
    |   OneFlorida Domain                                        \/
    |                                                       +--------------------------+
    |                                                       | OneFlorida SFTP Server   |
    |                                                       +--------------------------+
    |                                                            ||
    |                                                            ||
    |                                                            \/
    |                                                       +--------------------------+
    |                                                       |   HASHES.csv             |
    |                                                       | patid, F_L_D_S, F_L_D_R  |
    |                                                       +--------------------------+
    |                                                            |
    |      ____________                                          |
    |    /              \                                        |
    |   |               /|                                      /
    |   |\_____________/ |                                     /
    |   |              | |  <------------- linker.py <--------
    |   |  UF Database | |
    |   |              |/
    |    \_____________/
    |
    |       (Links between hashes -> UUID's)
    |                                                             _____   O
    |       patid, partner_code, linkage_uuid, linkage_hash      / /     -+-
    |         123,          UFH,       abc...,       def...   <-- /       |
    |         456,          FLM,       abc...,       def...   <--        / \
    |         789,          FLM,       987...,       012...
    |
    |    (generate UID's from hashes)
    |
    + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
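
The linking step in the diagram can be illustrated with a minimal sketch. It is not the actual linker.py code: the function name `link_rows`, the in-memory `known_links` dictionary (standing in for the database), and the rule "reuse the UUID of the first hash already seen, otherwise create a new one" are all assumptions made for illustration.

```python
import uuid

# Hash columns as they appear in HASHES.csv in the diagram.
HASH_COLUMNS = ("F_L_D_S", "F_L_D_R")


def link_rows(hash_rows, partner_code, known_links):
    """Assign a linkage UUID to every (patid, hash) pair.

    hash_rows    -- list of dicts with keys patid, F_L_D_S, F_L_D_R
    partner_code -- code of the partner that sent the file (e.g. "UFH")
    known_links  -- hypothetical dict mapping hash -> UUID, standing in
                    for the links already stored in the database
    """
    links = []
    for row in hash_rows:
        hashes = [row[col] for col in HASH_COLUMNS]
        # Reuse the UUID of the first hash that was seen before, if any.
        linkage_uuid = next(
            (known_links[h] for h in hashes if h in known_links), None
        )
        if linkage_uuid is None:
            linkage_uuid = uuid.uuid4().hex
        for h in hashes:
            # Remember the hash so later files can link to the same UUID.
            known_links.setdefault(h, linkage_uuid)
            links.append({
                "patid": row["patid"],
                "partner_code": partner_code,
                "linkage_uuid": linkage_uuid,
                "linkage_hash": h,
            })
    return links
```

In this sketch, two patients from different partners end up with the same `linkage_uuid` as soon as they share at least one hash, which mirrors rows 123 and 456 in the diagram. Merging of UUIDs that turn out to belong to the same patient later on is not handled here.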

Note on PHI: The hasher.py script uses the Python implementation of the SHA-256 algorithm to scramble the PHI so that re-identifying patients from the hashes is computationally infeasible. The SHA-256 algorithm is specified by the National Institute of Standards and Technology (NIST) in the Secure Hash Standard (FIPS 180-4).
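
As a rough illustration of the hashing step, the sketch below turns PHI rows into SHA-256 hashes using Python's standard `hashlib` module. It is not the actual hasher.py code. It assumes that F_L_D_S combines first name, last name, date of birth and sex, and that F_L_D_R combines first name, last name, date of birth and race (inferred from the column names); the normalization rule, the `|` separator, and the function names are hypothetical.

```python
import csv
import hashlib
import io


def normalize(value):
    # Hypothetical normalization: trim whitespace and ignore case.
    return value.strip().lower()


def hash_fields(*fields):
    # Join the normalized fields and return the SHA-256 hex digest.
    joined = "|".join(normalize(f) for f in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def hash_rows(phi_csv_text):
    """Convert PHI CSV text into rows that contain only patid and hashes."""
    reader = csv.DictReader(io.StringIO(phi_csv_text))
    hashed = []
    for row in reader:
        hashed.append({
            "patid": row["patid"],
            # Assumed rule: first + last + dob + sex
            "F_L_D_S": hash_fields(
                row["first"], row["last"], row["dob"], row["sex"]
            ),
            # Assumed rule: first + last + dob + race
            "F_L_D_R": hash_fields(
                row["first"], row["last"], row["dob"], row["race"]
            ),
        })
    return hashed
```

The output rows carry only the patid and the two hex digests, so none of the name, date of birth, sex, or race values leave the partner domain in clear text.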

Installation

The two components of the application (hasher, linker) need proper configuration in order to function. For more details, please refer to docs/installation.md and docs/installation-linker.md.

The format for the input file for the hasher component is described in the input-specs.md document.

References

NIST Secure Hash Standard - https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf
CAPriCORN: Chicago Area Patient-Centered Outcomes Research Network - https://www.ncbi.nlm.nih.gov/pubmed/24821736
SERF: Stanford Entity Resolution Framework - http://infolab.stanford.edu/serf/
"Swoosh: A Generic Approach to Entity Resolution" - http://link.springer.com/article/10.1007%2Fs00778-008-0098-x
