/blue_button_parser

Primary LanguageJavaScriptMIT LicenseMIT

BlueButtonParser

BlueButtonParser is a Ruby library to parse the free-text personal health data export from the VA’s MyHealthEVet portal into a Ruby hash.

BlueButton is the initiative from the U.S. Department of Veterans Affairs to convince personal health record providers to let users download their data. They have implemented the BlueButton initiative on their own MyHealthEVet portal by allowing their usres to export their data in a a human-readable free text file. However, this text file is not computationally readable: it contains almost no markup or delimiters for sections, keys, or values.

This BlueButtonParser library was created by reverse-engineering the one sample data file provided by the VA and creating some ad-hoc rules for how to parse the document. BlueButtonParser will attempt to find all the sections in the file, all the key-value pairs within that section, and even find collections of items within a section when applicable (e.g. the array of facilities in the section “TREATMENT FACILITIES”).

Example free text data (see full text)

----------------------------- DEMOGRAPHICS ----------------------------

Source: Self-Entered

First Name: ONE
Middle Initial: A
Last Name: MHVVETERAN
Suffix: 
Alias: MHVVET
Relationship to VA: Patient, Veteran, Employee

Gender: Male   Blood Type: AB+         Organ Donor: Yes

Date of Birth: 01 Mar 1948
Marital Status: Married
Current Occupation: Truck Driver

Example parsed data (JSON) (see full output)

"DEMOGRAPHICS": {
  "Source": "Self-Entered",
  "First Name": "ONE",
  "Middle Initial": "A",
  "Last Name": "MHVVETERAN",
  "Suffix": null,
  "Alias": "MHVVET",
  "Relationship to VA": "Patient, Veteran, Employee",
  "Gender": "Male",
  "Blood Type": "AB+",
  "Organ Donor": "Yes",
  "Date of Birth": "01 Mar 1948",
  "Marital Status": "Married",
  "Current Occupation": "Truck Driver"
}

Install

sudo gem install blue_button_parser

Usage

require 'blue_button_parser'

# parse your downloaded BlueButton data file
my_bb_file = File.read("test/data/blue_button_example_data.txt")
bbp = BlueButtonParser.new(my_bb_file)

# access the parsed data as a Hash
parsed_data_hash = bbp.data

# example: to see the list of sections parsed:
parsed_data_hash.keys
# => ["MY HEALTHEVET ACCOUNT SUMMARY", "VITALS AND READINGS", "HEALTH INSURANCE", ... ]

# example: to acccess the "MY HEALTHEVET ACCOUNT SUMMARY" section:
summary = parsed_data_hash["MY HEALTHEVET ACCOUNT SUMMARY"]
# => {"Authentication Facility Name"=>"SLC10 TEST LAB", "Authentication Date"=>"19 Aug 2010", ... }

Caveats

BlueButtonParser was reverse engineered based on the single sample data file provided by the VA (latest version: v12, 02 Dec 2011). Because there are no rules as to how the document should be formatted,

  • A later version of the BlueButton output could break the parser

  • An instance of BlueButton output for a second patient might have slightly different formatting–for example, line-wrapped values for fields that were not wrapped in the current reference file, or new fields that were not applicable to the patient in the current reference file

  • As new sections get added, there is no guarantee that they will conform to the formatting for other sections

To keep the BlueButtonParser up-to-date, the test data file (test/data/blue_button_example_data.txt) and expect parsed output (test/data/expected_json_output.js) should be updated every time a new version of the BlueButtonData file is released.

Note however that as far as I know, there is no formal notification that a new version of the sample data file has been released, so I guess developers will just need to be vigilant. :)

After updates, make sure the tests still work and any applicable new tests get added.

Prior work

Somebody took a stab at this in the past: rest-developer-edition.na8.force.com/BlueConverter

I believe the code for this example is found here: github.com/joshbirk/BlueConverter

BlueConverter great first pass implementation, but it needs a few corrections:

  • it doesn’t deal with multiple key-values on single line, e.g. “Gender: Male Blood Type: AB+ Organ Donor: Yes”

  • it doesn’t deal with values that wrap lines

  • it doesn’t recognize that some of the key-values in a section should be grouped into a collection of items, e.g. in the “TREATMENT FACILITIES” section is actually comprised of a series of facilities

  • it doesn’t parse the tables, e.g. the “VA Treating Facility” in the “MY HEALTHEVET ACCOUNT SUMMARY”

Contributing to BlueButtonParser

  • Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet

  • Check out the issue tracker to make sure someone already hasn’t requested it and/or contributed it

  • Fork the project

  • Start a feature/bugfix branch

  • Commit and push until you are happy with your contribution

  • Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Please try not to mess with the Rakefile, version, or history. If you want to have your own version, or is otherwise necessary, that is fine, but please isolate to its own commit so I can cherry-pick around it.

Copyright © 2012 PatientsLikeMe. See LICENSE.txt for further details.