Some ad hoc utilities for analyzing FHIR resources. Requires the msgpack and tqdm packages.
The following scripts use msgpack for faster loading of resources (only a few seconds for 1.5M resources). Resources are assumed to be grouped into directories, one resource per JSON file. pack.py packs these resources into a single msgpack-formatted file.
To pack all resources in a directory named CARRIER:
$ python pack.py CARRIER
search.py searches all packed resources for every instance of a given key and returns all paths at which it was found, along with the unique values for the key (and for a companion key, if given).
To find all instances of system and see the system/code pairs found in CARRIER resources:
$ python search.py system code CARRIER
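The kind of search described above can be sketched as a recursive walk over nested dicts and lists. This is a minimal illustration, not the code in search.py: the function names `find_key` and `search` are hypothetical, resources are assumed to be already loaded as Python dicts, and list indices are dropped from paths so that repeated elements share one path.

```python
from collections import defaultdict


def find_key(node, key, companion=None, path=""):
    """Yield (path, value, companion_value) for every dict containing `key`."""
    if isinstance(node, dict):
        if key in node:
            key_path = f"{path}.{key}" if path else key
            other = node.get(companion) if companion else None
            yield (key_path, node[key], other)
        for name, child in node.items():
            child_path = f"{path}.{name}" if path else name
            yield from find_key(child, key, companion, child_path)
    elif isinstance(node, list):
        for child in node:
            # Indices are omitted so all list elements map to the same path.
            yield from find_key(child, key, companion, path)


def search(resources, key, companion=None):
    """Group unique (value, companion) pairs by the path where `key` occurs."""
    found = defaultdict(set)
    for resource in resources:
        for path, value, other in find_key(resource, key, companion):
            # Only scalar values are collected; nested structures are skipped.
            if isinstance(value, (dict, list)) or isinstance(other, (dict, list)):
                continue
            found[path].add((value, other))
    return dict(found)
```

Calling `search(resources, "system", "code")` on a list of loaded resources returns a dict that maps each path, such as `type.coding.system`, to the set of system/code pairs seen there.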
analyze.py performs some simple analysis of the structure and values within a pack of resources.
To see all required and optional properties of extension elements within item elements of INPATIENT resources:
$ python analyze.py -d INPATIENT -p item.extension
Required:
- url (71)
- valueQuantity (71)
Optional:
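One plausible way to arrive at a required/optional split like the one above is to count how often each property appears among the elements matched by the path. The sketch below assumes that "required" means "present in every matched element"; that definition, and the names `select` and `classify`, are assumptions for illustration rather than the logic of analyze.py.

```python
from collections import Counter


def select(node, parts):
    """Yield every element reached by following `parts`, flattening lists."""
    if isinstance(node, list):
        for child in node:
            yield from select(child, parts)
    elif not parts:
        yield node
    elif isinstance(node, dict) and parts[0] in node:
        yield from select(node[parts[0]], parts[1:])


def classify(resources, path):
    """Split the properties found at `path` into required and optional."""
    parts = path.split(".") if path else []
    counts = Counter()
    total = 0
    for resource in resources:
        for element in select(resource, parts):
            if isinstance(element, dict):
                total += 1
                counts.update(element.keys())
    # Assumption: a property is "required" if every matched element has it.
    required = {name: n for name, n in counts.items() if n == total}
    optional = {name: n for name, n in counts.items() if n < total}
    return required, optional
```

The returned dicts map each property name to the number of elements that contained it, which corresponds to the counts shown in parentheses.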
To see all encountered url values in extension elements of the base resource of INPATIENT resources:
$ python analyze.py -d INPATIENT -p extension -a url
Required:
- https://bluebutton.cms.gov/resources/variables/clm_pass_thru_per_diem_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_pps_cptl_dsprprtnt_shr_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_pps_cptl_excptn_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_pps_cptl_fsp_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_pps_cptl_ime_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_pps_cptl_outlier_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_pps_old_cptl_hld_hrmls_amt (5071)
- https://bluebutton.cms.gov/resources/variables/clm_tot_pps_cptl_amt (5071)
- https://bluebutton.cms.gov/resources/variables/fi_num (5071)
- https://bluebutton.cms.gov/resources/variables/nch_bene_blood_ddctbl_lblty_am (5071)
- https://bluebutton.cms.gov/resources/variables/nch_bene_ip_ddctbl_amt (5071)
- https://bluebutton.cms.gov/resources/variables/nch_bene_pta_coinsrnc_lblty_amt (5071)
- https://bluebutton.cms.gov/resources/variables/nch_drg_outlier_aprvd_pmt_amt (5071)
- https://bluebutton.cms.gov/resources/variables/nch_ip_ncvrd_chrg_amt (5071)
- https://bluebutton.cms.gov/resources/variables/nch_ip_tot_ddctn_amt (5071)
- https://bluebutton.cms.gov/resources/variables/nch_profnl_cmpnt_chrg_amt (5071)
- https://bluebutton.cms.gov/resources/variables/prpayamt (5071)
Optional:
- https://bluebutton.cms.gov/resources/variables/clm_mdcr_non_pmt_rsn_cd (78)
- https://bluebutton.cms.gov/resources/variables/dsh_op_clm_val_amt (2869)
- https://bluebutton.cms.gov/resources/variables/ime_op_clm_val_amt (2104)
tree.py performs larger-scale analysis of the structure within multiple packs of resources. The result is a JSON tree structure with statistics for each property, stratified by profile, along with the unique (scalar) values of system, code, and url. See summary.json for the output. The script uses multiple cores via multiprocessing.
$ python tree.py CARRIER INPATIENT PDE > summary.json
The --count option is useful when debugging to limit how many resources are analyzed per profile type.
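Per-property statistics of this kind lend themselves to parallel processing because partial counts from separate chunks of resources can simply be added together. The following sketch shows that idea with flat path counts instead of the nested tree that tree.py emits; `property_paths`, `summarize_chunk`, and `merge` are hypothetical names, and the multiprocessing step itself is left out.

```python
from collections import Counter
from functools import reduce


def property_paths(node, path=""):
    """Yield the dotted path of every property, ignoring list indices."""
    if isinstance(node, dict):
        for name, child in node.items():
            child_path = f"{path}.{name}" if path else name
            yield child_path
            yield from property_paths(child, child_path)
    elif isinstance(node, list):
        for child in node:
            yield from property_paths(child, path)


def summarize_chunk(resources):
    """Count property occurrences for one chunk of resources."""
    counts = Counter()
    for resource in resources:
        counts.update(property_paths(resource))
    return counts


def merge(partials):
    """Combine per-chunk counts; Counter addition sums matching keys."""
    return reduce(lambda left, right: left + right, partials, Counter())
```

Because `summarize_chunk` is a plain top-level function, it could be handed to `multiprocessing.Pool.map` over a list of chunks, with `merge` applied to the results.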
extract_profile.py processes the output of tree.py to produce, for a given profile, the tree of properties along with their observed cardinalities. Observed values of system, code, and url from tree.py are included when there are between 1 and 25 of them, which captures common coding systems and extensions. See CARRIER_profile.txt, PDE_profile.txt and INPATIENT_profile.txt for the output.
$ cat summary.json | python extract_profile.py CARRIER > CARRIER_profile.txt
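The two ideas in this step, observed cardinalities and the 1-to-25 value cutoff, can be illustrated with a short sketch. It works directly on parent elements rather than on the summary.json structure, whose layout is not described here; `observed_cardinality` and `reportable_values` are hypothetical helpers, not functions from extract_profile.py.

```python
def observed_cardinality(parents, name):
    """Return (min, max) occurrences of `name` across parent elements."""
    counts = []
    for parent in parents:
        value = parent.get(name)
        if value is None:
            counts.append(0)           # property absent
        elif isinstance(value, list):
            counts.append(len(value))  # repeating property
        else:
            counts.append(1)           # single value
    if not counts:
        return (0, 0)
    return (min(counts), max(counts))


def reportable_values(values, limit=25):
    """Return the sorted unique values if there are between 1 and `limit`."""
    unique = sorted(set(values))
    return unique if 1 <= len(unique) <= limit else []
```

A property seen zero to two times per parent would come out as `(0, 2)`, and a url property with hundreds of distinct values would be dropped from the listing.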
The resulting files were manually edited to remove uninteresting collections of values and then run through an online Unicode tree generator to make them easier to read. The results can be found in CARRIER_tree.txt, PDE_tree.txt and INPATIENT_tree.txt.