/fhirizer

Primary LanguagePython

fhirizer

Status License: MIT

mapping

Project overview:

Transforms and harmonizes data from Genomic Data Commons (GDC), Cellosaurus cell-lines, and International Cancer Genome Consortium (ICGC) repositories into 🔥 FHIR (Fast Healthcare Interoperability Resources) format.

  • GDC study simplified FHIR graph

mapping

Usage

Installation

  • from source
git clone repo
cd fhirizer
# create virtual env ex. 
# NOTE: package_data folders must be in python path in virtual envs 
python -m venv venv-fhirizer
source venv-fhirizer/bin/activate
pip install . 
  • Dockerfile
(sudo) docker build -t <tag-name>:latest .
(sudo) docker run -it  --mount type=bind,source=<path-to-input-ndjson>,target=/opt/data --rm <tag-name>:latest
  • Singularity
singularity build fhirizer.sif docker://quay.io/ohsu-comp-bio/fhirizer
singularity shell fhirizer.sif

Convert and Generate

Detailed step-by-step guide on FHIRizing data for a project's study can be found in the project's directory overview.

  • GDC

    • convert GDC schema keys to fhir mapping

    • generate fhir object models ndjson files in directory

      Example run for patient - replace path's to ndjson files or directories.

    fhirizer generate --name case --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/cases_key.ndjson
    
    • to generate document reference for the patients
    fhirizer generate --name file --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/files_key.ndjson
    
  • Cellosaurus

     fhirizer generate --name cellosaurus --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/<cellosaurus-celllines-ndjson>
    
  • ICGC

    • NOTE: Active site and data dictionary updates from ICGC DCC to ICGC ARGO is in progress.
     fhirizer generate --name icgc --icgc <ICGC_project_name> --has_files
    

Constructing GDC maps cli cmds

initialize initial structure of project, case, or file to add Maps

fhirizer project_init 
# to update Mappings run associated labels script ex ./labels/project.py 

fhirizer case_init 
fhirizer file_init 

Testing

pytest -cov 

fhirizer structure:

Data directories included in package data:

  • resources: data resources generated or used in mappings
  • mapping: json data maps produced by fhirizer pydantic schema maps

fhirizer/
|-- fhirizer/
|   |-- __init__.py
|   |-- labels/
|   |   |-- __init__.py
|   |   |-- files.py
|   |   |-- case.py
|   |   └── project.py
|   |   
|   |-- schema.py
|   |-- entity2fhir.py
|   |-- mapping.py
|   |-- utils.py
|   └── cli.py
|   
|-- mapping/
|   |-- project.json
|   |-- case.json
|   └── file.json
|  
|-- resources/
|   |-- gdc_resources/
|   |   |-- content_annotations/
|   |   |-- data_dictionary/
|   |   └── fields/
|   └── fhir_resources/
| 
|-- tests/
|   |-- __init__.py
|   |-- unit/
|   |   |-- __init__.py
|   |   └── test_mapping.py
|   |-- integration/
|   |   |-- __init__.py
|   |   |-- test_generate.py
|   |   └── test_convert.py
|   └── fixtures/
| 
|-- projects/
|   └── GDC/ 
|   |     └── TCGA-STUDY/
|   |           |-- cases.ndjson
|   |           |-- filess.ndjson
|   |           └── META/
|   └── ICGC/
|         └── ICGC-STUDY/ 
|                |-- data/
|                └── META/
|--README.md
└── setup.py