Transforms and harmonizes data from Genomic Data Commons (GDC), Cellosaurus cell-lines, and International Cancer Genome Consortium (ICGC) repositories into 🔥 FHIR (Fast Healthcare Interoperability Resources) format.
- from source
git clone repo
cd fhirizer
# create virtual env ex.
# NOTE: package_data folders must be in python path in virtual envs
python -m venv venv-fhirizer
source venv-fhirizer/bin/activate
pip install .
- Dockerfile
(sudo) docker build -t <tag-name>:latest .
(sudo) docker run -it --mount type=bind,source=<path-to-input-ndjson>,target=/opt/data --rm <tag-name>:latest
- Singularity
singularity build fhirizer.sif docker://quay.io/ohsu-comp-bio/fhirizer
singularity shell fhirizer.sif
Detailed step-by-step guide on FHIRizing data for a project's study can be found in the project's directory overview.
-
GDC
-
convert GDC schema keys to fhir mapping
-
generate fhir object models ndjson files in directory
Example run for patient - replace path's to ndjson files or directories.
fhirizer generate --name case --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/cases_key.ndjson
- to generate document reference for the patients
fhirizer generate --name file --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/files_key.ndjson
-
-
Cellosaurus
- Cellosaurus ndjson follows Cellosaurus GET API json format
fhirizer generate --name cellosaurus --out_dir ./projects/<my-project>/META --entity_path ./projects/<my-project>/<cellosaurus-celllines-ndjson>
-
ICGC
fhirizer generate --name icgc --icgc <ICGC_project_name> --has_files
initialize initial structure of project, case, or file to add Maps
fhirizer project_init
# to update Mappings run associated labels script ex ./labels/project.py
fhirizer case_init
fhirizer file_init
pytest -cov
Data directories included in package data:
- resources: data resources generated or used in mappings
- mapping: json data maps produced by fhirizer pydantic schema maps
fhirizer/
|-- fhirizer/
| |-- __init__.py
| |-- labels/
| | |-- __init__.py
| | |-- files.py
| | |-- case.py
| | └── project.py
| |
| |-- schema.py
| |-- entity2fhir.py
| |-- mapping.py
| |-- utils.py
| └── cli.py
|
|-- mapping/
| |-- project.json
| |-- case.json
| └── file.json
|
|-- resources/
| |-- gdc_resources/
| | |-- content_annotations/
| | |-- data_dictionary/
| | └── fields/
| └── fhir_resources/
|
|-- tests/
| |-- __init__.py
| |-- unit/
| | |-- __init__.py
| | └── test_mapping.py
| |-- integration/
| | |-- __init__.py
| | |-- test_generate.py
| | └── test_convert.py
| └── fixtures/
|
|-- projects/
| └── GDC/
| | └── TCGA-STUDY/
| | |-- cases.ndjson
| | |-- filess.ndjson
| | └── META/
| └── ICGC/
| └── ICGC-STUDY/
| |-- data/
| └── META/
|--README.md
└── setup.py