This project aims to extend the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to incorporate radiotherapy (RT) data from DICOM files. The goal is to standardize RT data representation within the OMOP framework, facilitating large-scale analysis and research in radiation oncology.
We have developed a Python script that serves as a foundation for extracting RT data from DICOM files and structuring it in a format compatible with our proposed OMOP CDM extension. The script currently:
- Reads DICOM files (including RT-specific files like RTPLAN, RTSTRUCT, and RTDOSE)
- Extracts relevant information into two main structures: `rt_occurrences` and `rt_features`
- Converts these structures into pandas DataFrames
- Optionally saves the results to CSV files
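
The steps above can be sketched roughly as follows. This is a minimal illustration and not the contents of `DICOM.py`: the function names, the column names (`rt_occurrence_id`, `feature_name`, `feature_value`), the output file names, and the choice of DICOM keywords treated as features are all assumptions made for the example. It assumes `pydicom` and `pandas` are installed.

```python
from pathlib import Path

# Hypothetical selection of DICOM keywords to record as features; the
# project's actual script may extract a different set.
FEATURE_KEYWORDS = ("RTPlanLabel", "StructureSetLabel", "DoseUnits", "DoseType")


def extract_records(ds, occurrence_id):
    """Return (occurrence_row, feature_rows) for one DICOM dataset.

    `ds` can be any object exposing DICOM keywords as attributes,
    such as a pydicom Dataset.
    """
    occurrence = {
        "rt_occurrence_id": occurrence_id,  # assumed surrogate key
        "patient_id": getattr(ds, "PatientID", None),
        "modality": getattr(ds, "Modality", None),
        "study_date": getattr(ds, "StudyDate", None),
        "sop_instance_uid": getattr(ds, "SOPInstanceUID", None),
    }
    # One feature row per keyword that is present in the dataset.
    features = [
        {
            "rt_occurrence_id": occurrence_id,
            "feature_name": keyword,
            "feature_value": str(getattr(ds, keyword)),
        }
        for keyword in FEATURE_KEYWORDS
        if getattr(ds, keyword, None) is not None
    ]
    return occurrence, features


def process_directory(directory, output_dir=None):
    """Read every DICOM file under `directory` and build two DataFrames."""
    # Imported here so extract_records stays usable without these packages.
    import pandas as pd
    import pydicom
    from pydicom.errors import InvalidDicomError

    rt_occurrences, rt_features = [], []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Pixel data is not needed for metadata extraction.
            ds = pydicom.dcmread(str(path), stop_before_pixels=True)
        except InvalidDicomError:
            continue  # skip files that are not valid DICOM
        occurrence, features = extract_records(ds, len(rt_occurrences) + 1)
        rt_occurrences.append(occurrence)
        rt_features.extend(features)

    occurrences_df = pd.DataFrame(rt_occurrences)
    features_df = pd.DataFrame(rt_features)

    # Optional CSV export, mirroring the last step in the list above.
    if output_dir is not None:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        occurrences_df.to_csv(out / "rt_occurrences.csv", index=False)
        features_df.to_csv(out / "rt_features.csv", index=False)
    return occurrences_df, features_df
```

Keeping per-file extraction in `extract_records` separate from file I/O makes the mapping logic easy to check on its own, without real DICOM files.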
/
├── DICOM.py # Main Python script for DICOM processing
├── requirements.txt # Python dependencies
├── sample_output/ # Directory containing sample output CSV files
└── README.md # This file
- Ensure you have Python 3.7+ installed.
- Install required packages: `pip install -r requirements.txt`
- Update the `directory` variable in `DICOM.py` to point to your DICOM files.
- Run the script: `python DICOM.py`
- The script does not yet capture the full range of RT-specific data, especially for complex objects like RTPLAN and RTSTRUCT.
- Data is not loaded into an actual OMOP CDM database structure.
- Enhance data extraction for all RT modalities
- Develop OMOP concept mapping for RT-specific data
- Implement integration with existing OMOP tables
- Extend OMOP vocabulary for RT concepts
- Develop database integration for inserting data into OMOP CDM structure
- Implement data validation and error handling
- Create comprehensive documentation and testing suite
This project is part of a collaborative effort to standardize RT data within the OMOP framework.