precision-medicine-toolbox is an open-source Python package for preparing medical imaging data for data science tasks. It aims to provide a tool for curating imaging data and performing exploratory feature analysis.
If you are using this toolbox, please cite the original paper:
Primakov, Sergey, Elizaveta Lavrova, Zohaib Salahuddin, Henry C. Woodruff, and Philippe Lambin. "Precision-medicine-toolbox: An open-source python package for facilitation of quantitative medical imaging and radiomics analysis." arXiv preprint arXiv:2202.13965 (2022).
Currently, the toolbox has the following functionality:
- Dataset exploration. This function reads the specified metadata from the DICOM files of the dataset and lets you explore how much the imaging parameters vary across it.
- Dataset quality check. This function checks that every scan in the dataset meets the pre-defined requirements:
  - imaging modality is correct,
  - slice thickness is in the acceptable range,
  - number of slices is in the acceptable range,
  - all the slices have the target in-plane resolution,
  - in-plane pixel spacing is in the acceptable range,
  - reconstruction kernel for CT data is present and acceptable.
- Conversion of DICOM to NRRD. This function converts a DICOM (CT or MR) dataset into a volumetric (NRRD format) dataset. 2D data is currently not supported.
- Basic image pre-processing. This function performs basic image pre-processing steps, selected by the user; the following methods are available:
  - N4 bias field correction,
  - intensity rescaling, based on fat values or percentile values,
  - histogram matching,
  - intensities resampling,
  - histogram equalization,
  - Z-scoring, based on defined normalization coefficients or image-based values,
  - image reshaping.
- Unrolling NRRD images & ROI masks into JPEG slices. This function can be used for a quick check of the converted images or of any existing NRRD/MHA dataset. It generates a JPEG image for each ROI slice.
- Extraction of radiomics features. This function uses pyradiomics to extract radiomics features from an NRRD/MHA dataset.
- Basic analysis of radiomics features. This function exports basic feature statistics and statistical test values to an Excel file and visualizes (in an .html report):
  - feature value distributions in binary classes,
  - Shapiro-Wilk test for normality check,
  - mutual feature correlation (Spearman) matrix,
  - p-values (corrected) of the Mann-Whitney test for feature mean values in groups,
  - univariate ROC curves for each feature,
  - volumetric analysis: volume-based precision-recall curve and feature correlation with volume.
- Binary classification metrics reporting. Given true labels and predicted probabilities, this function performs:
  - classification metrics reporting,
  - confusion matrices and ROC curves plotting.
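To make the binary classification reporting described above more concrete, here is a minimal, standard-library-only sketch of how a confusion matrix, a few common metrics, and the area under the ROC curve can be derived from true labels and predicted probabilities. The function names (`confusion_counts`, `basic_metrics`, `roc_auc`), the 0.5 threshold, and the toy data are illustrative assumptions; this is not the toolbox's own API or implementation.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_prob: Sequence[float],
                     threshold: float = 0.5) -> Dict[str, int]:
    """Count TP/FP/TN/FN after thresholding the predicted probabilities."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1:
            counts["tp" if label == 1 else "fp"] += 1
        else:
            counts["fn" if label == 1 else "tn"] += 1
    return counts


def basic_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive common metrics; a ratio with an empty denominator is NaN."""
    tp, fp, tn, fn = (counts[k] for k in ("tp", "fp", "tn", "fn"))

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute the AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, made up for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9, 0.6]

counts = confusion_counts(y_true, y_prob)
metrics = basic_metrics(counts)
auc = roc_auc(y_true, y_prob)
print(counts, metrics, auc)
```

The pairwise definition of the AUC used here is quadratic in the number of samples, which is fine for a sanity check but not for large datasets; a rank-based computation gives the same value more efficiently.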
Warning! Not intended for clinical use!
precision-medicine-toolbox is an open-source package; its source code and documentation are available online. The functionality of the toolbox is illustrated in the tutorial notebooks.
Our package relies on existing high-quality tools for the key steps:
- pydicom (DICOM I/O),
- SimpleITK (image I/O and pre-processing),
- pyradiomics (feature extraction).
See requirements.txt for more.
Clone the repository with the Git client of your preference, then install the dependencies from the requirements file:
pip install -r requirements.txt
The latest version is also available on PyPI:
pip install precision-medicine-toolbox
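After installation, it can be useful to confirm that the toolbox and the key dependencies listed above are visible to your Python environment. The sketch below is one possible way to do that using only the standard library; `check_importable` is a hypothetical helper, not part of the toolbox. The import names are `pmtool` (as used in the example in this document), `pydicom`, `SimpleITK`, and `radiomics` (the import name of pyradiomics).

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_importable(module_names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether Python can locate it."""
    return {name: find_spec(name) is not None for name in module_names}


# Report which of the expected packages can be found, without importing them.
status = check_importable(["pmtool", "pydicom", "SimpleITK", "radiomics"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

`find_spec` only locates a module; it does not import it, so this check is quick but will not catch a package that is present yet broken.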
The following example illustrates how to initialize an object of a dataset class:
import sys

# make the toolbox importable when working from a cloned repository
sys.path.append('path to precision-medicine-toolbox directory')
from pmtool.ToolBox import ToolBox

# set up parameters for your imaging dataset
parameters = {'data_path': 'root directory of the imaging data',
              'data_type': 'dcm',          # DICOM data
              'multi_rts_per_pat': False   # looks at 1 RTStruct/patient only
              }
my_dataset = ToolBox(**parameters)
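Because the placeholder strings in the example above must be replaced with real paths, a small guard can catch a mistyped data directory before the dataset object is created. The following is a sketch under stated assumptions: `build_parameters` is a hypothetical helper (not part of the toolbox) that only reuses the three parameter names shown above and checks that `data_path` is an existing directory.

```python
from pathlib import Path
from typing import Any, Dict


def build_parameters(data_path: str, data_type: str = "dcm",
                     multi_rts_per_pat: bool = False) -> Dict[str, Any]:
    """Assemble the parameter dictionary, failing early on a bad data path."""
    root = Path(data_path)
    if not root.is_dir():
        raise NotADirectoryError(
            f"data_path is not an existing directory: {root}")
    return {
        "data_path": str(root),
        "data_type": data_type,                  # 'dcm' means DICOM data
        "multi_rts_per_pat": multi_rts_per_pat,  # False: 1 RTStruct/patient
    }


# Intended usage (requires the toolbox to be installed or on sys.path):
# from pmtool.ToolBox import ToolBox
# my_dataset = ToolBox(**build_parameters("root directory of the imaging data"))
```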
You can contribute to this package at our GitHub by:
- reporting issues,
- giving us feedback on the code and the documentation via suggestions/comments:
- directly in the Pull request,
- writing and leaving a comment in the Conversation tab,
- sending an e-mail to authors.
Initial and main developers:
- Sergey Primakov @primakov
- Lisa Lavrova @lavrovaliz
You can also see the list of contributors.
This project is licensed under the BSD-3-Clause License (see the LICENSE for the details).
The authors would like to thank:
- the Precision Medicine department colleagues for their support and feedback,
- Mart Smidt for testing the tool on different data,
- external users for the feedback,
- PyRadiomics for a reliable open-source tool for feature extraction,
- Hugo Aerts et al. for the Lung1 dataset we used to demonstrate our functionality and The Cancer Imaging Archive for the publicly available data.