oncoray/mirp

TCIA data usage policy compliance and suggestions for TCIA integration

kirbyju opened this issue · 13 comments

Hi, apologies for ignoring the issue template but I don't think my inquiry fits the mold. I have 2 things I'd like to raise with you all.

  1. It appears you're utilizing the Soft Tissue Sarcoma dataset from TCIA in your tutorial documentation. In order to comply with TCIA's Data Usage Policy you need to list the data citation in the tutorial as follows to provide attribution to the folks who published this data:

Vallières, Martin, Freeman, Carolyn R., Skamene, Sonia R., & El Naqa, Issam. (2015). A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities (Soft-tissue-Sarcoma) [Dataset]. The Cancer Imaging Archive. http://doi.org/10.7937/K9/TCIA.2015.7GO2GSKS

  2. Rather than storing a copy of the data you want to use for demonstration purposes on Github, it could be very useful if you were to show people how to grab data directly from TCIA using our APIs. I've created many tutorials for working with our APIs, but the REST API Download notebook is probably most relevant for this. I'd be more than happy to answer any questions or to work with you on building out documentation for mirp to simplify users' ability to apply your tool to our datasets.
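As a rough illustration of what querying TCIA directly could look like, the sketch below only builds a request URL for listing the image series in a collection; it does not contact the server. The base URL, the `getSeries` endpoint and its `Collection` and `Modality` parameters are assumptions about TCIA's public REST API and should be checked against the current TCIA API documentation. `build_query_url` is a hypothetical helper and not part of MIRP or tcia_utils.

```python
from urllib.parse import urlencode

# Assumption: base URL of TCIA's public REST API (v1). Verify against the TCIA docs.
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def build_query_url(endpoint: str, **params: str) -> str:
    """Hypothetical helper: compose a query URL for a TCIA REST endpoint."""
    query = urlencode(params)
    return f"{BASE_URL}/{endpoint}?{query}" if query else f"{BASE_URL}/{endpoint}"


# List the MR series of the Soft-tissue-Sarcoma collection (endpoint name assumed).
series_url = build_query_url(
    "getSeries", Collection="Soft-tissue-Sarcoma", Modality="MR"
)
print(series_url)
```

The resulting URL could be passed to any HTTP client. The tcia_utils package mentioned later in this thread wraps such requests.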

Just FYI, I stumbled on to this repo because I was interested in running some of our existing TCIA segmentation data through a standardized radiomics pipeline. It seems silly to have all our users repeating this computational process when we could do it once and provide the results such that people can dive right into using the derived features. I am extra excited that this tool adheres to the IBSI guidelines. Please let me know if you'd be interested to discuss potential collaborations.

Thanks for opening this issue.

I have created the following tasks:

  • Update tutorial to include data citation.
  • Add tutorial on how to interact with the TCIA API, and the IDC API.

Just FYI, I stumbled on to this repo because I was interested in running some of our existing TCIA segmentation data through a standardized radiomics pipeline.

There is an open issue with processing some DICOM SEG files, which I will be looking into: #81.

Thanks for the quick reply! I just remembered that https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb would also be very useful for you to review since it's wholly focused on getting segmentation data from TCIA. It also includes info about how you can (usually) find the related image series that was used to create a given segmentation (RTSTRUCT or SEG).

I took a stab at updating your tutorial to use tcia_utils to grab the data: https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_MIRP.ipynb. Let me know if you have any questions about how it works or suggestions for improvement. I'd love to extend the notebook a bit to cover a few of our other datasets and then advertise this to our user community, but I am clueless about what image pre-processing steps might be required for different datasets. I'm especially interested in showing how to do this with the RTSTRUCT segmentation data that's available for our CPTAC datasets that are discussed in https://github.com/kirbyju/TCIA_Notebooks/blob/main/CPTAC/CPTAC.ipynb.

Looks good! For the MIRP longform documentation, I am looking to create a new tutorial, since I want to keep the current tutorial relatively simple and straightforward, and focus mainly on MIRP. Can I use some of your code in that tutorial?

Also, I am currently not looking to add too many bells and whistles to the visualisation tool. The tool builds upon the show method of the internal image representation. Internally an InteractivePlot is created, which interacts with a matplotlib canvas. The hardest part behind show is preparing the image itself by converting DICOM (and other image formats) to the internal representation. Adding additional DICOM modalities is ongoing work.

Sure, feel free to use the code wherever you like.

Are there any specific resources (documentation? publications?) you could point me to that would help me determine which MIRP options I may need to utilize with extract_features() based on the results of extract_image_parameters()? I'm trying to decide whether I can realistically tackle that if I write some tutorial notebooks for specific TCIA datasets or if I should just put a big disclaimer saying that users should keep in mind that they may need to apply such parameters in order to get scientifically valid results.

Also, is there any possibility you'd consider updating MIRP to a license that does not require derivative works to use the same license? E.g. Apache or BSD? I'm just wondering if the current license might prevent MIRP from being offered as an extension in popular tools such as 3D Slicer.

Are there any specific resources (documentation? publications?) you could point me to that would help me determine which MIRP options I may need to utilize with extract_features() based on the results of extract_image_parameters()? I'm trying to decide whether I can realistically tackle that if I write some tutorial notebooks for specific TCIA datasets or if I should just put a big disclaimer saying that users should keep in mind that they may need to apply such parameters in order to get scientifically valid results.

I am afraid there is no general guidance on how image processing should be configured. It mostly involves some domain knowledge. As a rough guide, you can ask yourself the following questions:

  • Are there differences in pixel spacing and slice spacing within the cohort? If yes, then you should resample the scans to a common spacing.
  • Is the slice spacing much (say 3-5 times or more) larger than the actual or desired pixel spacing? Then, consider using a 2D approach.
  • Do image intensities carry the same meaning for different scanners (e.g. CT), or not (e.g. T1w-MR)? If not, consider normalising the images and using fixed bin number discretisation.
  • If image intensities do carry the same meaning for different scanners, and fixed bin size discretisation is used, consider specifying the lowest intensity of the initial bin by setting the resegmentation_intensity_range parameter.
  • The number of bins or bin size for discretisation should be chosen so that these result in somewhere between 4 and 64 bins in the region of interest in the average scan.
  • Are imaging parameters suggestive of notably different scans, e.g., a 70 keV CT scan vs a 140 keV CT scan, or a PET image with a different tracer, or a major difference in uptake times? Then consider which scans are of interest for your analysis and exclude others. Of course, morphological features such as volume are mostly unaffected.
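The questions above can be turned into a small checklist helper. The following is a minimal sketch, not part of MIRP: the function names, dictionary keys and the default ratio threshold of 3 are hypothetical and chosen only to mirror the rough guide. The bin count follows the usual fixed bin size rule, in which an intensity x falls into bin floor((x - lowest) / bin_size) + 1.

```python
import math
from statistics import median
from typing import Dict, Sequence, Tuple


def suggest_preprocessing(
    spacings_mm: Sequence[Tuple[float, float]],
    intensities_comparable: bool,
    ratio_threshold: float = 3.0,
) -> Dict[str, bool]:
    """Hypothetical helper: spacings_mm holds (pixel_spacing, slice_spacing) per scan."""
    # Differing spacings within the cohort suggest resampling to a common spacing.
    resample = len(set(spacings_mm)) > 1
    # Slices much thicker than the in-plane spacing suggest a 2D approach.
    typical_ratio = median(slice_sp / pixel_sp for pixel_sp, slice_sp in spacings_mm)
    return {
        "resample_to_common_spacing": resample,
        "consider_2d_approach": typical_ratio >= ratio_threshold,
        # Intensities without a shared meaning (e.g. T1w-MR) suggest normalisation
        # combined with fixed bin number discretisation.
        "normalise_intensities": not intensities_comparable,
        "use_fixed_bin_number": not intensities_comparable,
    }


def bins_for_fixed_bin_size(lowest: float, highest: float, bin_size: float) -> int:
    """Number of bins needed to cover [lowest, highest] with a fixed bin size."""
    return math.floor((highest - lowest) / bin_size) + 1


def bin_count_is_reasonable(n_bins: int) -> bool:
    """Check the rough target of 4 to 64 bins in the region of interest."""
    return 4 <= n_bins <= 64


# Example: two MR scans with 5 mm slices and slightly different pixel spacing.
suggestions = suggest_preprocessing([(1.0, 5.0), (0.8, 5.0)], intensities_comparable=False)
# Example: a CT region of interest spanning -100 to 200 HU with a bin size of 25 HU.
n_bins = bins_for_fixed_bin_size(-100.0, 200.0, 25.0)
print(suggestions, n_bins, bin_count_is_reasonable(n_bins))
```

A helper like this only flags which questions deserve attention. The actual values (target spacing, normalisation method, bin size) still depend on the modality and cohort.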

Also, is there any possibility you'd consider updating MIRP to a license that does not require derivative works to use the same license? E.g. Apache or BSD? I'm just wondering if the current license might prevent MIRP from being offered as an extension in popular tools such as 3D Slicer.

I am not at liberty to change the license. However, it shouldn't be problematic as long as these tools simply provide an interface with MIRP, including as an extension. The EUPL license does indeed carry over to derivative works (though these works can be relicensed under selected compatible licenses). However, unlike strong copyleft licences such as GPL3, the EUPL does not prevent other products under different licenses from linking to the software. The following guidance is provided:

The EUPL refers to the laws of EU countries and is therefore interoperable. This means that all the interfaces of the covered software (the APIs, formats, data structures) can be freely copied and reproduced in other independent works in order to build interoperability, e.g. combining software distributed under the EUPL with any other software licensed differently, even under a proprietary licence. In such a combination or statically linked aggregation, every linked component will keep its primary licence, without any ‘viral effect’.

Thanks for the additional info. I don't think I saw this in the documentation, but is there a way to tell MIRP you only want to run a particular class of features (e.g. only do morphological) or only specific individual features?

You can select specific classes of features using base_feature_families, or in case of filtered images (response maps) using response_map_feature_families.

Computing individual features is currently not supported, and would require a rewrite of the feature computation part of the code.

Just FYI, I stumbled on to this repo because I was interested in running some of our existing TCIA segmentation data through a standardized radiomics pipeline.

There is an open issue with processing some DICOM SEG files, which I will be looking into: #81.

This is a bit of a tangent, but the report of this user having some potential trouble with TCIA SEG data got me wondering how you handle RTSTRUCT "keyhole" issues similar to what was reported in SlicerRt/SlicerRT#171 and pyplati/platipy#244.

We encountered similar issues, but were able to resolve them. Actual conversion of the contour data to segmentation masks takes place in the convert_contour_to_mask method.

MIRP does the following:

  • Collect all contours for a specific region of interest from the RTSTRUCT file. Contours are stored separately as arrays of vertex points.
  • Aggregate those contours that belong to the same slice. These are internally still separate contours, but will be processed simultaneously after aggregation. This is done because in some RTSTRUCT files holes are represented by having multiple contours for the same region of interest.
  • Determine the set of lines and vertices from each of the separate contours in each slice -- MIRP doesn't draw lines between contours. This step happens in contour_to_grid_ray_cast.
  • Use ray casting to find where the segmentation mask is. This is handled by poly2grid.
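The general idea behind these steps can be sketched with a plain even-odd ray casting rule. This is an illustrative sketch only, not MIRP's actual code: `contours_to_mask` is a hypothetical function, pixel centres are assumed to lie at integer (column, row) coordinates, and the contours are assumed to be given in that same grid coordinate system. Edges are only formed within each contour, so a second contour inside the first one yields a hole without any connecting "keyhole" line.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in grid coordinates


def contours_to_mask(
    contours: Sequence[Sequence[Point]], n_rows: int, n_cols: int
) -> List[List[bool]]:
    """Hypothetical sketch: rasterise all contours of one slice with the even-odd rule."""
    # Build closed edge lists per contour; never connect one contour to another.
    edges = []
    for contour in contours:
        for i, start in enumerate(contour):
            edges.append((start, contour[(i + 1) % len(contour)]))

    mask = [[False] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        for col in range(n_cols):
            crossings = 0
            for (x0, y0), (x1, y1) in edges:
                # Does a horizontal ray through this pixel centre cross the edge?
                if (y0 > row) != (y1 > row):
                    x_cross = x0 + (row - y0) * (x1 - x0) / (y1 - y0)
                    if x_cross > col:
                        crossings += 1
            # An odd number of crossings to the right means the pixel is inside.
            mask[row][col] = crossings % 2 == 1
    return mask


# An outer square with an inner square acting as a hole, both on the same slice.
outer = [(0.5, 0.5), (6.5, 0.5), (6.5, 6.5), (0.5, 6.5)]
inner = [(2.5, 2.5), (4.5, 2.5), (4.5, 4.5), (2.5, 4.5)]
mask = contours_to_mask([outer, inner], n_rows=8, n_cols=8)
n_inside = sum(sum(row) for row in mask)
print(n_inside)
```

Because every contour on the slice contributes to the same crossing count, pixels inside the inner contour are crossed twice and end up outside the mask.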

I hope this may provide some insight into how this works.

having some potential trouble with TCIA SEG data

I wasn't able to reproduce the SEG issues. In fact, SEG and RTSTRUCT produced the exact same segmentation masks.

Hi again, I put together https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_MIRP.ipynb with some updates to see if maybe that would be enough to help you check off your task for providing an example of using MIRP with TCIA APIs. I'd be happy to push it to your repo as a PR if you want, or I could host it where it is currently if that's preferable. You might want to double check some of the explanations related to setting certain parameters to make sure they still apply to the example data.

Hi Justin, thanks for doing this! If you could open a PR (to dev2.3.0), I will work on integrating it into the docs.