Update 3D Slicer and QuantitativeReporting to use SCT codes
fedorov opened this issue · 18 comments
As discussed in #386, DICOM switched to a different version of SNOMED codes. dcmqi has been updated to use the new codes while writing new content, and also, where appropriate, to check for both SRT and SCT codes while loading existing content (when decision on interpreting the content is influenced by a SNOMED code).
Next we need to make sure that various other components of Slicer that either generate content that uses SNOMED codes, or relies on specific codes, are also updated. Specifically:
- JSON files defining the terminology for SegmentEditor should be replaced with the new versions that use SCT in place of SRT. I believe this file is used only for generating new content, so I do NOT think we need both SCT and SRT JSON files in Slicer.
- SegmentStatistics should be updated to use SCT codes.
- segmentation plugin in QuantitativeReporting should be updated to use SCT codes
@lassoan @chribaue any other items you can think of, any concerns?
We save codes in seg.nrrd files. If we load previously saved file and the old codes are not in the terminology json file anymore then it will be a problem (then it will not be possible to make any changes). If we just load and save a segmentation file and Slicer replaces all the codes with new ones, that could be a problem, too (although I don't know about any large projects that would heavily rely on these codes).
To address these issues, we would need to add a helper class for code conversion and comparison.
Are conversion tables available? Probably the best would be to have independent tables in json or csv format that this code converter class could parse (separately from the current terminology files).
When existing segmentations are loaded, my understanding was that the terminology dictionaries are not used. Instead, codes and color (if available) are initialized from the data. So if it is loaded and then saved, the same codes will be used, which I think is the correct behavior. I thought there is no matching of the codes loaded from the data into an entry in the terminology dictionary, did I miss something?
Are conversion tables available? Probably the best would be to have independent tables in json or csv format that this code converter class could parse (separately from the current terminology files).
For the most part, the mapping is available in pydicom, but there are some codes that are not included in that mapping, but are listed in the dcmqi dictionaries, which Afshin fixed manually (see discussions in #386 and related PR).
When existing segmentations are loaded, my understanding was that the terminology dictionaries are not used. Instead, codes and color (if available) are initialized from the data. So if it is loaded and then saved, the same codes will be used, which I think is the correct behavior. I thought there is no matching of the codes loaded from the data into an entry in the terminology dictionary, did I miss something?
These are all true. As I wrote above, the problem with missing codes is that you cannot modify the terminology: change color, displayed name, laterality, etc.
For the most part, the mapping is available in pydicom
This will not be sufficient for several reasons.
- Terminology is a basic infrastructure level feature and we don't rely on Python at such level.
- We need to have better control over this (users should be able to add conversions, etc.).
- We will probably need conversion features independent from DICOM (for example for anatomical atlases).
Probably a csv file would be the most compact format for a conversion table, but since everything else is in json, maybe we could come up with someting in json, too. We should be able to handle multiple equivalent codes, for the next time DICOM standard writers change their minds, but also for anatomical atlases.
As I wrote above, the problem with missing codes is that you cannot modify the terminology: change color, displayed name, laterality, etc.
Sorry, I do not understand this. The way I understand the process of loading is that a new terminology is populated from scratch and based on the content of the segmentation loaded, and is associated with the loaded segmentation. If one wants to modify the color, displayed name, etc, and select those from a terminology beyond what was loaded from a segmentation, it would need to be first selected from the existing terminologies that come with Slicer (red arrow below).
Yes. The inconvenience is that even if you just want to change the anatomic region, you would need to manually define the correct terminology from scratch (choose property type, modifier). It is a bit of extra work , you may make a mistake, etc.
If you want to change the anatomic region, you don't need to define the terminology from scratch - you just select an existing terminology from the list.
I feel I am still missing something, maybe we can discuss this at the next Slicer call on Tuesday?
Re pydicom:
Terminology is a basic infrastructure level feature and we don't rely on Python at such level.
We already rely on pydicom, which de facto is essential component of the Slicer infrastructure. This specific issue is very specific to DICOM. As aside, I believe pydicom also includes mapping from SCT/SRT to other codes, which may be available in some instances (e.g., NCIt, or RadLex). Those mappings are specifically defined in the DICOM standard. While I do agree that it makes sense to support mapping outside of DICOM capabilities, the specific mappings defined by DICOM are in a better place in pydicom.
We need to have better control over this (users should be able to add conversions, etc.).
Yes, but I don't think this should be mixed with the mapping that DICOM defines, which should be fixed and affected only by the updates to the standard.
We will probably need conversion features independent from DICOM (for example for anatomical atlases).
Maybe that should be in a separate component? We would need to either replicate mapping in Slicer, which is error-prone and will require extra maintenance, and will lead to duplication.
If you want to change the anatomic region, you don't need to define the terminology from scratch - you just select an existing terminology from the list.
Selecting an existing terminology is the step that I talk about - it is an extra step, which should not be necessary and while the user does it, he can make a mistake.
We already rely on pydicom
Both Python and DICOM are optional parts of Slicer (you can disable them using CMake switches). Terminologies are not optional. Terminologies cannot rely on some third-party Python package.
Yes, but I don't think this should be mixed with the mapping that DICOM defines,
It would be much more work to maintain separate mechanisms for "official" and "user" translations. We should just bundle a good translation table with Slicer and allow users to add more to it (probably in additional files).
Maybe that should be in a separate component? We would need to either replicate mapping in Slicer, which is error-prone and will require extra maintenance, and will lead to duplication.
We need to have mapping in Slicer that does not rely on pydicom or other optional components. To reduce duplication, DICOM module could retrieve translations from pydicom and set it in Slicer core.
It should really not be us to create or maintain these translation tables. Whoever had the authority to change the codes used in DICOM standard should provide translation in electronic format. We could use this translation file directly or convert automatically to the internal format (specified by DCMQI or Slicer).
By the way, why this change was needed? It is quite frustrating that such things can happen, as from an application developer point of view the change has zero value and significant cost. If we can expect such changes in the future then it will discourage us from relying on "standard" terminologies too much (they are not really standard if they can change in every couple of years).
Selecting an existing terminology is the step that I talk about - it is an extra step, which should not be necessary and while the user does it, he can make a mistake.
This step is how things are done in Slicer right now. Nothing would be changed in this regard.
I have to say that it is confusing that DICOM is optional, and terminologies are not. It would be interesting to build Slicer (and ITK) without DICOM, and see what happens. I am not sure who would benefit from disabling DICOM.
Getting back to the issue, I am confused as to how we should proceed.
By the way, why this change was needed? It is quite frustrating that such things can happen, as from an application developer point of view the change has zero value and significant cost. If we can expect such changes in the future then it will discourage us from relying on "standard" terminologies too much (they are not really standard if they can change in every couple of years).
Yes, it is frustrating, and this did not come easy, from what I understand. The way I understand it, the reason is that DICOM has been around for very long, and it depends on other components it does not control. SNOMED no longer maintains those SRT codes. You can see the details in DICOM CP-1850: ftp://medical.nema.org/medical/dicom/final/cp1850_ft_replacingsrtwithsct.pdf. This is not something that happens often, but continuing to use SRT was not an option, and that CP passed the community review. Think about vendors who were affected by this, and they apparently agreed this approach is the right way forward. I disagree it has zero value. The ultimate goal of using those codes in the first place is to be able to harmonize data and to be able to better interpret it. Using deprecated codes does not help with that.
I am not sure who would benefit from disabling DICOM.
This is a common use case in the context of custom Slicer application.
In the past, I even looked at making segmentation optional ... (Slicer/Slicer#849)
Well, DICOM is most definitely is not optional for dcmqi.
By zero value I mean that after everybody completes the migration then we end up at the same place - we are still interoperable with all the other DICOM-compliant software. Anyway, probably it's all SNOMED's fault and the DICOM user community had no other choice.
The PDF that you've linked refers to translation tables in electronic forms that can be used by automatic translators:
"An Annex is added to PS3.16 with complete list of DICOM-SNOMED subset and mapping from SNOMED ID to Concept ID in order
41 to provide a single location for implementers to obtain the mapping; this will be made available in various convenient forms. This
42 enables the use of a simple, context-free, automated translator that uses such a table to convert all SRT/SNM3/99SDM to SCT
43 usage within a DICOM object, and vice versa."
Do you know where these "various convenient forms" of the translation table are available?
Yes, it is in Appendix O referenced from #386.
Unfortunately, I realized I have a conflict for this coming Tue 10am, so I won't be able to join the Slicer community call to discuss this.
But, at least for now, I believe we should update JSONs in Slicer, so that when new segmentations are generated, they use the SCT codes. We could also consider adding SRT codes into the JSON files, so we have something like SRTCodeValue
side by side with SCT here: https://github.com/QIICR/dcmqi/blob/master/doc/segContexts/AnatomicRegionAndModifier-DICOM-Master.json#L7-L13.
I believe we should update JSONs in Slicer, so that when new segmentations are generated, they use the SCT codes
Agreed.
We could also consider adding SRT codes into the JSON files, so we have something like SRTCodeValue side by side with SCT here
Since there is no single coding scheme that we can universally use anymore, we need to implement a similar mechanisms that are used for managing strings with different encodings. Maybe it has been just a false hope that a single coding scheme will suffice and we would have implemented this infrastructure independently from DICOM's switch (to support private schemes and non-clinical schemes, such as anatomical atlases)
Since we need to support conversion between several coding schemes used for various purposes putting SRT codes into just segmentation terminology files would not suffice. For example, we need comparison and translation for all terms that are used in measurements. Having conversion tables independent from existing terminology json files will probably make the software implementation easier.
Fortunately, currently we don't do much (or any?) decisions based on code values yet, but we mainly just store and retrieve them. So, there is no immediate need to implement code translation/comparison functions. We can implement these when we want to start using these code values in logics.
After replacing the JSON files with their SCT counterparts, I did a comprehensive search on the whole code for SRT related codes. I believe there are some more occurrences of SRT codes in slicer. I put them all (along with their location) into this index.html file. Am I supposed to change them as well?
@afshinmessiah yes, in addition to JSON files, the codes in those components will need to be update:
- SegmentStatistics should be updated to use SCT codes.
- segmentation plugin in QuantitativeReporting should be updated to use SCT codes
@fedorov I've changed those files you mentioned. The following files also contain some SRT codes. What about them?
./slicer/src/Libs/MRML/Core/vtkMRMLSegmentationStorageNode.h
./slicer/src/Libs/vtkSegmentationCore/vtkSegment.cxx
./slicer/src/Modules/Loadable/Terminologies/Logic/vtkSlicerTerminologiesModuleLogic.cxx
./slicer/src/Modules/Loadable/Terminologies/Logic/vtkSlicerTerminologyCategory.h
./slicer/src/Modules/Loadable/Terminologies/Logic/vtkSlicerTerminologyType.h
Yes, all of those should be updated. Most of them are in the comments anyway, and none is defining the behavior of the application on reading/loading the content.