Project-MONAI/monai-deploy-informatics-gateway

Generate study & series level JSON files


Is your feature request related to a problem? Please describe.

The Workflow Manager needs series-level or study-level DICOM tags to be available for filtering. However, MIG currently provides only instance-level DICOM models.

Describe the solution you'd like

Provide a [study-instance-uid].json and a [series-instance-uid].json file for each study/series with study/series level tags respectively.

For private tags, we have a few options:

  1. Let the user define and register the private tags at each level so the MIG knows which tags to include at each level.
  2. For each level, scan all private tags; if a tag's value is the same across all instances at that level, assume the tag belongs to that level.
  3. Simply include all private tags at all levels; if the values differ, change the VM to multiple.
  4. Output a single JSON that includes all instances for that payload.
  5. Have a workflow step handle the filtering (out of scope)
  6. Read the first instance-level JSON file, given that study-level and series-level attributes are the same across instances (unless there are multiple series, i.e. when grouping by patient or study).
  7. Parse the workflow definition to find out which tags are needed, and register them with MIG so that it parses them.
    e.g.
study-instance-uid-1:
   series-instance-uid-1:
      instance-instance-uid-1:
         slice-thickness: 5mm
   series-instance-uid-2:
      instance-instance-uid-2:
         slice-thickness: 3mm
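
Option 2 can be sketched roughly as follows. This is a minimal illustration, not MIG code: it assumes each instance has already been flattened into a plain dict of 8-hex-digit tag to value, and the names `common_tags` and `split_by_level` are hypothetical.

```python
from collections import defaultdict

STUDY_UID = "0020000D"   # Study Instance UID
SERIES_UID = "0020000E"  # Series Instance UID


def common_tags(instances):
    """Return tags that are present with an identical value in every instance."""
    if not instances:
        return {}
    first, rest = instances[0], instances[1:]
    return {
        tag: value
        for tag, value in first.items()
        if all(tag in other and other[tag] == value for other in rest)
    }


def split_by_level(instances):
    """Group flat instance dicts by study and series, then promote shared tags."""
    by_study = defaultdict(list)
    for inst in instances:
        by_study[inst[STUDY_UID]].append(inst)

    result = {}
    for study_uid, study_instances in by_study.items():
        study_tags = common_tags(study_instances)
        by_series = defaultdict(list)
        for inst in study_instances:
            by_series[inst[SERIES_UID]].append(inst)
        series = {}
        for series_uid, series_instances in by_series.items():
            shared = common_tags(series_instances)
            # Report at series level only what is not already at study level.
            series[series_uid] = {
                tag: value for tag, value in shared.items() if tag not in study_tags
            }
        result[study_uid] = {"study": study_tags, "series": series}
    return result


# Mirrors the example above: one study, two series, different slice thickness.
instances = [
    {STUDY_UID: "study-1", SERIES_UID: "series-1", "00180050": "5"},
    {STUDY_UID: "study-1", SERIES_UID: "series-2", "00180050": "3"},
]
levels = split_by_level(instances)
```

One weakness this makes visible: a series with a single instance trivially has "identical" values for every tag, so everything, including instance-only attributes, gets promoted to the series level.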

Configuration

Allow configuration override to turn on/off JSON generation at each level (study, series, instance).

Notes

Even though most DICOM libraries include a list of all DICOM tags/attributes, none of them provides information on which tags are at the patient/study/series levels. We may need to define a dictionary and allow users to customize it. Moreover, attributes may differ across modalities; e.g. (0012,0050) Clinical Trial Time Point ID is a study-level attribute that exists for CR but not for CT.
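
A customizable dictionary of this kind might look like the sketch below. The default entries, the `Level` enum, and the functions `build_level_map` and `level_of` are all hypothetical and only illustrate the shape; a real dictionary would be much longer and possibly keyed by modality as well.

```python
from enum import Enum


class Level(Enum):
    STUDY = "study"
    SERIES = "series"
    INSTANCE = "instance"


# Hypothetical starter dictionary: tag (8 hex digits) to level.
DEFAULT_LEVELS = {
    "0020000D": Level.STUDY,     # Study Instance UID
    "00080020": Level.STUDY,     # Study Date
    "00080050": Level.STUDY,     # Accession Number
    "0020000E": Level.SERIES,    # Series Instance UID
    "00080060": Level.SERIES,    # Modality
    "00200011": Level.SERIES,    # Series Number
    "00080018": Level.INSTANCE,  # SOP Instance UID
}


def normalize_tag(tag):
    """Turn '(0012,0050)' or '0012,0050' into the 8-hex-digit form '00120050'."""
    return tag.strip(" ()").replace(",", "").replace(" ", "").upper()


def build_level_map(overrides=None):
    """Merge user-supplied overrides (tag to level name) over the defaults."""
    level_map = dict(DEFAULT_LEVELS)
    for tag, level in (overrides or {}).items():
        level_map[normalize_tag(tag)] = Level(level)
    return level_map


def level_of(tag, level_map):
    """Look up a tag's level; unknown tags are assumed to be instance level."""
    return level_map.get(normalize_tag(tag), Level.INSTANCE)


# A user registers Clinical Trial Time Point ID as a study-level tag.
level_map = build_level_map({"(0012,0050)": "study"})
```

Treating unknown tags as instance level is a conservative default chosen for this sketch: a tag is never reported at a higher level unless someone has explicitly said it belongs there.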

The output of the current instance-level JSON files can be configured through DicomConfiguration.WriteDicomJson with the options to either export all attributes or non-binary (others).

For standard-defined DICOM tags, the level is well defined across all IOD classes, e.g. in the Patient Module and the Series modules, although specific IODs will add more specific tags. So it is reasonable to collect and consolidate a basic set of Study and Series level tags.

For private tags, no one except the creators knows the semantics unless the details are shared with consumers, for the obvious reason that the tags are private.

For private tag parsing options:

  • Option 1 will not work, or at least will not be practical: given the many manufacturers and varieties of modalities, knowing and understanding the private tags from all modalities is NOT trivial. Private tags are meant to be consumed only by compatible producers and consumers, and oftentimes products from the same manufacturer are not compatible with each other. Some DICOM networks add installation-specific private tags to already acquired instances, but this IS NOT compliant: DICOM instances, once created, are immutable for legal (and clinical) reasons, and any change results in a new instance/series being created.
  • Option 2 can work as an implementation technique but, as the name says, private tags are private, and there is no explicit definition of whether they sit at the Study, Series, or instance level. What are the use cases, and who is expected to consume the private tags? So far this is not clear, and it should be the first thing to be defined.
  • Option 3 must be careful with binary value representations: from experience, some modalities embed large binary data in private tags. (I am guilty as well, for embedding STL binary because the target PACS client could not support the DICOM STL IOD.) Hologic was especially bad initially with embedding hundreds of MB to GB of Tomo data in private tags.
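
To illustrate the binary concern in Option 3, here is a rough sketch that drops binary elements from a dataset in the DICOM JSON Model, where each element is an object with a `vr` key and at most one of `Value`, `InlineBinary`, or `BulkDataURI`. The function names and the `private_only` switch are hypothetical, not part of MIG.

```python
# Value representations that carry raw binary payloads.
BINARY_VRS = {"OB", "OD", "OF", "OL", "OV", "OW", "UN"}


def is_private(tag):
    """Private tags have an odd group number (first four hex digits)."""
    return int(tag[:4], 16) % 2 == 1


def strip_binary(dataset, private_only=False):
    """Return a copy of a DICOM JSON dataset without binary elements.

    With private_only=True, only binary elements in private groups are dropped.
    """
    kept = {}
    for tag, element in dataset.items():
        is_binary = element.get("vr") in BINARY_VRS or "InlineBinary" in element
        if is_binary and (is_private(tag) or not private_only):
            continue
        if element.get("vr") == "SQ" and "Value" in element:
            # Sequence items are nested datasets, so filter them recursively.
            element = {
                **element,
                "Value": [strip_binary(item, private_only) for item in element["Value"]],
            }
        kept[tag] = element
    return kept


dataset = {
    "00100020": {"vr": "LO", "Value": ["P1"]},           # Patient ID
    "00091001": {"vr": "OB", "InlineBinary": "AAECAw=="},  # private binary blob
    "00091002": {"vr": "LO", "Value": ["vendor note"]},  # private text
    "7FE00010": {"vr": "OW", "InlineBinary": "AAAA"},     # Pixel Data
}
light = strip_binary(dataset)
```

Filtering on the VR rather than on payload size keeps the check cheap, at the cost of also dropping small binary values that might have been harmless to include.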

One other option is to let the evaluators in WM support publicly known tags and have a separate app in the workflow steps to read and filter using the private DICOM tags.

Hologic was especially bad initially with embedding hundreds of MB to GB of Tomo data in private tags.

🤣 yes, I remember the days of converting what we called Tomo SCOs into Tomo IOD standards.

The naughty use of private tags for large binary data by Hologic was discovered when I was doing Rad Analytics, using DICOM tags as part of the data source and inspecting all tags with a binary VR. The PACS, both server and client, does not care, as it is just part of the dataset one way or the other. But for apps that intend to be light and fast using metadata, large binary data is way too much trouble and oftentimes of no use, which is the very reason the PixelData tag is often filtered out when loading DCM files.

By the way, I actually had developed the hierarchical Study/Series/Instance metadata JSON object and its serialization in the other Deploy project; you know which. It is straightforward once you organize the instances into the logical containment hierarchy and know which attributes are in the Study Module/Patient Module/Patient Study Module/General Series Module etc. One of the key things is to know what type each attribute is, and then deal with absence and blanks.
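
As a rough illustration of that containment hierarchy (not the implementation from the other project), the sketch below nests DICOM JSON datasets by Study, Series, and SOP Instance UID. The helper `first_value` is hypothetical and deliberately treats an absent attribute and a present-but-blank one (an element with no `Value` key) the same way, returning `None` for both.

```python
STUDY_UID = "0020000D"   # Study Instance UID
SERIES_UID = "0020000E"  # Series Instance UID
SOP_UID = "00080018"     # SOP Instance UID


def first_value(dataset, tag):
    """Return the first value of a DICOM JSON element, or None if absent or blank."""
    element = dataset.get(tag)
    if element is None:
        return None  # attribute is absent from the dataset
    values = element.get("Value")
    if not values:
        return None  # attribute is present but zero-length
    return values[0]


def build_hierarchy(datasets):
    """Nest instance datasets as study -> series -> instance."""
    tree = {}
    for ds in datasets:
        study_uid = first_value(ds, STUDY_UID)
        series_uid = first_value(ds, SERIES_UID)
        sop_uid = first_value(ds, SOP_UID)
        if not (study_uid and series_uid and sop_uid):
            raise ValueError("instance is missing a required UID")
        study = tree.setdefault(study_uid, {"series": {}})
        series = study["series"].setdefault(series_uid, {"instances": {}})
        series["instances"][sop_uid] = ds
    return tree


def ui(value):
    """Build a UI (unique identifier) element in DICOM JSON form."""
    return {"vr": "UI", "Value": [value]}


tree = build_hierarchy([
    {STUDY_UID: ui("1.2.3"), SERIES_UID: ui("1.2.3.1"), SOP_UID: ui("1.2.3.1.1")},
    {STUDY_UID: ui("1.2.3"), SERIES_UID: ui("1.2.3.2"), SOP_UID: ui("1.2.3.2.1")},
])
```

A consumer that needs to tell "absent" from "blank" apart, for example to validate required attributes, would have to return a distinct marker for each case instead of collapsing them.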

Performance Comparison:
MIG:

  1. To create study-level/series-level files, MIG must know which attributes belong to which level (more configuration, less processing), OR
  2. Compare all values at each level and assume an attribute belongs to that level if its values are the same (no configuration, more processing)

WM:

  1. Has to download, read & parse through all JSON metadata at the instance level