Allow processing of data before export
joshliberty opened this issue · 4 comments
Sometimes an export destination requires data to undergo some processing before it is sent. It would be useful to be able to write plug-ins that get the opportunity to edit export data before it is sent to the export destination. It should then be possible to configure which plug-ins are active for which export destinations (and possibly which data types).
Note that it is important for the plug-in to have access, within a single invocation, to all data that needs to be exported as part of the export request. This is so that it can use the same study ID for all items, the same series ID for all items within each series, and so on.
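For illustration only, here is a minimal sketch of what such a plug-in contract could look like, written in Python with pydicom; the class and method names are assumptions, not the Informatics Gateway's actual plug-in API. The key point is that the plug-in receives every dataset of the export request in a single call.

```python
# Hypothetical plug-in contract (illustrative only, not the Informatics
# Gateway API): a plug-in is handed all DICOM datasets belonging to one
# export request at once, so it can apply consistent changes across them.
from typing import List

from pydicom.dataset import Dataset


class ExportDataPlugin:
    """Base class for plug-ins that edit data just before export."""

    def process(self, datasets: List[Dataset]) -> List[Dataset]:
        """Return the (possibly modified) datasets for this export request."""
        raise NotImplementedError
```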
Need
This is required because the internal PACS system used within AIDE requires IDs to be randomised, so that separate executions that send different data for the same patient do not end up being aggregated together.
Describe alternatives you've considered
There are two other options:
- Doing this on the receiving side. For AIDE, this means placing a DICOM proxy between the Informatics Gateway and the internal PACS system. This is less effective: it either means a lot of duplicated effort or a compromise in quality, because we would lose the retry-on-failure handling that the Informatics Gateway already provides.
- Doing this within the workflow. This means creating another task plug-in that can edit the data in this way. The downside of this approach is that all data needs to be copied, causing duplication of data; in the case of large series, this will be significant. The internal AIDE PACS system expects to receive the original data as well as any output from AI applications, so a lot of data would be duplicated.
The benefit of doing this in the Informatics Gateway is that it can edit the DICOM series while they are already in memory, between reading them from disk and sending them to the DICOM destination. This offers the best performance.
It is not clear what exactly needs to be changed in the (DICOM) instances to be exported by the IG.
It is also not clear what the "IDs" are in "the internal PACS system used within AIDE requires IDs to be randomised". By the way, newly created DICOM series and instance UIDs are unique per the standard.
A new DICOM series, with newly generated DICOM instance UIDs, has to be created in the DICOM output as the result of processing the original DICOM Study/Series, even though the Study Instance UID, accession number, patient module, etc. are reused. What mix-up exactly is AIDE concerned about?
DICOM instances are immutable after creation, both legally and clinically, although they can technically be modified due to the lack of a tamper-proof mechanism. The idea of making the IG support plug-ins that modify the to-be-exported DICOM instances raises several concerns:
- Should it even allow the output DICOM instances to be modified once created by the AI applications?
- Performance degradation. Will the IG have to send all instances from all workflows to all potential plug-ins, or will there be conditional dispatching rules? What are the requirements for creating such rules?
- Deploying/configuring a plug-in at the IG is cumbersome and hard to test during development.
@MMelQin the need for this came about from a requirement for AIDE to isolate data between different invocations. The reason is that data may be poorly anonymized, meaning that various datasets will be received whose study ID, series ID, or patient ID matches previously received data, even though it is not really the same data. Because AIDE stores everything on an internal PACS system, this results in collisions: PACS systems group data using these IDs.
The feature proposed above solves this by appending a randomly generated string to the end of the study ID, series ID, and patient ID, thereby making sure the data is stored separately on PACS. This will only be triggered on demand, based on configuration that matches specific export destinations with specific processing plug-ins, so it shouldn't impact performance unless actually used.
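For illustration only, a minimal sketch of that randomisation, assuming a pydicom-style list of datasets and a hypothetical `randomise_ids` hook; the actual plug-in API and configuration in the Informatics Gateway may look different. A single random suffix is generated per export request and cached per original ID, so all items of the same study and series keep matching IDs after the rewrite.

```python
# Illustrative sketch only (hypothetical hook, not the Informatics Gateway API).
# One random suffix per export request keeps the rewritten study/series/patient
# IDs consistent across every instance in the request.
import secrets
from typing import Dict, List

from pydicom.dataset import Dataset


def randomise_ids(datasets: List[Dataset]) -> List[Dataset]:
    # Digits-only suffix, appended as a new UID component so the result stays
    # a syntactically valid DICOM UID (assuming the 64-character limit holds).
    suffix = str(secrets.randbelow(10**8))
    uid_map: Dict[str, str] = {}

    def remap(original: str) -> str:
        # The same original UID always maps to the same randomised UID.
        return uid_map.setdefault(str(original), f"{original}.{suffix}")

    for ds in datasets:
        ds.StudyInstanceUID = remap(ds.StudyInstanceUID)
        ds.SeriesInstanceUID = remap(ds.SeriesInstanceUID)
        ds.PatientID = f"{ds.PatientID}-{suffix}"
    return datasets
```

In this sketch the PatientID simply gets a hyphenated suffix, since it is free text rather than a UID.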
@hshuaib90 might be able to explain more about why this is needed
@joshliberty do we still need this?
Closing; please reopen if we have detailed requirements