chop-dbhi/dicom-anon

Support Pixel Anonymizers

cancan101 opened this issue · 12 comments

Allow plugging in a pixel anonymizer that blacks out the burned-in annotations.
Ideally it would plug in here and look something like: https://github.com/johnperry/CTP/blob/master/source/files/scripts/DicomPixelAnonymizer.script

Hi, thanks for the suggestion. I don't have much experience with CTP, but I agree that an option to plug in a preferred pixel anonymizer would be a nice feature. I think the option could go here, right before it cleans out the headers, so that we don't destroy data the pixel cleaner needs. Do you have any experience with scripts to do this?
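A minimal sketch of the ordering described above, assuming a plain dict as a stand-in for a parsed DICOM file. The names `anonymize`, `clean_headers` and `pixel_anonymizer` are hypothetical and are not part of dicom-anon; the point is only that the optional pixel hook runs while the headers are still intact.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a parsed DICOM file: header fields plus raw pixels.
Dataset = Dict[str, object]
PixelAnonymizer = Callable[[Dataset], None]


def clean_headers(ds: Dataset) -> None:
    """Placeholder header cleaner (assumption): drops two example fields."""
    for key in ("PatientName", "Manufacturer"):
        ds.pop(key, None)


def anonymize(ds: Dataset,
              pixel_anonymizer: Optional[PixelAnonymizer] = None) -> Dataset:
    # Run the pixel cleaner first, while the headers it may need still exist.
    if pixel_anonymizer is not None:
        pixel_anonymizer(ds)
    # Only then strip the headers.
    clean_headers(ds)
    return ds


# Usage: a hook that records a header value it could not read after cleaning.
seen = []


def record_manufacturer(ds: Dataset) -> None:
    seen.append(ds.get("Manufacturer"))


result = anonymize(
    {"PatientName": "DOE^JOHN", "Manufacturer": "ACME", "PixelData": b"\x00"},
    record_manufacturer,
)
```

Passing the hook as an optional callable keeps the default behaviour unchanged for anyone who does not supply a pixel anonymizer.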

Are you currently using the dicom-anon script?

I am looking to use it. Currently I have a MATLAB script that does the anonymization, but I would prefer to move to Python. In my MATLAB script I blank out the burned-in annotations.

Another Python implementation I found is: https://github.com/darcymason/pydicom/blob/dev/pydicom/examples/anonymize.py

You also want to remove the burned-in annotation and then set the burned-in annotation flag to false so that the file does not get quarantined.
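As a rough illustration of those two steps, here is a sketch that zeroes a rectangle in a toy 8-bit image and then flips the flag. It assumes row-major, single-byte pixels held in a `bytearray` and a plain dict in place of the DICOM header; `blank_region` is a hypothetical helper, and the region is chosen by hand rather than detected. In DICOM the Burned In Annotation attribute takes the values `YES` and `NO`, so "false" is written as `NO` here.

```python
from typing import Tuple


def blank_region(pixels: bytearray, columns: int,
                 region: Tuple[int, int, int, int]) -> None:
    """Zero a (left, top, width, height) rectangle in row-major 8-bit pixels."""
    left, top, width, height = region
    for row in range(top, top + height):
        start = row * columns + left
        # Overwrite one row of the rectangle with zero bytes (black).
        pixels[start:start + width] = bytes(width)


# Toy 4x4 image, all white, flagged as carrying burned-in text.
rows, columns = 4, 4
pixels = bytearray([255] * (rows * columns))
header = {"BurnedInAnnotation": "YES"}

# Blank the top row, then record that no burned-in annotation remains.
blank_region(pixels, columns, (0, 0, 4, 1))
header["BurnedInAnnotation"] = "NO"
```

Real files would also need multi-byte samples, multiple frames and compressed transfer syntaxes handled, none of which this sketch attempts.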

I have used that. I basically wrote this script to be a more extensive
version of that one.

On Tue, Mar 3, 2015 at 1:12 PM, Alex Rothberg notifications@github.com
wrote:

I am looking to use it. Currently I have a MATLAB script that does the
anonymization http://www.mathworks.com/help/images/ref/dicomanon.html,
but I would prefer to move to Python. In my MATLAB script I blank out the
burned-in annotations.

Another Python implementation I found is:
https://github.com/darcymason/pydicom/blob/dev/pydicom/examples/anonymize.py



Good point! I probably won't have time to properly dig into writing a pixel anonymizer in the near term, but if you have something in MATLAB that you would like to convert to Python and contribute to the project, we welcome any pull requests. I think the hard part is all the heuristics for identifying likely burnt-in data (and making that extensible), which you might already have (and it looks like the CTP script has a good start as well).

It has always been on my wish list to try to use some simple machine learning or OCR to look for text, or at least alert above a certain confidence.

I'd certainly be interested in helping integrate something if you contributed.

It looks like OB and OW VRs are being removed here:

if e.VR in ['PN', 'CS', 'UI', 'DA', 'DT', 'LT', 'UN', 'UT', 'ST', 'AE', 'LO', 'TM', 'SH', 'AS', 'OB', 'OW']:
which is the VR set on pixel data. This means the entire pixel data seems to be removed when "anonymizing".

So you gave me a heart attack on this one, but have you tried it and seen it delete the pixel data? I ask because of this line in pydicom:

https://github.com/darcymason/pydicom/blob/master/source/dicom/_dicom_dict.py#L3706

it actually sets that VR string to "OB or OW" and it fails to match. Assuming this is preventing the problem for you, this is definitely not something it should rely on.

I'm not sure I follow what you are saying.

It looks like the VR string as presented by pydicom may be: 'OB or OW', 'OB' or 'OW'.
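A small sketch of why the exact string matters for the list check quoted earlier in the thread. The VR list is copied from that line; `matches_removed_vr` is a hypothetical helper, and the strings are passed in directly rather than read from a file with pydicom.

```python
# VR list copied from the handler quoted in this thread.
REMOVED_VRS = ['PN', 'CS', 'UI', 'DA', 'DT', 'LT', 'UN', 'UT',
               'ST', 'AE', 'LO', 'TM', 'SH', 'AS', 'OB', 'OW']


def matches_removed_vr(vr: str) -> bool:
    """Return True if the handler's membership test would match this VR."""
    # List membership compares whole strings, so 'OB or OW' is not 'OB'.
    return vr in REMOVED_VRS


# The three forms reported for the pixel data element.
results = {vr: matches_removed_vr(vr) for vr in ('OB or OW', 'OB', 'OW')}
```

Under these assumptions only the combined string escapes the check, so a pixel data element whose VR has been resolved to `OB` or `OW` would match and be deleted.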

I have dealt with the issue for now:

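# Assumption: PIXEL_DATA was not defined in the original comment; it is taken
# here to be the Pixel Data tag (7FE0,0010).
PIXEL_DATA = (0x7fe0, 0x0010)

def vr_handler(ds, e):
    # Delete elements with a listed VR, but never the pixel data itself.
    if (e.VR in ['PN', 'CS', 'UI', 'DA', 'DT', 'LT', 'UN', 'UT', 'ST', 'AE', 'LO', 'TM', 'SH', 'AS', 'OB', 'OW'] and
        e.tag != PIXEL_DATA):
        del ds[e.tag]
        return True
    return False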

Have you seen a situation where pydicom actually sets e.VR for the pixel data element to the string "OW" or the string "OB"?

I ask because, from the file I linked to, it looks like pydicom sets that string to "OB or OW", so it won't match.

Here is an example from IPython examining a DICOM file:

a[0x7fe0, 0x0010].VR
'OW or OB'

Definitely:

In [388]: ds = dicom.read_file("/Users/alex/Downloads/series (1).dcm")
ds[0x7fe0, 0x0010].VR

Out[388]: 'OB'

and after running the file through dcmdjpeg I see OW.

Look at that. Thanks for catching that.