chop-dbhi/dicom-anon

Add Python 3 support for apostrophes in tag values

r3m0chop opened this issue · 4 comments

As part of the work towards #27 ("Add support for Python 3"), I noticed that when a tag value contains a single-quote apostrophe and get() attempts to look up the data via self.cursor.execute(GET_LINKED % table_name, (original, study_uid_pk)), sqlite3 raises sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.

For example, when the ReferringPhysicianName tag (0008,0090) contains the value Referring Physician's Name 4, the following traceback is produced:

Traceback (most recent call last):
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 30, in tag_in_exception
    yield
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1773, in walk
    callback(self, data_element)  # self = this Dataset
  File "dicom_anon.py", line 459, in clean_cb
    if self.enforce_profile(ds, e, study_pk):
  File "dicom_anon.py", line 486, in enforce_profile
    cleaned = self.basic(ds, e, study_pk)
  File "dicom_anon.py", line 507, in basic
    prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
  File "dicom_anon.py", line 194, in get
    self.cursor.execute(GET_LINKED % table_name, (safer_original_str, study_uid_pk))
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dicom_anon.py", line 812, in <module>
    da.run(i_dir, c_dir)
  File "dicom_anon.py", line 700, in run
    ds, study_pk = self.anonymize(ds)
  File "dicom_anon.py", line 637, in anonymize
    ds.walk(partial(self.clean_cb, study_pk=study_pk))
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1779, in walk
    dataset.walk(callback)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 37, in tag_in_exception
    raise type(ex)(msg)
sqlite3.InterfaceError: With tag (0008, 0090) got exception: Error binding parameter 0 - probably unsupported type.
Traceback (most recent call last):
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 30, in tag_in_exception
    yield
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1773, in walk
    callback(self, data_element)  # self = this Dataset
  File "dicom_anon.py", line 459, in clean_cb
    if self.enforce_profile(ds, e, study_pk):
  File "dicom_anon.py", line 486, in enforce_profile
    cleaned = self.basic(ds, e, study_pk)
  File "dicom_anon.py", line 507, in basic
    prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
  File "dicom_anon.py", line 194, in get
    self.cursor.execute(GET_LINKED % table_name, (safer_original_str, study_uid_pk))
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.

Note that this apostrophe issue does not seem to occur under Python 2 and only appears under Python 3; the reason is not yet clear. It could be related to the particular sqlite3 module being imported, or to the psycopg2-binary v2.8.3 dependency, which is used under Python 3 but apparently not at all under Python 2.

An initial workaround, removing any potentially offending single-quote apostrophes, can be as simple as:

        safer_cleaned_str = "{0}".format(cleaned).replace("'", "")

and:

        safer_original_str = "{0}".format(original).replace("'", "")

...wherever these are used in the relevant db.execute(...) or self.cursor.execute(...) calls to sqlite3.

Eventually, proper sqlite3-compatible escaping of these single-quote apostrophes would be better, but first-pass attempts at that still hit the above error, so for now they are simply removed.
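For context, a hedged sketch: with `?` placeholders, Python's sqlite3 module binds each parameter as a value, so an apostrophe inside a plain `str` needs no escaping at all. The "probably unsupported type" wording points instead at the *type* of the bound value. The sketch below uses an in-memory database with a hypothetical `linked` table and a hypothetical `NotAString` class standing in for a non-`str` value (possibly something like a pydicom person-name object under Python 3; that is an assumption, not something confirmed in this issue).

```python
import sqlite3

# Hypothetical schema for illustration only; dicom_anon's real tables differ.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE linked (original TEXT, study_uid_pk INTEGER)")

# A plain str containing an apostrophe binds without any manual escaping.
value = "Referring Physician's Name 4"
cur.execute("INSERT INTO linked VALUES (?, ?)", (value, 1))
cur.execute(
    "SELECT original FROM linked WHERE original = ? AND study_uid_pk = ?",
    (value, 1),
)
row = cur.fetchone()
print(row[0])  # Referring Physician's Name 4


class NotAString:
    """Hypothetical stand-in for a value object that sqlite3 cannot adapt."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


# Binding the object itself fails because of its type, not its contents.
try:
    cur.execute("SELECT original FROM linked WHERE original = ?", (NotAString(value),))
    bind_failed = False
except sqlite3.Error as exc:  # InterfaceError on Python 3.7; newer versions may raise ProgrammingError
    bind_failed = True
    print(type(exc).__name__, exc)

# Casting with str(...) first makes the same lookup succeed.
cur.execute("SELECT original FROM linked WHERE original = ?", (str(NotAString(value)),))
found = cur.fetchone()
print(found[0])
```

Catching the base class `sqlite3.Error` keeps the sketch working across Python versions, since the exact exception subclass and message for an unbindable parameter have changed over time.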

Hold on: I am finding some valid data values that are already escaped and should NOT have their single-quote apostrophes removed, such as:

        2019-11-08 16:28:20,678 - dicom_anon - INFO - get() found apostrophe within original_STR value of "b'\xd6\x01\xaf
        [...] \xee\x07t\x07\'\x08\xde\x07\xe1
        [...] \n\xae\t\'\n;\t\xa7\n0\t\xd7\t+\t\xd/\'

...note also the leading b ("bytes literal") prefix on the string above.
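A hedged illustration of where the b prefix and the backslash-escaped apostrophes can come from: under Python 3, formatting a `bytes` value with "{0}".format(...) (as the workaround above does) or with str(...) yields its repr, which adds the b'...' wrapper and writes an apostrophe as \' whenever single quotes delimit the repr. The raw bytes in this sketch are made up; they only need to contain both quote characters.

```python
# Made-up binary value containing both an apostrophe and a double quote.
raw = b"\x07'\x08\""

text = "{0}".format(raw)  # in Python 3 this is the same string as repr(raw)
print(text)  # b'\x07\'\x08"'

# The backslash before the apostrophe exists only in the repr text...
assert "\\'" in text
# ...the underlying bytes contain no backslash at all.
assert b"\\" not in raw
```

If that reading is right, the "already escaped" apostrophes in the log are an artifact of logging a bytes repr rather than escaping stored in the DICOM data.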

This might be a separate issue, but to begin capturing it: some of the metadata produced by dicom_anon is affected in the same way. For example, in a DICOM object that initially had no values for the AccessionNumber and ReferringPhysicianName tags, the dicom_anon-processed values ended up as bAccession Number 3 and b"Referring Physicians Name 6", respectively.
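The two values above match what the apostrophe-stripping workaround would produce if cleaned is still a bytes object, for instance from the existing .encode('ascii') call. A minimal reproduction, assuming the element names are "Accession Number" and "Referring Physician's Name" and reusing the counters from the example; strip_apostrophes is a hypothetical helper mirroring the one-line workaround:

```python
def strip_apostrophes(value):
    """Hypothetical helper mirroring the workaround: format, then drop apostrophes."""
    return "{0}".format(value).replace("'", "")


# cleaned values built as in the original code, i.e. encoded to ASCII bytes
accession = ("%s %d" % ("Accession Number", 3)).encode("ascii")
physician = ("%s %d" % ("Referring Physician's Name", 6)).encode("ascii")

# "{0}".format(bytes) gives the repr, so the b prefix and quote marks leak in;
# stripping apostrophes then also removes the repr's own single quotes.
print(strip_apostrophes(accession))  # bAccession Number 3
print(strip_apostrophes(physician))  # b"Referring Physicians Name 6"

# Building the value as a str from the start avoids the b prefix entirely.
print(strip_apostrophes("%s %d" % ("Accession Number", 3)))  # Accession Number 3
```

The second case keeps its double quotes because Python switches a bytes repr to double-quote delimiters when the content contains an apostrophe but no double quote.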

A much simpler and more straightforward approach seems to be to:

  • cast all such original and cleaned values via str(...)
  • remove the .encode('ascii') from cleaned = ('%s %d' % (e.name, self.audit.get_next_pk(e))).encode('ascii'), and instead use cleaned = str('%s %d' % (e.name, self.audit.get_next_pk(e)))
    (as also referenced in #33, "Add --DB_delete option to clear out the sqlite3 tables of prior_cleaned")
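A minimal sketch of the two bullets above, assuming an in-memory database and a hypothetical cleaned_values table (dicom_anon's real schema and GET_LINKED query are not reproduced here). One caveat, hedged: str(...) on a bytes value returns its repr (b'...'), so the hypothetical as_text helper decodes bytes instead of calling str() on them; the ASCII encoding is likewise an assumption.

```python
import sqlite3


def as_text(value, encoding="ascii"):
    """Hypothetical helper: return value as str without leaking a bytes repr.

    str(b"abc") in Python 3 is "b'abc'", so bytes are decoded instead.
    The ASCII default is an assumption for illustration.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return str(value)


# cleaned built as a str, with no .encode('ascii'); the name and counter are examples.
cleaned = "%s %d" % ("Referring Physician's Name", 6)

conn = sqlite3.connect(":memory:")
# Hypothetical table; not dicom_anon's actual schema.
conn.execute("CREATE TABLE cleaned_values (original TEXT, cleaned TEXT)")
conn.execute(
    "INSERT INTO cleaned_values VALUES (?, ?)",
    (as_text(b"Smith^John"), as_text(cleaned)),
)
stored = conn.execute("SELECT original, cleaned FROM cleaned_values").fetchone()
print(stored)  # ('Smith^John', "Referring Physician's Name 6")
```

With text values bound through placeholders, the apostrophe survives the round trip unchanged, so the replace("'", "") workaround would no longer be needed in this sketch.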

Resolved in #27 ("Add support for Python 3")