Add Python 3 support for apostrophes in tag values
r3m0chop opened this issue · 4 comments
As part of the work towards #27 ("Add support for Python 3"), I noticed that when a tag value contains a single-quote apostrophe and get() runs self.cursor.execute(GET_LINKED % table_name, (original, study_uid_pk)), sqlite3 fails with sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
For example, with the ReferringPhysiciansName tag (0008,0090) containing a value of Referring Physician's Name 4, the following traceback is shown:
Traceback (most recent call last):
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 30, in tag_in_exception
    yield
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1773, in walk
    callback(self, data_element) # self = this Dataset
  File "dicom_anon.py", line 459, in clean_cb
    if self.enforce_profile(ds, e, study_pk):
  File "dicom_anon.py", line 486, in enforce_profile
    cleaned = self.basic(ds, e, study_pk)
  File "dicom_anon.py", line 507, in basic
    prior_cleaned = self.audit.get(e, study_uid_pk=study_pk)
  File "dicom_anon.py", line 194, in get
    self.cursor.execute(GET_LINKED % table_name, (safer_original_str, study_uid_pk))
sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "dicom_anon.py", line 812, in <module>
    da.run(i_dir, c_dir)
  File "dicom_anon.py", line 700, in run
    ds, study_pk = self.anonymize(ds)
  File "dicom_anon.py", line 637, in anonymize
    ds.walk(partial(self.clean_cb, study_pk=study_pk))
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/dataset.py", line 1779, in walk
    dataset.walk(callback)
  File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/williamsrms/.virtualenvs/locutus3/lib/python3.7/site-packages/pydicom/tag.py", line 37, in tag_in_exception
    raise type(ex)(msg)
sqlite3.InterfaceError: With tag (0008, 0090) got exception: Error binding parameter 0 - probably unsupported type.
Please note that this apostrophe issue does not seem to occur when run within a Python 2 environment; it is only revealed when running in a Python 3 environment, for reasons not yet known. This could be related to the particular import sqlite3 being utilized, or to the psycopg2-binary v2.83 dependency that is used under Python 3 but seemingly not used at all under Python 2.
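One way to narrow this down might be to check which Python types sqlite3 will accept as a bound parameter. The sketch below is only an illustration under stated assumptions, not code from dicom_anon: FakeValue and try_bind are hypothetical names, and FakeValue merely stands in for any non-str value object that could reach execute(). Each value is bound to a ? placeholder in an in-memory database and the result reports whether sqlite3 accepted it.

```python
import sqlite3


class FakeValue:
    """Hypothetical stand-in for a non-str value object (an assumption)."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def try_bind(value):
    """Bind one value to a '?' placeholder; return 'ok' or the error class name."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("SELECT ?", (value,)).fetchone()
        return "ok"
    except sqlite3.Error as exc:
        # The concrete subclass may differ between Python versions,
        # so the base class sqlite3.Error is caught here.
        return type(exc).__name__
    finally:
        conn.close()


with_apostrophe = "Referring Physician's Name 4"
print(try_bind(with_apostrophe))                  # plain str
print(try_bind(FakeValue(with_apostrophe)))       # custom object
print(try_bind(str(FakeValue(with_apostrophe))))  # same object, cast to str
```

If the plain str with an apostrophe binds cleanly while the custom object does not, the type of the bound value, rather than the apostrophe itself, may be what sqlite3 is rejecting.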
An initial workaround to remove any potentially offending single-quote apostrophes can be as simple as the following:
safer_cleaned_str = "{0}".format(cleaned).replace("'","")
and:
safer_original_str = "{0}".format(original).replace("'","")
...wherever these are used in the relevant db.execute(...) or self.cursor.execute(...) calls to sqlite3.
Eventually, a full sqlite3-compatible escaping of these single-quote apostrophes would be better, but first-pass attempts at doing so still hit the above error, so I am just removing them entirely for now.
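For reference, when the bound value is a plain str, sqlite3's ? placeholders should not need any manual escaping of apostrophes. The following is a minimal sketch under that assumption; the audit_demo table and its columns are hypothetical and do not reflect dicom_anon's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table for illustration only.
cur.execute("CREATE TABLE audit_demo (original TEXT, cleaned TEXT)")

original = "Referring Physician's Name 4"   # contains an apostrophe
cleaned = "Referring Physicians Name 6"

# Values are passed separately from the SQL text, so no escaping is needed.
cur.execute(
    "INSERT INTO audit_demo (original, cleaned) VALUES (?, ?)",
    (original, cleaned),
)

cur.execute("SELECT original, cleaned FROM audit_demo WHERE original = ?", (original,))
row = cur.fetchone()
print(row)
conn.close()
```

The apostrophe is stored and matched unchanged here, which suggests that stripping it is only needed if something other than a str is reaching execute().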
Hold on: I am finding some valid data values in there that are already escaped and should NOT have their single-quote apostrophes removed, such as in:
2019-11-08 16:28:20,678 - dicom_anon - INFO - get() found apostrophe within original_STR value
of "b'\xd6\x01\xaf
[...] \xee\x07t\x07\'\x08\xde\x07\xe1
[...] \n\xae\t\'\n;\t\xa7\n0\t\xd7\t+\t\xd/\'
... further noting the initial b "bytes literal" indicator prefacing the rest of the string in the above.
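A possible explanation for those apostrophes, sketched below with a made-up three-byte value: under Python 3, formatting a bytes object with "{0}".format(...) yields its repr, including the b prefix and the surrounding quotes. The variable names are illustrative only.

```python
# A made-up bytes value; it contains no apostrophe at all.
raw = b"\xd6\x01\xaf"

# Python 3: formatting bytes into a str produces the repr of the bytes.
as_text = "{0}".format(raw)
print(as_text)

# The quotes come from the repr, not from the data.
print(as_text.count("'"))

# Stripping apostrophes therefore mangles the representation.
stripped = as_text.replace("'", "")
print(stripped)
```

If this is what is happening, the apostrophes found in such values come from the string conversion rather than from the DICOM data, and removing them would leave a stray leading b in the stored text.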
This might be a separate issue, but to at least begin capturing it, please note that some of the metadata produced by dicom_anon is coming out as such bytes literals. For example, in a DICOM object that initially had no values for the AccessionNumber and ReferringPhysiciansName tags, the dicom_anon-processed values ended up as bAccession Number 3 and b"Referring Physicians Name 6", respectively.
A much easier and more straightforward approach seems to be to:
- cast all such original and cleaned values via str(...)
- remove the encode('ascii') of cleaned = ('%s %d' % (e.name, self.audit.get_next_pk(e))).encode('ascii') to instead use cleaned = str('%s %d' % (e.name, self.audit.get_next_pk(e)))
(as also referenced in #33, "Add --DB_delete option to clear out the sqlite3 tables of prior_cleaned")
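The effect of dropping the encode('ascii') call can be sketched as follows. The name and next_pk values are hypothetical stand-ins for e.name and self.audit.get_next_pk(e); this is an illustration of Python 3 behaviour, not the actual dicom_anon code.

```python
# Hypothetical stand-ins for e.name and self.audit.get_next_pk(e).
name = "Accession Number"
next_pk = 3

# Current form: under Python 3 this produces bytes, not str.
encoded = ("%s %d" % (name, next_pk)).encode("ascii")

# Proposed form: the formatted value is already a str.
plain = str("%s %d" % (name, next_pk))

print(type(encoded).__name__, type(plain).__name__)
print(str(encoded))  # casting bytes with str() keeps the b'...' wrapper
print(plain)
```

Note that str(...) applied to a value that is already bytes keeps the b'...' wrapper, so the str(...) cast helps most once the encode('ascii') is gone and the value never becomes bytes in the first place.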