DiamondLightSource/cothread

Malformed UTF-8 triggers unexpected exception

Araneidae opened this issue · 4 comments

Calling caget under Python 3 on a PV that returns a string which is not valid UTF-8 triggers an exception that is not reported to the caller. For example:

>>> from cothread.catools import *
>>> caget ('LI-RF-AMPL-01:KLY:T1')
19.278066635131836
>>> caget ('LI-RF-AMPL-01:KLY:T1', format=FORMAT_CTRL)
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/catools.py", line 587, in _caget_event_handler
    args.raw_dbr, args.type, args.count))
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/dbr.py", line 829, in dbr_to_value
    raw_dbr.copy_attributes(result)
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/dbr.py", line 244, in copy_attributes_ctrl
    other.units = py23.decode(ctypes.string_at(self.units))
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/py23.py", line 56, in decode
    return s.decode('UTF-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 0: invalid start byte
Traceback (most recent call last):
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/catools.py", line 155, in ca_timeout
    return event.Wait(timeout)
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/cothread.py", line 768, in Wait
    self._WaitUntil(deadline)
  File "/scratch/hgs15624/local/venvs/burtinter-dhuzAW7m/lib/python3.7/site-packages/cothread/cothread.py", line 611, in _WaitUntil
    raise Timedout('Timed out waiting for event')
cothread.cothread.Timedout: Timed out waiting for event

There are two issues here:

  1. The UnicodeDecodeError occurs during a ctypes wrapped callback, and so is not reported to the caller, hence the second Timedout exception. This is straightforward to fix.
  2. cothread's current policy of treating all strings as UTF-8 on Python 3 makes it brittle in the presence of strings that use other encodings (Latin-1 in this case).
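
To illustrate the first point, here is a minimal sketch of one way to carry an exception out of a callback and back to the waiting caller. It is not cothread's actual fix: the names Result, decode_units and event_handler are hypothetical, a plain threading.Event stands in for the cothread event, and an ordinary function call stands in for the ctypes callback.

```python
import threading


class Result:
    """Holds either a value or an exception captured inside a callback."""

    def __init__(self):
        self._done = threading.Event()
        self._value = None
        self._error = None

    def set_from(self, func, *args):
        # Run func inside the callback, recording any exception instead of
        # letting it escape (where a ctypes callback would only print it).
        try:
            self._value = func(*args)
        except Exception as error:
            self._error = error
        finally:
            # Always wake the waiter, so a failure does not become a timeout.
            self._done.set()

    def wait(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError('Timed out waiting for event')
        if self._error is not None:
            # Re-raise the original error in the caller's context.
            raise self._error
        return self._value


def decode_units(raw):
    # Strict decode, as in the traceback above.
    return raw.decode('UTF-8')


def event_handler(result, raw):
    # Hypothetical stand-in for the ctypes-wrapped event handler.
    result.set_from(decode_units, raw)
```

With this shape, a caller of wait() on a result fed with bytes starting 0xb0 sees the UnicodeDecodeError directly rather than a later timeout.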

One option is to use decode('UTF-8', 'ignore') or decode('UTF-8', 'replace'), but this would disguise errors that perhaps should be treated as exceptions.
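
For comparison, a short sketch of how the error handlers behave on this kind of input. The byte string is an assumption: the traceback only shows that the units field starts with 0xb0 (the Latin-1 degree sign), so the trailing 'C' is made up for illustration.

```python
# Assumed units string: Latin-1 degree sign (0xb0) followed by 'C'.
raw = b'\xb0C'

# Strict decoding (the default) raises, as in the traceback above.
try:
    raw.decode('UTF-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'ignore' silently drops the invalid byte.
ignored = raw.decode('UTF-8', 'ignore')

# 'replace' substitutes U+FFFD REPLACEMENT CHARACTER for the invalid byte.
replaced = raw.decode('UTF-8', 'replace')

# Decoding as Latin-1 recovers the intended text, but only if the
# encoding is known in advance.
latin1 = raw.decode('latin-1')
```

Both 'ignore' and 'replace' return a string without raising; 'replace' at least leaves a visible marker where the bad byte was.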

@willrogers, @thomascobb, I'm interested in your thoughts on this.

I've implemented decode('UTF-8', 'replace') in commit 17920a4, which fixes this.

This seems like a reasonable solution to me.