redis/hiredis-py

Permit errors='surrogateescape' and others

candlerb opened this issue · 8 comments

Relates to commit #82

This introduces the errors attribute, but it rejects all values apart from a hard-coded few. In particular, errors='surrogateescape' should be allowed (PEP 383, introduced in python 3.1) but is blocked.

However there are other defined error handlers, plus it's possible to add more dynamically with register_error.

Therefore I think it's better not to perform any validation here, but allow the provided value to be passed through to bytes.decode. Unknown values are rejected as soon as you try to use them.

>>> b'foo\xff'.decode("utf-8", "surrogateescape")
'foo\udcff'
>>> b'foo\xff'.decode("utf-8", "flurble")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown error handler name 'flurble'
>>>

Or, if you must validate before use, then use codecs.lookup_error instead of a hard-coded list.

>>> codecs.lookup_error("surrogateescape")
<built-in function surrogateescape>
>>> codecs.lookup_error("flurble")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown error handler name 'flurble'

Fair point. Would you like to do this and send a PR?

Which behaviour do you want: don't validate the encoding, or validate it via codecs.lookup_error ? The former seems more natural to me, but is a minor behaviour change.

I prefer the latter one. It's the current behavior.
If we don't validate the coding in the Reader_init phase, it is possible that reader.gets will raise a LookupError. Maybe I'm a little pedantic but it's kind of an API change.

OK - can you tell me the correct way to call codecs.lookup_error from C ?

OK, done. However I'm not sure if my unit test test_decode_error_with_surrogateescape_errors will work for python2.7; it could be made conditional or removed entirely.

Fixed it

I have now added Py_DECREF on the temporary objects. Otherwise, if I add

fprintf(stderr, "codecs refs=%ld", (long)Py_REFCNT(codecs));

then I see the reference count increasing on each call.