GehirnInc/python-jwt

assert isinstance(..., str) differs in Python 2.x and Python 3.x

svisser opened this issue · 3 comments

The various assertions in the codebase of the form assert isinstance(..., str) behave differently in Python 2.x and Python 3.x. They may pass on the currently tested examples, but they check different things: in Python 2.x they assert that a byte string was passed in, whereas in Python 3.x they assert that a Unicode string was passed in. So these assertions should be adapted to use basestring instead.
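To make the difference concrete, here is a small sketch. The names text_types and is_text are hypothetical and not part of python-jwt. The two plain_check_* values show what the plain isinstance(..., str) check yields; on Python 3 the bytes check is False and the text check is True, and on Python 2 both results flip.

```python
import sys

# On Python 3, str is the Unicode text type and bytes is a separate type.
# On Python 2, str is the byte-string type and unicode is a separate type.
if sys.version_info[0] >= 3:
    text_types = (str,)
else:
    text_types = (basestring,)  # noqa: F821 (this name exists on Python 2 only)


def is_text(obj):
    """Hypothetical helper: str on Python 3, str or unicode on Python 2."""
    return isinstance(obj, text_types)


# The plain check discussed in this issue, applied to bytes and to text.
plain_check_on_bytes = isinstance(b"payload", str)
plain_check_on_text = isinstance(u"payload", str)
```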

I may have fixed this issue in efc05d1. Could you confirm my fix?

I can confirm that the tests pass in 2.7, 3.3 and 3.4. I think your changes do solve the issue of the assertions being incorrect. It may be worth exploring whether you can move the encode() and decode() calls closer to the application boundary (rather than encoding/decoding things throughout your code).

I'm not saying this can be cleanly done here but that's at least the philosophy on keeping Unicode/bytes separation clean in many applications (so: bytes come in, convert to Unicode, use that everywhere, and convert back to bytes if needed when data needs to leave your application).
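As a rough illustration of that philosophy (not a description of how python-jwt is structured), the sketch below decodes once on the way in and encodes once on the way out. All function names here (from_wire, to_wire, shout, handle) are made up for the example, and it assumes Python 3 and UTF-8.

```python
def from_wire(raw, encoding="utf-8"):
    """Boundary in: turn incoming bytes into text exactly once."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text


def to_wire(text, encoding="utf-8"):
    """Boundary out: turn text back into bytes exactly once."""
    if isinstance(text, bytes):
        return text  # already bytes
    return text.encode(encoding)


def shout(text):
    """Internal logic: works on text only and never encodes or decodes."""
    return text.upper()


def handle(raw):
    """Bytes come in, text is used internally, bytes go out."""
    return to_wire(shout(from_wire(raw)))
```

With this layout, only from_wire and to_wire need to know about encodings; everything between them can assume text.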

If you do wish to include assertions on strings in the future, they could look like this:

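import sys

# Python 3 has no basestring builtin, so alias it to str there.
# On Python 2 the builtin basestring (covering str and unicode) is used as-is.
if sys.version_info[0] >= 3:
    basestring = str

if isinstance(obj, basestring):
    pass  # obj is a string; handle it here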

I will merge efc05d1 into master for now.

I think that encode() and decode() are already called at the application boundary, as you suggest.

I intend for library users to use only the encode, decode, and verify methods of jwt.jwt.JWT. verify accepts bytes or str and returns bool, decode accepts bytes or str and returns str, and encode accepts only bytes and returns str. The reason encode accepts only bytes is that a JWS/JWE payload is defined as a sequence of octets.

Could you tell me which parts of the implementation you think are not clean?