Handling unicode isbn for _isbn_cleanse

Question

Handling unicode isbn for _isbn_cleanse

Closed this issue 11 years ago · 6 comments

When try to replace the dash of the unicode isbn, it will raise a UnicodeDecodeError, is it ok to convert the isbn to str inside the _isbn_cleanse rather than let the users do it themselves?

I knew you have forced the input must be str type, but I think it will make the api much flexible to support various type string. (I mean str and unicode)

bcho commented 11 years ago

LGTM ⭐

Answer 1 · 2013-10-13T17:36:46.000Z

Bah, Python 2 unicode/str... It wasn't my intention to stop you using Unicode types as input, I am just too stupid to have seen this coming with the recent changes :/

Converting to a native string type inside _isbn_cleanse() is a little ugly, because of the need for Python 2 and 3 support. I'll give it some more thought overnight, and hopefully have a reasonable fix tomorrow. That said, I'm open to suggestions.

Thanks for reporting, and sorry for the inconvenience it has caused.

Answer 2 · 2013-10-14T01:23:02.000Z

Maybe try the approach that bottle.py use?

Answer 3 · 2013-10-14T13:32:59.000Z

TLDR; The referenced commits on bug/issue7_unicode_py2 should solve the problem. They need a little more testing, and maybe a little more fixing/reworking.

Before merging I'd like a little more time come up with a nicer solution that maintains backward compatibility, or at least a couple of days to consider ways to clean it up.

Should you care about the details - and you probably don't/shouldn't - just ask.

Long story...

There are actually a lot of silly side effects in just converting the types, most of which would be alleviated by breaking backwards compatibility. I'm not willing to do that so I've introduced some more unwanted complexity.

We're doing more than just convert to Unicode on the way in, because that has an effect on the output type obviously. It would be nice not to care about it, but any downstream users on Python 2 would suddenly find they have Unicode types when they previously had str.

>>> type("hello %s" % 'world')
<type 'str'>
>>> type("hello %s" % u'world')
<type 'unicode'>

There is also the issue of setting up the Unicode strings, the u'' literal syntax wasn't available in Python 3.0-3.2 so it is simpler to let unicodedata.lookup() handle that for us.

Answer 4 · 2013-10-15T14:26:41.000Z

I've pushed the changes out.

Thanks again for the report! Feel free to open more, although I hope you won't need to ;)

Answer 5 · 2013-10-15T14:27:55.000Z

Thanks for your quick response and fix :)

2013/10/15 James Rowe notifications@github.com

I've pushed the changes out.

Thanks again for the report! Feel free to open more, although I hope you
won't need to ;)

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/7#issuecomment-26338659
.