vgel/summarize.py

UnicodeEncodeError in Python 2.7.3

Closed this issue · 2 comments

Testing with the article:

http://www.reuters.com/article/2014/08/29/us-syria-crisis-obama-strategy-idUSKBN0GS2KT20140829

produced

File "summarize.py", line 30, in u
    return codecs.unicode_escape_decode(s)[0]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 88: ordinal not in range(128)

Current workaround:

return s.encode('ascii', 'replace')
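
For anyone else using this workaround, here is a minimal sketch of what it does to the text. The sample string is only an excerpt chosen for illustration, and the variable names are hypothetical. The replacement is lossy: every non-ASCII character becomes "?".

```python
# -*- coding: utf-8 -*-
# Illustrative excerpt containing U+2019 (right single quotation mark),
# the character named in the traceback above.
sample = u"we don\u2019t have a strategy yet"

# Strict ASCII encoding fails on U+2019, the same kind of failure
# reported in the traceback.
try:
    sample.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The workaround avoids the exception by substituting "?" for every
# character that ASCII cannot represent, so the apostrophe is lost.
replaced = sample.encode("ascii", "replace")
```

This keeps the script running, but curly quotes and other non-ASCII punctuation will not survive in the summary.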

It seems to crash on the following paragraph:

Representative Tom Price, a Georgia Republican, said on Twitter: "President says "we don’t have a strategy yet" to deal with #ISIS.
Kentucky Senator Mitch McConnell, the top Republican in the Senate, said he thought Obama would have “significant congressional support” if he provided a strategic plan to protect the United States and its allies from the Sunni militants.

vgel commented

Hmm, I'm looking into this. I'm not sure why that codecs call is trying to create an ASCII string at all; the whole point is that it returns a unicode string.
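
A likely explanation, offered as an assumption rather than a confirmed diagnosis: in Python 2, `codecs.unicode_escape_decode` expects a byte string. If `s` is already a unicode object, Python 2 first encodes it implicitly with the default ASCII codec, and that step fails on `u'\u2019'`. The sketch below shows one way to guard against this. The helper name `to_unicode` is hypothetical, and this is not necessarily the fix that ended up in the repository.

```python
import codecs


def to_unicode(s):
    """Return text for s, decoding backslash escapes only for byte strings.

    Hypothetical helper for illustration; not taken from summarize.py.
    """
    if isinstance(s, bytes):
        # Byte strings may contain escapes such as \u2019, so decode them.
        return codecs.unicode_escape_decode(s)[0]
    # Already text: return it unchanged so no implicit ASCII encode happens.
    return s


from_bytes = to_unicode(b"we don\\u2019t have a strategy yet")
from_text = to_unicode(u"we don\u2019t have a strategy yet")
```

Both calls should yield the same text, with the apostrophe intact, instead of raising.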

vgel commented

Sorry for the long lead time, but I just committed a fix for this.