emka/openstreetbugs

Avoid character encoding issue in addPOIexec

nattomi opened this issue · 4 comments

I ran into a character encoding problem with the name returned by the GeoNames service. The only way I found to work around it was to replace the line
values["nearbyplace"] = "%s [%s]" % (name, country)
in addPOIexec with
values["nearbyplace"] = "%s [%s]" % (name.encode("utf-8"), country.encode("utf-8"))
I recommend making this change, or something equivalent; otherwise, bug reporting at locations with, for instance, "ő" in their name won't work. Other characters may cause problems too, but that is the one I ran into.

Similarly, in getRSSfeed,
print "<title>%s (near %s)</title>%s%s?lat=%s&lon=%s&zoom=18%srssitem?id=%s%sgeo:lat%s/geo:latgeo:long%s/geo:long" % (type, c[6], desc, server_uri, c[2], c[1], api_uri, c[0], pubDate, c[2], c[1])
should be replaced with
print "<title>%s (near %s)</title>%s%s?lat=%s&lon=%s&zoom=18%srssitem?id=%s%sgeo:lat%s/geo:latgeo:long%s/geo:long" % (type, c[6].encode("utf-8"), desc, server_uri, c[2], c[1], api_uri, c[0], pubDate, c[2], c[1])
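
A minimal sketch of the failure mode described above, written in Python 3 for illustration (the project code is Python 2). It assumes the GeoNames name arrives as a Unicode string and that a later step encodes it with the ASCII codec; neither is confirmed in this thread. `format_nearby_place` and `ascii_encodable` are hypothetical helpers, not functions from the repository.

```python
def format_nearby_place(name, country):
    """Build the "name [country]" label and return it as UTF-8 bytes."""
    label = "%s [%s]" % (name, country)
    return label.encode("utf-8")


def ascii_encodable(text):
    """Return True if text can be encoded with the ASCII codec.

    An implicit ASCII conversion (assumed here, not confirmed) would
    fail for any text where this returns False.
    """
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# "ő" is U+0151, which UTF-8 encodes as the two bytes C5 91.
label_bytes = format_nearby_place("Győr", "HU")
```

Encoding explicitly to UTF-8 at a single, known point avoids relying on whatever codec an implicit conversion would pick.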

emka commented

Thanks!

Please review this change, as it apparently makes things even worse. See issue #29.

mibe commented

GeoNames is returning the place names in UTF-8 already. Why do they need to get encoded again?
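
To illustrate why encoding again can fail: in Python 2, calling `.encode("utf-8")` on a byte string first decodes it implicitly with the default codec, which is normally ASCII. The sketch below imitates that behavior in Python 3. `py2_style_encode` is a hypothetical name, and whether this is what happened in issue #29 is an assumption.

```python
def py2_style_encode(raw):
    """Imitate Python 2's str.encode("utf-8") on a byte string:
    an implicit ASCII decode first, then the UTF-8 encode."""
    return raw.decode("ascii").encode("utf-8")


# Bytes as a UTF-8 response body would contain them.
already_utf8 = "Győr".encode("utf-8")

try:
    py2_style_encode(already_utf8)
    outcome = "ok"
except UnicodeDecodeError:
    # The implicit ASCII decode rejects the non-ASCII bytes.
    outcome = "UnicodeDecodeError"
```

Pure ASCII input passes through unchanged, so a problem of this kind would only show up for names with non-ASCII characters.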

It looks like the country is returned as an ISO 3166-1 alpha-2 code. These two-letter codes use only characters from U+0041 to U+005A (A-Z). It's not strictly necessary to encode them as UTF-8 here, because from U+0000 to U+007F, UTF-8 produces the same bytes as US-ASCII and ISO 8859-1 (latin1_* in MySQL).
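
The byte-level claim can be checked with a short Python 3 sketch. The variable names are illustrative only; the sample character "é" is an arbitrary choice of something outside the ASCII range.

```python
import string

# The characters used by alpha-2 country codes: U+0041..U+005A.
codes = string.ascii_uppercase

# For A-Z, all three codecs produce the same single byte.
same_bytes = all(
    ch.encode("utf-8") == ch.encode("ascii") == ch.encode("latin-1")
    for ch in codes
)

# Outside U+0000..U+007F, UTF-8 and Latin-1 no longer agree.
differs = "é".encode("utf-8") != "é".encode("latin-1")
```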