drslump/Protobuf-PHP

UnicodeDecodeError: 'utf8' codec can't decode

manti-by opened this issue · 2 comments

Hi. I'm trying to test my code, which uses your library, with the Google AdX requester (https://developers.google.com/ad-exchange/rtb/downloads), which is written in Python.
The tool sends me binary requests. I decode them, set the result on an object created by your library, and send back a serialized response, but the requester then fails with the error below.
I'm not sure the problem is in your library, but I have no idea what happened. If you have any ideas, please let me know.

Traceback (most recent call last):
  File "requester.py", line 337, in <module>
    main()
  File "requester.py", line 333, in main
    PrintSummary(logger_obj, opts.sample_encrypted_price)
  File "requester.py", line 198, in PrintSummary
    summarizer.Summarize()
  File "/home/op/requester/log.py", line 277, in Summarize
    bid_response.ParseFromString(record.payload)
  File "/usr/lib/pymodules/python2.7/google/protobuf/message.py", line 168, in ParseFromString
    self.MergeFromString(serialized)
  File "/usr/lib/pymodules/python2.7/google/protobuf/reflection.py", line 821, in MergeFromString
    if self._InternalParse(serialized, 0, length) != length:
  File "/usr/lib/pymodules/python2.7/google/protobuf/reflection.py", line 848, in InternalParse
    pos = field_decoder(buffer, new_pos, end, self, field_dict)
  File "/usr/lib/pymodules/python2.7/google/protobuf/internal/decoder.py", line 450, in DecodeRepeatedField
    if value.add()._InternalParse(buffer, pos, new_pos) != new_pos:
  File "/usr/lib/pymodules/python2.7/google/protobuf/reflection.py", line 848, in InternalParse
    pos = field_decoder(buffer, new_pos, end, self, field_dict)
  File "/usr/lib/pymodules/python2.7/google/protobuf/internal/decoder.py", line 337, in DecodeField
    field_dict[key] = local_unicode(buffer[pos:new_pos], 'utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 126: invalid continuation byte

Protocol Buffers serializes fields of type string as UTF-8. Since PHP is somewhat lacking in its handling of different string encodings, the library applies no treatment to them. If you're setting any string field, make sure you're feeding it UTF-8 encoded strings. If your data is in ISO-8859-1 you can do something like this:

$resp->str = utf8_encode('this is my string');
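
To see why the Python side rejects the payload, here is a minimal sketch that reproduces the same class of error outside of protobuf. It assumes the string was ISO-8859-1 text, which is consistent with byte 0xe1 (the letter "á" in that encoding) but is not confirmed by the traceback. The sample word and the helper names `is_valid_utf8` and `latin1_to_utf8` are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: "Málaga" encoded as ISO-8859-1, so "á" is the single byte 0xE1.
latin1_payload = u"M\u00e1laga".encode("iso-8859-1")


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        # In UTF-8, 0xE1 opens a three-byte sequence; the ASCII "l" that
        # follows it here is not a continuation byte, so decoding fails.
        return False


def latin1_to_utf8(data):
    """Re-encode ISO-8859-1 bytes as UTF-8, the conversion utf8_encode performs in PHP."""
    return data.decode("iso-8859-1").encode("utf-8")


fixed_payload = latin1_to_utf8(latin1_payload)
print(is_valid_utf8(latin1_payload))  # the raw ISO-8859-1 bytes
print(is_valid_utf8(fixed_payload))   # the re-encoded bytes
```

The conversion is only correct when the input really is ISO-8859-1; applying it to a string that is already UTF-8 double-encodes it. Note also that utf8_encode is deprecated as of PHP 8.2, where mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1') performs the same conversion.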

Thanks, it works!