bpot/poseidon

encoding compatibility errors

sclasen opened this issue · 7 comments

When writing UTF-8 characters in a Kafka message, we are getting

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

 app error: Error writting messages_for_topics in Poseidon::Protocol::ProduceRequest (Poseidon::Protocol::ProtocolStruct::EncodingError: Error writting messages_for_partitions in Poseidon::Protocol::MessagesForTopic (Poseidon::Protocol::ProtocolStruct::EncodingError: Error writting message_set in Poseidon::Protocol::MessagesForPartition (Poseidon::Protocol::ProtocolStruct::EncodingError: Error writting messages in Poseidon::Protocol::MessageSetStructWithSize (Poseidon::Protocol::ProtocolStruct::EncodingError: Error writting message in Poseidon::Protocol::MessageWithOffsetStruct (Poseidon::Protocol::ProtocolStruct::EncodingError: Error writting value in Poseidon::Protocol::MessageStruct (Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8)))))) (Poseidon::Protocol::ProtocolStruct::EncodingError)

Probably due to

https://github.com/bpot/poseidon/blob/master/lib/poseidon/protocol/request_buffer.rb#L10

Can/Should that encoding be made configurable?
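
For reference, the same exception can be reproduced in plain Ruby without Poseidon. This is only a sketch under the assumption that the buffer already holds non-ASCII binary bytes when the UTF-8 value is appended; the variable names and byte values are made up for illustration.

```ruby
# Stand-in for a request buffer that already contains binary protocol data.
buffer = "".b
buffer << [0xCAFEBABE].pack("N")   # four raw bytes, all above 0x7F

# A UTF-8 string containing one non-ASCII character.
value = "caf\u00e9"

begin
  # Both sides now contain non-ASCII bytes in different encodings,
  # so Ruby refuses to concatenate them.
  buffer << value
rescue Encoding::CompatibilityError => e
  compat_error = e
end

puts compat_error.class
puts compat_error.message
```

Appending pure ASCII text to the same buffer would succeed, which is why the problem only shows up once a message contains non-ASCII characters.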

bpot commented

That string needs to be ASCII-8BIT because it will hold binary data. I came across some weird encoding behavior (possibly a bug?) when trying to reproduce this. If you append an invalid UTF-8 string to an empty ASCII-8BIT string, it works, but the resulting string is UTF-8!

irb(main):001:0> s = ''.encode("ASCII-8BIT")
=> ""
irb(main):002:0> n = "hello\xffasdf"
=> "hello\xFFasdf"
irb(main):003:0> n.encoding
=> #<Encoding:UTF-8>
irb(main):004:0> n.valid_encoding?
=> false
irb(main):005:0> s.encoding
=> #<Encoding:ASCII-8BIT>
irb(main):006:0> s << n
=> "hello\xFFasdf"
irb(main):007:0> s.encoding
=> #<Encoding:UTF-8>

To work around this I'm going to force all incoming strings to be ASCII-8BIT.
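
A minimal sketch of what such a workaround could look like, not Poseidon's actual code: `append_binary` is a hypothetical helper, and it uses `String#b`, which returns a copy of the string tagged as ASCII-8BIT without changing its bytes.

```ruby
# Hypothetical helper (not part of Poseidon): append a string to a binary
# buffer without letting Ruby change the buffer's encoding.
def append_binary(buffer, str)
  # String#b returns a binary-tagged copy, so the caller's string is left
  # untouched and frozen inputs are accepted as well.
  buffer << str.b
end

buffer = "".b
append_binary(buffer, "h\u00e9llo")    # valid UTF-8 with a non-ASCII character
append_binary(buffer, "\xFF".freeze)   # invalid UTF-8, and frozen
append_binary(buffer, "w\u00f6rld")    # more non-ASCII UTF-8

puts buffer.encoding
puts buffer.bytesize
```

Because every appended string has the same encoding as the buffer, neither the silent switch to UTF-8 nor the compatibility error can occur.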

bpot commented

@sclasen can you try with the latest master and see if that fixes your issue?

Thanks for reporting this!

@bpot Dang that blows up in some cases with

class=Poseidon::Protocol::ProtocolStruct::EncodingError message="Error writting common in Poseidon::Protocol::MetadataRequest (Poseidon::Protocol::ProtocolStruct::EncodingError: Error writting client_id in Poseidon::Protocol::RequestCommon (RuntimeError: can't modify frozen String))
protocol_struct.rb:97:in `rescue in block (3 levels) in write

bpot commented

Okay, can you try again? It should handle frozen strings now.

@bpot whoa I think this is our issue here.

The client_id we are using comes from a frozen string; we'll just call .dup on it before creating the producer.
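
A short sketch of why `.dup` helps, assuming the library changes the string's encoding in place with `force_encoding`. `CLIENT_ID` and its value are hypothetical placeholders.

```ruby
# Hypothetical constant standing in for wherever the client_id comes from.
CLIENT_ID = "my_producer".freeze

begin
  # In-place mutation of a frozen string raises.
  CLIENT_ID.force_encoding(Encoding::BINARY)
rescue RuntimeError => e   # FrozenError on Ruby 2.5+, a RuntimeError subclass
  frozen_error = e
end

# dup returns an unfrozen copy that can be modified safely.
client_id = CLIENT_ID.dup
client_id.force_encoding(Encoding::BINARY)

puts frozen_error.message
puts client_id.frozen?
```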

@bpot heh, thanks!

@bpot Works!