arjantop/rust-bencode

Reading a string which contains invalid UTF8

aochagavia opened this issue · 11 comments

In a .torrent file, the hashes of the pieces are saved as a single string. However, this string does not fit Rust's String type, because it contains invalid UTF8. What we actually want is to store the hashes in a Vec<u8>, but then the library expects a bencode list instead of a string and produces an error.
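The mismatch can be reproduced with the standard library alone. This is a minimal sketch, not code from rust-bencode: the helper names and the sample bytes are made up for illustration. It shows that `String::from_utf8` rejects hash-like bytes, while a `Vec<u8>` holds them unchanged.

```rust
/// Try to interpret raw bencode string bytes as UTF-8 text.
/// On failure the original bytes are handed back, so nothing is lost.
/// (Hypothetical helper, not part of rust-bencode.)
fn bytes_to_text(raw: Vec<u8>) -> Result<String, Vec<u8>> {
    String::from_utf8(raw).map_err(|e| e.into_bytes())
}

/// Made-up sample data standing in for the start of a piece hash.
/// 0xff can never appear in valid UTF-8, so this is not a valid String.
fn sample_piece_hash() -> Vec<u8> {
    vec![0xff, 0xfe, 0x00, 0x9a, 0x41]
}
```

A key such as `announce` converts cleanly, while the sample hash comes back as the untouched byte vector in the `Err` case.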

Is there a work-around for this?

The only way to do this at the moment is to use a custom FromBencode implementation.

I think it would be a good idea to create a BencodeString type, which contains a Vec<u8> without the guarantee of UTF8 correctness (similar to the Key struct). You could provide some additional methods such as to_string, as_bytes, etc. This seems to be the best way to solve this issue.
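Roughly what I have in mind, as a sketch only. The type and method names here are my assumptions, not an existing API, and I have used `as_str` and `to_string_lossy` in place of a plain `to_string` to make the failure cases explicit.

```rust
use std::str;

/// Hypothetical wrapper: holds the raw bytes of a bencode string
/// and makes no promise that they are valid UTF-8.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BencodeString(Vec<u8>);

impl BencodeString {
    /// Wrap raw bytes without any validation.
    pub fn from_bytes(bytes: Vec<u8>) -> Self {
        BencodeString(bytes)
    }

    /// Borrow the raw bytes; always succeeds.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }

    /// Borrow as text, but only if the bytes are valid UTF-8.
    pub fn as_str(&self) -> Option<&str> {
        str::from_utf8(&self.0).ok()
    }

    /// Convert to an owned String, replacing invalid sequences with U+FFFD.
    pub fn to_string_lossy(&self) -> String {
        String::from_utf8_lossy(&self.0).into_owned()
    }

    /// Take the bytes back out, e.g. to store piece hashes in a Vec<u8>.
    pub fn into_bytes(self) -> Vec<u8> {
        self.0
    }
}
```

With this shape, a file name would go through `as_str`, while the pieces field would use `as_bytes` or `into_bytes` and never touch UTF-8 validation.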

Should I make some experiments and submit a PR? If you prefer to do it yourself it is also ok!

That was the plan for Key (which should be renamed to something more general), but only custom encoding is implemented so far. The problem is getting the required data out of a decoder: I can't make every string unchecked.

But I have an idea: I can add another DecoderResult error named StringEncoding that contains the original &[u8]. A custom decode implementation can then read the string, check whether the error is StringEncoding, and return the bytes it carries.
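A self-contained sketch of that idea. Only the StringEncoding name comes from the description above; the `DecodeError` enum, the free functions, and the use of an owned `Vec<u8>` instead of a borrowed `&[u8]` are simplifying assumptions, not the library's actual code.

```rust
/// Hypothetical error type standing in for the decoder's error enum.
#[derive(Debug, PartialEq)]
enum DecodeError {
    /// The string was not valid UTF-8; the original bytes are preserved.
    StringEncoding(Vec<u8>),
    /// Any other decoding failure (unused here, kept to show the fall-through).
    Other(String),
}

/// Stand-in for a decoder's `read_str`: succeeds only on valid UTF-8.
fn read_str(raw: &[u8]) -> Result<String, DecodeError> {
    match std::str::from_utf8(raw) {
        Ok(s) => Ok(s.to_owned()),
        Err(_) => Err(DecodeError::StringEncoding(raw.to_vec())),
    }
}

/// What a custom decode implementation could do: read the string and,
/// if the only problem was the encoding, recover the bytes from the error.
fn read_bytes(raw: &[u8]) -> Result<Vec<u8>, DecodeError> {
    match read_str(raw) {
        Ok(s) => Ok(s.into_bytes()),
        Err(DecodeError::StringEncoding(bytes)) => Ok(bytes),
        Err(other) => Err(other),
    }
}
```

The point of carrying the bytes inside the error is that ordinary strings stay checked, and only implementations that explicitly match on StringEncoding get the unchecked data.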

Is this because the Decoder trait only provides a read_str method?

@aochagavia let me know if there are any issues with the current implementation.

But you still can't use deriving to get the implementation.

I will try to come up with an idea to solve this...

It looks like there is no way to do it...

I have opened an issue (rust-lang/rust#15683) in rustc to extend the Encoder and Decoder traits.

FWIW I pattern-match the vector out of the ByteString, see here. It's not ideal (and now broken because of the changes) but it works.

What exactly does not work? You just have to use util::ByteString namespaced, or import it under a different name.
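For illustration, both options look roughly like this. The `util` module below is a local stand-in so that the snippet compiles on its own; the tuple-struct shape of ByteString and the `RawString` alias are assumptions, not taken from the library source.

```rust
mod util {
    /// Local stand-in for the library's `util::ByteString` (shape assumed).
    #[derive(Debug, PartialEq)]
    pub struct ByteString(pub Vec<u8>);
}

/// A type in the user's own code that happens to share the name.
#[derive(Debug, PartialEq)]
struct ByteString(String);

// Option 1: refer to the library type by its qualified path.
fn wrap_qualified(bytes: Vec<u8>) -> util::ByteString {
    util::ByteString(bytes)
}

// Option 2: import it under a different name to avoid the clash.
use util::ByteString as RawString;

fn wrap_aliased(bytes: Vec<u8>) -> RawString {
    RawString(bytes)
}

/// Pattern-match the vector back out of the wrapper.
fn unwrap_bytes(RawString(bytes): RawString) -> Vec<u8> {
    bytes
}
```

Either way the two types never collide, and the pattern match in `unwrap_bytes` works the same through the alias.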

I meant that the changes broke the commit that the link is pointing to, not that it's wrong in any way; quite the opposite actually, I much prefer the new way.