UTF-8 Strings are unparseable?
Closed this issue · 2 comments
Marcus-Rosti commented
Uh, I'm not entirely sure where to open this, but it seems that UTF-8 encoded strings are often unparseable?
def thereAndBackAgain(g: MyProto): Try[MyProto] = {
val str = new String(g.toByteArray, StandardCharsets.UTF_8)
Try(MyProto.parseFrom(str.getBytes(StandardCharsets.UTF_8)))
}
This fails whenever a negative number appears in the proto, and for some obscure strings. I reproduced this via ScalaCheck. I can probably spin off a minimal project to demonstrate it, but it was fairly easy to produce that outcome.
thesamet commented
Hi Marcus, not every array of bytes can be viewed as a UTF-8 encoded string.
The output of toByteArray isn't expected to be valid UTF-8. You can test
whether the bytes you are passing to parseFrom are the same bytes you are getting
from toByteArray; if they are not, the failure is expected…
…-Nadav
(Sent by email in reply to Marcus Rosti's comment of Mon, Apr 8, 2024 at 9:58 PM.)
thesamet commented
Closing since this is not an issue with ScalaPB. Let me know if you have any additional questions.
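For readers landing here later, the point about toByteArray output not being valid UTF-8 can be illustrated without ScalaPB at all. The sketch below is only an illustration under stated assumptions: `Utf8RoundTrip`, `wire`, `viaUtf8`, and `viaBase64` are hypothetical names, and `wire` is a hand-written byte sequence standing in for what a message with a single `int32` field (field number 1) set to -1 would serialize to. Negative `int32` values are encoded as 10-byte varints full of 0xFF bytes, which are never valid in UTF-8, so decoding replaces them with U+FFFD and the original bytes are lost. If the bytes really must travel as a string, a binary-to-text encoding such as Base64 is one option that round-trips losslessly.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object Utf8RoundTrip {
  // Assumed wire bytes for a hypothetical message with `int32 value = 1` set to -1:
  // tag byte 0x08, followed by the 10-byte varint encoding of -1.
  val wire: Array[Byte] =
    Array(0x08, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x01).map(_.toByte)

  // Same round trip as thereAndBackAgain in the issue: bytes -> String -> bytes.
  // Malformed input is silently replaced during decoding, so this is lossy.
  def viaUtf8(bytes: Array[Byte]): Array[Byte] =
    new String(bytes, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8)

  // Lossless alternative: Base64 maps arbitrary bytes to ASCII text and back.
  def viaBase64(bytes: Array[Byte]): Array[Byte] =
    Base64.getDecoder.decode(Base64.getEncoder.encodeToString(bytes))

  def main(args: Array[String]): Unit = {
    println(java.util.Arrays.equals(wire, viaUtf8(wire)))   // prints false
    println(java.util.Arrays.equals(wire, viaBase64(wire))) // prints true
  }
}
```

With a real generated message, the same comparison (`java.util.Arrays.equals` on the output of toByteArray and on the bytes handed to parseFrom) is the check suggested in the reply above.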