Efficiently convert bytes to strings
jCalamari opened this issue · 6 comments
In ScalaPB, what's the best way to convert message A to message B, given that message A can contain thousands of elements? Ideally, I would like to pass the bytes from A to B without any copying/looping.
message A {
  repeated bytes a_strings = 1;
}
message B {
  repeated string b_strings = 1;
}
A naive approach would look as follows, but it involves too much copying/looping:
val a: A = ...
// ScalaPB generates camelCase accessors; toStringUtf8 decodes each ByteString into a String
val b: B = B(bStrings = a.aStrings.map(_.toStringUtf8))
More context would help answer this, since you haven't stated where A comes from and how you want to use the Bs. A few thoughts:
1. A and B have the same binary representation. So if you have A available in binary form, just parse it using B.parseFrom; you gain efficiency by never instantiating As.
2. Don't create messages of type B; convert the bytes to strings at the time of access. You could add a base trait to A that has a method like def getString(index: Int): String = aStrings(index).toStringUtf8
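A minimal sketch of suggestion (2), assuming A is generated by ScalaPB with an aStrings: Seq[ByteString] field. The trait name StringAccess and the hand-written case class standing in for the generated A are hypothetical; in a real build the trait would be mixed into the generated class, for example through ScalaPB's message-level extends option (option (scalapb.message).extends = "..."), which requires importing scalapb/scalapb.proto.

```scala
import com.google.protobuf.ByteString

// Hypothetical trait to mix into the generated A. It only declares the
// accessor that the generated case class already provides.
trait StringAccess {
  def aStrings: Seq[ByteString]

  // Decode a single element on demand instead of converting the whole field up front.
  def getString(index: Int): String = aStrings(index).toStringUtf8
}

// Hypothetical stand-in for the ScalaPB-generated case class for A,
// so the sketch compiles without running the code generator.
final case class A(aStrings: Seq[ByteString] = Seq.empty) extends StringAccess

object LazyAccessDemo {
  def main(args: Array[String]): Unit = {
    val a = A(Seq("alpha", "beta").map(ByteString.copyFromUtf8))
    println(a.getString(1)) // beta
  }
}
```

Only the elements that are actually read get decoded, so the cost of conversion is paid per access rather than once for all elements.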
Hi @thesamet,
thanks for the prompt response! The A message comes from an API response, and there is no way to avoid creating it. Sadly, A and B don't have the same binary representation (I formed the example wrong): they have many more fields and their field IDs don't match. Since I am working within the akka-grpc ecosystem, delaying the creation of B is not an option. Is there a way to create an instance of B just by providing the bytes for the b_strings field?
Do you own the proto for B? If so, you can leave the field type as bytes and, similar to (2) above, add a base trait that converts to strings only what you need. Creating the field then amounts to assigning the same Seq[ByteString] reference.
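A sketch of this approach, assuming B's field is changed in the proto to repeated bytes b_strings = 1 so that ScalaPB generates bStrings: Seq[ByteString]. The case classes below are hypothetical stand-ins for the generated code, and BStringAccess, stringAt and toB are illustrative names, not ScalaPB API.

```scala
import com.google.protobuf.ByteString

// Hypothetical stand-in for the generated A.
final case class A(aStrings: Seq[ByteString] = Seq.empty)

// Hypothetical trait mixed into B; converts only the elements that are read.
trait BStringAccess {
  def bStrings: Seq[ByteString]
  def stringAt(index: Int): String = bStrings(index).toStringUtf8
}

// Hypothetical stand-in for the generated B with b_strings declared as bytes.
final case class B(bStrings: Seq[ByteString] = Seq.empty) extends BStringAccess

object BFromA {
  // No copying or looping: B simply holds the same Seq instance as A.
  def toB(a: A): B = B(bStrings = a.aStrings)

  def main(args: Array[String]): Unit = {
    val a = A(Seq("one", "two").map(ByteString.copyFromUtf8))
    val b = toB(a)
    println(b.bStrings eq a.aStrings) // true: same reference, nothing copied
    println(b.stringAt(0))            // one
  }
}
```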
This sounds promising. What would happen to already-compiled clients that expect repeated string but receive repeated bytes?
The binary representation is the same. Running clients will not be impacted when this change rolls out to a server.
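To check this claim at the wire level, the sketch below encodes field 1 once as bytes and once as string with protobuf-java's CodedOutputStream and compares the output, then reads the bytes-encoded value back as a string, the way a client compiled against repeated string would. The object and method names are hypothetical. The equivalence assumes the bytes are valid UTF-8; many parsers reject invalid UTF-8 in proto3 string fields, so the server should only put UTF-8 text into this field.

```scala
import com.google.protobuf.{ByteString, CodedInputStream, CodedOutputStream}
import java.io.ByteArrayOutputStream

object WireCompat {
  // Run a writer against a fresh CodedOutputStream and return the encoded bytes.
  private def encode(write: CodedOutputStream => Unit): Array[Byte] = {
    val buf = new ByteArrayOutputStream()
    val out = CodedOutputStream.newInstance(buf)
    write(out)
    out.flush()
    buf.toByteArray
  }

  // Field 1 encoded as `bytes` (what the server sends after the change).
  def encodeAsBytes(value: ByteString): Array[Byte] = encode(_.writeBytes(1, value))

  // Field 1 encoded as `string` (what the server sent before the change).
  def encodeAsString(value: String): Array[Byte] = encode(_.writeString(1, value))

  // Read field 1 as a string, as an old client would.
  def decodeAsString(data: Array[Byte]): String = {
    val in = CodedInputStream.newInstance(data)
    val tag = in.readTag()
    // Tag = (fieldNumber << 3) | wireType; wire type 2 is length-delimited.
    require((tag >>> 3) == 1 && (tag & 7) == 2, s"unexpected tag $tag")
    in.readString()
  }

  def main(args: Array[String]): Unit = {
    val asBytes = encodeAsBytes(ByteString.copyFromUtf8("hello world"))
    val asString = encodeAsString("hello world")
    println(java.util.Arrays.equals(asBytes, asString)) // true
    println(decodeAsString(asBytes))                     // hello world
  }
}
```

Both fields use the same length-delimited wire type, so the field number and payload are all that appear on the wire; the declared type only affects how each side interprets the payload.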
Worked like a charm, thank you very much for your prompt responses!