String encoding and decoding
lujjjh opened this issue · 6 comments
There are 2 known issues in string encoding and decoding.
The first one is an out-of-range problem: the code below doesn't handle an edge case well:
Line 164 in e2494da
The inner loop (line 176 in e2494da) could actually exit with charCount > CHUNK_SIZE (or more precisely, charCount == CHUNK_SIZE + 1). The maximum bytes taken could be (CHUNK_SIZE + 1) * 3.
A simple reproducible test case:
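// Needs "strings", "testing" and "github.com/stretchr/testify/assert".
func TestEncStringChunk(t *testing.T) {
	enc := NewEncoder()
	// CHUNK_SIZE-1 three-byte characters, then one character outside the BMP.
	v := strings.Repeat("我", CHUNK_SIZE-1) + "🤣"
	assert.Nil(t, enc.Encode(v))

	// Round-trip: the decoded string must equal the original.
	dec := NewDecoder(enc.Buffer())
	s, err := dec.Decode()
	assert.Nil(t, err)
	assert.Equal(t, v, s)
}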
After a quick fix with
bufp := gxbytes.AcquireBytes((CHUNK_SIZE + 1) * 3)
I encountered the second issue with the same test case above:
Error: Not equal: expected: "我我我……我🤣" actual : "我我我……我🤣\x00\x00"
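One observation that may help narrow this down, as a sketch rather than a diagnosis: the number of trailing \x00 bytes matches the unused tail of the enlarged buffer for this input. The snippet below only does the arithmetic. chunkSize and slack are hypothetical names, the value 0x8000 is an assumption, and it assumes the decoded string is ordinary UTF-8 (3 bytes per "我", 4 bytes for "🤣").

```go
package main

import (
	"fmt"
	"strings"
)

// chunkSize stands in for CHUNK_SIZE; the concrete value is an assumption.
const chunkSize = 0x8000

// slack returns how many bytes of a (chunkSize+1)*3 buffer stay unused after
// the test string has been written into it.
func slack() int {
	v := strings.Repeat("我", chunkSize-1) + "🤣"
	bufLen := (chunkSize + 1) * 3
	return bufLen - len(v) // len counts UTF-8 bytes
}

func main() {
	fmt.Println(slack()) // prints 2
}
```

The result is 2 regardless of the chunk size, since (CHUNK_SIZE + 1) * 3 - (3 * (CHUNK_SIZE - 1) + 4) = 2. That is consistent with the decoder returning the whole buffer instead of only the bytes it wrote, but it does not prove that this is what happens in dea1174.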
After bisecting, I assume this was introduced in dea1174, because the same test passes if I apply the quick fix on 8dcaa20, which is the parent of dea1174.
I haven't dived into the commit yet since it's a bit complicated.
@wongoo As far as I remember, we have run into this kind of problem before. It is not easy to fix.
@zonghaishang Please check this issue. I will also look into it sometime later.
The current chunked string decoding algorithm is complex and hard to maintain. I will try to refactor it.