apache/dubbo-go-hessian2

String encoding and decoding

lujjjh opened this issue · 6 comments

There are 2 known issues in string encoding and decoding.

The first is an out-of-range problem, because the buffer acquired below doesn't account for an edge case:

bufp := gxbytes.AcquireBytes(CHUNK_SIZE * 3)

The inner loop

for charCount < CHUNK_SIZE {

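can actually exit with charCount > CHUNK_SIZE (more precisely, charCount == CHUNK_SIZE + 1). Judging by the test case below, this happens when a character outside the Basic Multilingual Plane, such as an emoji, arrives at charCount == CHUNK_SIZE - 1: it counts as two UTF-16 code units, so it pushes charCount one past the limit. The buffer may therefore need up to (CHUNK_SIZE + 1) * 3 bytes.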

A minimal test case that reproduces it:

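package hessian

import (
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestEncStringChunk(t *testing.T) {
	enc := NewEncoder()
	v := strings.Repeat("我", CHUNK_SIZE-1) + "🤣" // CHUNK_SIZE-1 BMP chars, then one surrogate pair
	assert.Nil(t, enc.Encode(v))
	dec := NewDecoder(enc.Buffer())
	s, err := dec.Decode()
	assert.Nil(t, err)
	assert.Equal(t, v, s)
}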

After applying a quick fix that enlarges the buffer:

bufp := gxbytes.AcquireBytes((CHUNK_SIZE + 1) * 3)

I ran into the second issue with the same test case:

    	Error:      	Not equal:
    	            	expected: "我我我……我🤣"
    	            	actual  : "我我我……我🤣\x00\x00"

Bisecting suggests this was introduced in dea1174: the same test passes when the quick fix is applied to 8dcaa20, the parent of dea1174.

I haven't dug into that commit yet, since it's fairly complicated.

@wongoo As I recall, we have run into problems like this before. It is not easy to fix.

@zonghaishang please check this issue; I will also look into it later.

ref: #252

The current chunked string decoding algorithm is complex and hard to maintain. I will try to refactor it.

#254 does not actually fix this case. I've created a pull request.