agkozak/zsh-z

Failed to jump to path containing CJK characters

OceanS2000 opened this issue · 5 comments

When the database contains entries with CJK characters, z -l lists them just fine, but jumping and z -e both fail. It seems to me that the multibyte characters are being corrupted somehow.

ocean@Satori ~ % export ZSHZ_DATA=/tmp/nothing
ocean@Satori ~ % pushd TW4791主包内容
ocean@Satori TW4791主包内容 % popd
ocean@Satori ~ % z -l TW
29985      ~/TW4791主包内容
ocean@Satori ~ % z -e TW
~/TW4791主僣
ocean@Satori ~ % z TW
ocean@Satori ~ % # Note jumping failed silently
ocean@Satori ~ % cat /tmp/nothing 
/home/ocean/TW4791主包内容|1|1627470210
ocean@Satori ~ % # The database itself seems fine

OS: Gentoo Linux
zsh version: zsh 5.8 (x86_64-pc-linux-gnu)
zsh-z version: a7fa3bb (current master branch)

I found that tab completion can work around this.

After some research I noticed the MULTIBYTE option, but the issue persists whether or not MULTIBYTE is set.
Moreover, the output does not seem to be simply truncated at a byte boundary; the bytes after the first CJK character are different:

ocean@Satori ~ % echo '~/TW4791主包内容' | xxd
00000000: 7e2f 5457 3437 3931 e4b8 bbe5 8c85 e586  ~/TW4791........
00000010: 85e5 aeb9 0a                             .....
ocean@Satori ~ % z -e TW | xxd
00000000: 7e2f 5457 3437 3931 e4b8 bbe5 83a3 ac83  ~/TW4791........
00000010: a3a5 e583 0a                             .....
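
For anyone who wants to repeat this comparison without reading hex dumps by eye, here is a minimal sketch in plain POSIX shell. The helper names hexbytes and same_bytes are made up for illustration and are not part of zsh-z; the sketch assumes od and tr are available and that the script is saved as UTF-8.

```shell
# hexbytes (hypothetical helper): print the bytes of $1 as one hex string.
hexbytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

# same_bytes (hypothetical helper): succeed only if both strings are
# byte-for-byte identical.
same_bytes() {
  [ "$(hexbytes "$1")" = "$(hexbytes "$2")" ]
}

expected='~/TW4791主包内容'
expected_hex=$(hexbytes "$expected")

# With zsh-z loaded, the check would look like:
#   same_bytes "$(z -e TW)" "$expected" || echo "z -e output differs"
```

Comparing the hex strings rather than the printed text avoids being misled by how the terminal renders invalid byte sequences.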

Thanks for describing the problem so well.

I think I found the source of the problem, but it would help to have you test things out. Could you switch to the CJK branch of the repository and try the code I have there?

Thanks.

Thanks for your quick response! The CJK branch fixes the issue for me.

I'll close the issue for now and if more testing/information is needed feel free to @ mention me.

If it's all right, I'm going to keep this issue open for a little while, simply because I'm curious as to why the original code doesn't work. You don't have to do anything, although if I do write some new code, I would love it if you'd be willing to try it out. Thanks so much for drawing my attention to the problem.

Here are my observations:

In theory, in ZSH v5.3.0 and later you can print something and store it in a variable (e.g. $REPLY) by doing

print -v REPLY 'TW4791主包内容'

print -v doesn't exist before ZSH v5.3.0, so in that case I print -z the string to the editing buffer stack and then read it into $REPLY:

print -z 'TW4791主包内容'
read -rz REPLY

What is strange is that the esoteric method of using print -z/read -z seems to handle multibyte characters well, but the presumably more straightforward print -v technique fails. I am curious to find out why.

We seem to have come across a known bug with print -v. It has only been fixed recently. I would like to be able to use plain print -v, but in order to support the many existing versions of ZSH that still have the bug, we will have to use print -v ... -f instead. The -f option makes print behave like printf, a related builtin which never had the bug.
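
As a rough illustration of the printf-style assignment described above (a sketch, not necessarily the exact code on the CJK branch): printf -v writes the formatted result straight into a variable. It is written here in its standalone printf form, which zsh 5.3.0 and later accept and which bash also understands; the variable name dir is hypothetical.

```shell
# Hypothetical directory name, taken from the report above.
dir='TW4791主包内容'

# Store the string in REPLY through a printf-style format
# rather than through plain print -v.
printf -v REPLY '%s' "$dir"

# The stored value should match the input byte for byte.
[ "$REPLY" = "$dir" ] && echo "round trip OK"
```

In zsh the same assignment can presumably also be spelled print -v REPLY -f '%s' "$dir", which is the form mentioned above.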

I have applied a fix to the CJK branch. @OceanS2000, would you be willing to test it for me? Thank you!