Failed to jump to path containing CJK characters
OceanS2000 opened this issue · 5 comments
When the database contains entries with CJK characters, z -l lists them just fine, but jumping to them or running z -e fails. It seems to me that the multi-byte characters are being corrupted somehow.
ocean@Satori ~ % export ZSHZ_DATA=/tmp/nothing
ocean@Satori ~ % pushd TW4791主包内容
ocean@Satori TW4791主包内容 % popd
ocean@Satori ~ % z -l TW
29985 ~/TW4791主包内容
ocean@Satori ~ % z -e TW
~/TW4791主僣
ocean@Satori ~ % z TW
ocean@Satori ~ % # Note jumping failed silently
ocean@Satori ~ % cat /tmp/nothing
/home/ocean/TW4791主包内容|1|1627470210
ocean@Satori ~ % # The database itself seems fine
OS: Gentoo Linux
zsh version: zsh 5.8 (x86_64-pc-linux-gnu)
zsh-z version: a7fa3bb (current master branch)
I found that tab completion can work around this.
After some research I noticed the MULTIBYTE option, but the issue persists whether or not MULTIBYTE is set.
Moreover, the output does not appear to be a simple byte truncation:
ocean@Satori ~ % echo '~/TW4791主包内容' | xxd
00000000: 7e2f 5457 3437 3931 e4b8 bbe5 8c85 e586 ~/TW4791........
00000010: 85e5 aeb9 0a .....
ocean@Satori ~ % z -e TW | xxd
00000000: 7e2f 5457 3437 3931 e4b8 bbe5 83a3 ac83 ~/TW4791........
00000010: a3a5 e583 0a .....
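To make the comparison above concrete, here is a minimal sketch that walks the two hex dumps byte by byte. The hex strings are copied from the xxd output above with the spaces removed; the variable names are only illustrative, and the substring syntax assumes zsh or bash.

```shell
# Hex bytes copied from the two xxd dumps above (spaces removed).
expected='7e2f545734373931e4b8bbe58c85e58685e5aeb90a'  # echo '~/TW4791主包内容'
actual='7e2f545734373931e4b8bbe583a3ac83a3a5e5830a'    # z -e TW

# Two hex digits encode one byte.
expected_bytes=$(( ${#expected} / 2 ))
actual_bytes=$(( ${#actual} / 2 ))

# Advance one byte (two hex digits) at a time until the dumps disagree.
i=0
while [ "$i" -lt "${#expected}" ]; do
  if [ "${expected:$i:2}" != "${actual:$i:2}" ]; then
    break
  fi
  i=$((i + 2))
done
first_diff=$((i / 2))

echo "expected: $expected_bytes bytes, actual: $actual_bytes bytes"
echo "first differing byte at offset $first_diff"
```

Both dumps have the same length and diverge partway through the second CJK character, which is consistent with corruption rather than truncation.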
Thanks for describing the problem so well.
I think I found the source of the problem, but it would help to have you test things out. Could you switch to the CJK branch of the repository and try the code I have there?
Thanks.
Thanks for your quick response! The CJK branch fixes the issue for me.
I'll close the issue for now and if more testing/information is needed feel free to @ mention me.
If it's all right, I'm going to keep this issue open for a little while, simply because I'm curious as to why the original code doesn't work. You don't have to do anything, although if I do write some new code, I would love it if you'd be willing to try it out. Thanks so much for drawing my attention to the problem.
Here are my observations:
In theory, in ZSH v5.3.0 and later you can print something and store it in a variable (e.g. $REPLY) by doing
print -v REPLY 'TW4791主包内容'
print -v doesn't exist before ZSH v5.3.0, so in that case I print -z the string to the editing buffer stack and then read it into $REPLY:
print -z 'TW4791主包内容'
read -rz REPLY
What is strange is that the esoteric method of using print -z / read -z seems to handle multibyte characters well, but the presumably more straightforward print -v technique fails. I am curious to find out why.
We seem to have come across a known bug with print -v. It has only been fixed recently. I would like to be able to use print -v, but in order to support the many existing versions of ZSH that have the bug, we will have to use print -v ... -f, i.e., printf, a related function which never had the bug.
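As a rough illustration of the printf-style workaround described above, the sketch below stores a string in $REPLY using printf -v and then checks that it survived intact. The helper name store_reply is hypothetical and is not taken from zsh-z. The sketch assumes ZSH v5.3.0 or later, where printf -v is available alongside print -v; the same line also runs in bash.

```shell
# Hypothetical helper (not from zsh-z): copy a string into $REPLY with printf -v.
# Assumes zsh 5.3.0+ (or bash); this is the printf counterpart of
# `print -v REPLY -f '%s' ...` mentioned above.
store_reply() {
  printf -v REPLY '%s' "$1"
}

dir='TW4791主包内容'
store_reply "$dir"

# Compare the stored value with the original to see whether any bytes changed.
if [ "$REPLY" = "$dir" ]; then
  echo "round trip preserved the multibyte string"
else
  echo "round trip corrupted the multibyte string"
fi
```

Using an explicit '%s' format also keeps any backslashes or percent signs in the directory name from being interpreted.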
I have applied a fix to the CJK branch. @OceanS2000, would you be willing to test it for me? Thank you!