zimfw/completion

locale change causes .zcompdump* to be deleted

cattyhouse opened this issue · 12 comments

if the locale changes between logins, e.g. from en_US.UTF-8 to C.UTF-8 (for instance, ssh from archlinux and then ssh from alpine to the same machine), then

[[ ${zold_dat} == ${znew_dat} ]]; zdump_dat=${?}
will set zdump_dat to 1, the .zcompdump* files get deleted, the dump is regenerated, and then
>! ${zdumpfile}.dat <<<${znew_dat}
runs again. the whole sequence is slow.

after

sysread -s ${#znew_dat} zold_dat <${zdumpfile}.dat

the debug (set -x) shows

# log for [[ ${zold_dat} == ${znew_dat} ]]

[[ $'5.9\C-@/home/user/.zim/modules/zsh-completions/src/_afew\C-@... == 5.9/home/user/.zim/modules/zsh-completions/src/_afew... ]]

# the log is huge, so i use ... to replace the rest of all

as you can see the difference is \C-@ in there...

note: why does the locale sometimes change via ssh? because 1) the ssh client is set to send its locale and the server is set to accept it, and 2) alpine uses musl, which uses C.UTF-8, while other OSes use something other than C.UTF-8

steps to reproduce:

let's say machine A:

  • has zimfw installed
  • has sshd set to AcceptEnv LANG LC_*
  • has enabled en_US.UTF-8 C.UTF-8 in /etc/locale.gen, and locale-gen is run

the above is a very common configuration nowadays.

now, we ssh from another machine to A:

  1. LANG="en_US.UTF-8" ssh A
  2. ls --full-time .zcompdump*
  3. exit
  4. LANG="C.UTF-8" ssh A
  5. ls --full-time .zcompdump*

the two ls --full-time .zcompdump* will show different timestamps

ericbn commented

Hi. Thanks for reporting this.

I've tried exporting LANG to different values on my machine and in an archlinux docker container, and the .dat file didn't change. Can you try applying the patch below?

diff --git a/init.zsh b/init.zsh
index d7eb682..3ae58ae 100644
--- a/init.zsh
+++ b/init.zsh
@@ -20,15 +20,15 @@
   local -r znew_dat=${ZSH_VERSION}$'\0'${(pj:\0:)zcomps}$'\0'${(pj:\0:)zstats}
   if [[ -e ${zdumpfile}.dat ]]; then
     zmodload -F zsh/system b:sysread
-    sysread -s ${#znew_dat} zold_dat <${zdumpfile}.dat
+    LC_CTYPE=C sysread -s ${#znew_dat} zold_dat <${zdumpfile}.dat
     [[ ${zold_dat} == ${znew_dat} ]]; zdump_dat=${?}
   fi
   if (( zdump_dat )) command rm -f ${zdumpfile}(|.dat|.zwc(|.old))(N)

   autoload -Uz compinit && compinit -C -d ${zdumpfile}

   if [[ ! ${zdumpfile}.dat -nt ${zdumpfile} ]]; then
-    >! ${zdumpfile}.dat <<<${znew_dat}
+    zmodload -F zsh/system b:syswrite
+    LC_CTYPE=C syswrite ${znew_dat} >! ${zdumpfile}.dat
   fi
   # Compile the completion dumpfile; significant speedup
   if [[ ! ${zdumpfile}.zwc -nt ${zdumpfile} ]] zcompile ${zdumpfile}

Not even sure these commands would recognize the LC_CTYPE=C prefix. :-)

EDIT: Using syswrite to write the .dat file.

cattyhouse commented

the patch causes .zcompdump* to be regenerated on every ssh login, so every ssh session is slow to start.

> I've tried exporting LANG to different values in my machine and in an archlinux docker container and the .dat file didn't change

going from en_US.UTF-8 to en_GB.UTF-8 is fine, but going to C.UTF-8 or zh_CN.UTF-8 is NOT ok. you probably did not notice the change because your /etc/locale.gen does not enable the locales mentioned. (for me, ssh into alpine never has this issue, because alpine does not have any locale settings: it is always C.UTF-8, no matter what the ssh client's locale is.) to reproduce, the following conditions need to be met:

  1. client's ssh_config has SendEnv LANG LC_*
  2. machine A's sshd_config has AcceptEnv LANG LC_*
  3. the locales mentioned e.g. C.UTF-8 en_US.UTF-8 are enabled in machine A's /etc/locale.gen and command locale-gen is run.

and then go to #13 (comment)
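
For reference, conditions 1 and 2 above correspond to configuration along these lines. This is only a sketch: the host alias A is the placeholder used in this thread, and ~/.ssh/config is an assumed location for the client-side file.

```
# client: ~/.ssh/config (assumed location)
Host A
    SendEnv LANG LC_*

# machine A: /etc/ssh/sshd_config
AcceptEnv LANG LC_*
```

With both directives in place, the client's LANG and LC_* values are forwarded into the remote session, which is how the same account on machine A can end up running zsh under different locales.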

ericbn commented

I've updated the patch above. Writing was not working. Can you please try again with the updated patch?

cattyhouse commented

patching file init.zsh
patch: **** malformed patch at line 23: if [[ ! ${zdumpfile}.zwc -nt ${zdumpfile} ]] zcompile ${zdumpfile}

anyway, i hand-edited the file according to your new patch, and the result is the same: the patch causes .zcompdump* to be regenerated on every ssh login, so every ssh session is slow to start.

so i tried an experiment on machine A (without your patch):

  1. edit /etc/ssh/sshd_config and comment out AcceptEnv LANG LC_* (disable it). this makes sshd stick with its own locale instead of adopting the client's.
  2. restart the sshd daemon

now, no matter where i ssh into A from, and with whatever client, the ls --full-time .zcompdump* timestamps do not change, and the ssh login is fast.

so i am pretty sure this issue is caused by the locale, and the key question is: where does this \C-@ come from when the locale changes?

# log for [[ ${zold_dat} == ${znew_dat} ]]

[[ $'5.9\C-@/home/user/.zim/modules/zsh-completions/src/_afew\C-@... == 5.9/home/user/.zim/modules/zsh-completions/src/_afew... ]]

# the log is huge, so i use ... to replace the rest of all

i guess sysread and syswrite do not honor a per-command environment assignment such as LC_CTYPE=C. i did another experiment:

  # Check if dumpfile is up-to-date by comparing the full path and
  # last modification time of all the completion functions in fpath.
  local _LANG=$LANG # store current LANG
  LANG=C # force LANG to C
  .....

  # Compile the completion dumpfile; significant speedup
  if [[ ! ${zdumpfile}.zwc -nt ${zdumpfile} ]] zcompile ${zdumpfile}
  LANG=$_LANG # restore LANG

that is to say, store current $LANG to _LANG, and set LANG to C at the beginning of the code, and restore LANG after the code.

this solved the problem.
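
A possible variation on this workaround, sketched in portable shell: when the code runs inside a function, a local LC_ALL override is undone automatically on return, so no manual save and restore is needed. LC_ALL also takes precedence over LANG and LC_COLLATE, so it still works when those are set. The helper name list_comp_files and the two file names are made up for illustration and are not part of zimfw.

```sh
# Hypothetical helper (not part of zimfw): list completion-style files
# in byte order, regardless of the locale of the session.
list_comp_files() {
  # `local` confines the override to this function; the caller's
  # LC_ALL is back in effect once the function returns.
  local LC_ALL=C
  local f
  for f in "$1"/_*; do
    printf '%s\n' "${f##*/}"
  done
}

# Usage with a scratch directory holding two made-up file names.
scratch=$(mktemp -d)
: > "$scratch/_a"
: > "$scratch/_B"
listing=$(list_comp_files "$scratch" | tr '\n' ' ')
echo "$listing"
rm -rf "$scratch"
```

In the C locale, names are ordered by byte value, so the uppercase B sorts before the lowercase a no matter which locale the ssh client sent.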

but i think you can figure out a better solution.

ericbn commented

The \C-@ is equivalent to the $'\0' (null) character:

% xxd .zcompdump.dat
00000000: 352e 3900 2f68 6f6d 652f 7573 6572 2f2e  5.9./home/user/.
00000010: 7a69 6d2f 6d6f 6475 6c65 732f 7a73 682d  zim/modules/zsh-
00000020: 636f 6d70 6c65 7469 6f6e 732f 7372 632f  completions/src/
00000030: 5f61 6665 7700                           _afew.
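
Ctrl-@ is the caret notation for byte 0, which is why set -x displays the separator as \C-@. A small sketch that writes a signature-like string and counts its NUL bytes; the path /path/_afew is a shortened, made-up stand-in for the real entries.

```sh
# Write a tiny signature: fields separated by NUL bytes.
sig_file=$(mktemp)
printf '5.9\0/path/_afew\0' > "$sig_file"

# od -c prints one cell per byte; each NUL shows up as \0.
od -An -c "$sig_file"

# Count the separators by deleting every byte that is not NUL.
nul_count=$(tr -cd '\000' < "$sig_file" | wc -c | tr -d ' ')
echo "NUL bytes: $nul_count"
rm -f "$sig_file"
```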

I still haven't set up an ssh machine to try to reproduce the issue. Great to know you found a workaround and are very close to the solution. I wonder if the issue occurs when the file is read, when it's written, somewhere else, or in several of these...

cattyhouse commented

i've updated the steps to reproduce: #13 (comment)

so i moved the pair LANG=C and LANG=en_US.UTF-8 around inside the code, and found that the issue comes from this line:

local -r zcomps=(${^fpath}/^([^_]*|*~|*.zwc)(N))

i've submitted a pull request.

Test:

LC_ALL=C zcomps_C=(${^fpath}/^([^_]*|*~|*.zwc)(N))

LC_ALL=en_US.UTF-8 zcomps_US=(${^fpath}/^([^_]*|*~|*.zwc)(N))

[[ $zcomps_C == $zcomps_US ]]

echo $?

prints 1: the two expansions differ even though the glob is the same.
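
The same effect can be reproduced outside zimfw, since the shell sorts glob matches using the collation rules of the current locale. The sketch below uses a scratch directory with two made-up file names and a hypothetical helper signature_under. It assumes en_US.UTF-8 has been generated on the host; if it has not, both results may come out identical.

```sh
scratch=$(mktemp -d)
: > "$scratch/_a"
: > "$scratch/_B"

# Expand the glob in a subshell running under the given locale and
# join the matches into one line, like a tiny dump signature.
signature_under() (
  LC_ALL=$1
  cd "$scratch" || exit 1
  set -- _*
  echo "$*"
)

sig_c=$(signature_under C)
# Assumption: en_US.UTF-8 exists on this host; otherwise the shell
# may fall back to C ordering and the two signatures will match.
sig_us=$(signature_under en_US.UTF-8)

if [ "$sig_c" = "$sig_us" ]; then
  echo "same order: $sig_c"
else
  echo "order differs: C='$sig_c' en_US.UTF-8='$sig_us'"
fi
rm -rf "$scratch"
```

Because the stored .dat content is built from this file list, a different order alone is enough to make the old and new signatures compare unequal, even when no completion file has changed.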