rupa/z

Database is often lost

Closed this issue · 21 comments

I've been suffering from this for a while now, and today decided that I should report. Basically, the issue is described here:

One thing that really annoys me though is that z looses its history when I press Enter rapidly multiple times (I sometimes clear my window this way), before zsh is able to prepare another prompt. This is 100% reproducible on my home machine, but not on my work machine. I don't know whether the culprit is z, fzf wrapper or zsh async, I'm just mentioning it here as zsh async may be related (I'm not sure).

And the following comment:

I've experienced the same issue with z. It's definitely a bug/race-condition in z. My theory is that one instance of z manages to read the database at some point where it's being replaced by another instance, resulting in an empty db.

For this reason I made sure that zsh-async does not run any hooks inside the worker (async.zsh#L75-L80).

I believe this issue was aggravated by either 4a8b741 or 1a361f6 or both.

@rupa Could you please look into this?

Just for reference and history, related old issue: #191

I did a quick hack for zsh to use advisory file locking, should prevent any database corruption. Furthermore, using $RANDOM for the temporary file prefix is not sufficient, switched to mktemp here to avoid collisions. It's entirely possible that the issue described here is simply due to the use of $RANDOM as well.

@balta2ar feel free to try it out if you're interested:

EDIT: Not sure this works as intended, will look closer tomorrow.

diff --git a/z.sh b/z.sh
diff --git a/z.sh b/z.sh
index 24e1272..e7cadbb 100644
--- a/z.sh
+++ b/z.sh
@@ -23,6 +23,8 @@
 #     * z -l foo  # list matches instead of cd
 #     * z -c foo  # restrict matches to subdirs of $PWD
 
+zmodload zsh/system
+
 [ -d "${_Z_DATA:-$HOME/.z}" ] && {
     echo "ERROR: z.sh's datafile (${_Z_DATA:-$HOME/.z}) is a directory."
 }
@@ -56,7 +58,9 @@ _z() {
         done
 
         # maintain the data file
-        local tempfile="$datafile.$RANDOM"
+        local tempfile="$(mktemp "${datafile}.XXXXXXXX")"
+        local lockfd
+        zsystem flock -t 2 -f lockfd "$datafile" || return
         awk < <(_z_dirs 2>/dev/null) -v path="$*" -v now="$(date +%s)" -F"|" '
             BEGIN {
                 rank[path] = 1
@@ -80,13 +84,9 @@ _z() {
                 } else for( x in rank ) print x "|" rank[x] "|" time[x]
             }
         ' 2>/dev/null >| "$tempfile"
-        # do our best to avoid clobbering the datafile in a race condition.
-        if [ $? -ne 0 -a -f "$datafile" ]; then
-            env rm -f "$tempfile"
-        else
-            [ "$_Z_OWNER" ] && chown $_Z_OWNER:$(id -ng $_Z_OWNER) "$tempfile"
-            env mv -f "$tempfile" "$datafile" || env rm -f "$tempfile"
-        fi
+        [ "$_Z_OWNER" ] && chown $_Z_OWNER:$(id -ng $_Z_OWNER) "$tempfile"
+        env mv -f "$tempfile" "$datafile" || env rm -f "$tempfile"
+        zsystem flock -u $lockfd
 
     # tab completion
     elif [ "$1" = "--complete" -a -s "$datafile" ]; then

EDIT: Not sure this works as intended, will look closer tomorrow

Sure, I'm waiting for your signal.

@balta2ar here's an updated patch. My uncertainty from before was with regards to using flock and then replacing the file with another (via mv). I confirmed that a move causes any pending process waiting to lock to fail. In this version we never replace the db and as such flock will work as expected.

diff --git a/z.sh b/z.sh
diff --git a/z.sh b/z.sh
index 24e1272..da4f411 100644
--- a/z.sh
+++ b/z.sh
@@ -23,6 +23,8 @@
 #     * z -l foo  # list matches instead of cd
 #     * z -c foo  # restrict matches to subdirs of $PWD
 
+zmodload zsh/system
+
 [ -d "${_Z_DATA:-$HOME/.z}" ] && {
     echo "ERROR: z.sh's datafile (${_Z_DATA:-$HOME/.z}) is a directory."
 }
@@ -55,8 +57,14 @@ _z() {
             case "$*" in "$exclude*") return;; esac
         done
 
+        # make sure datafile exists (for locking)
+        [[ -f "$datafile" ]] || touch "$datafile"
+
         # maintain the data file
-        local tempfile="$datafile.$RANDOM"
+        local tempfile="$(mktemp "${datafile}.XXXXXXXX")"
+        local lockfd
+        # lock datafile, released when function exits
+        zsystem flock -f lockfd "$datafile" || return
         awk < <(_z_dirs 2>/dev/null) -v path="$*" -v now="$(date +%s)" -F"|" '
             BEGIN {
                 rank[path] = 1
@@ -80,13 +88,10 @@ _z() {
                 } else for( x in rank ) print x "|" rank[x] "|" time[x]
             }
         ' 2>/dev/null >| "$tempfile"
-        # do our best to avoid clobbering the datafile in a race condition.
-        if [ $? -ne 0 -a -f "$datafile" ]; then
-            env rm -f "$tempfile"
-        else
-            [ "$_Z_OWNER" ] && chown $_Z_OWNER:$(id -ng $_Z_OWNER) "$tempfile"
-            env mv -f "$tempfile" "$datafile" || env rm -f "$tempfile"
-        fi
+        # replace contents of datafile with tempfile
+        env cat "$tempfile" >| "$datafile"
+        [ "$_Z_OWNER" ] && chown $_Z_OWNER:$(id -ng $_Z_OWNER) "$datafile"
+        env rm -f "$tempfile"
 
     # tab completion
     elif [ "$1" = "--complete" -a -s "$datafile" ]; then

@mafredri Thank you very much! I've applied the patch and did a short experiment trying to overwrite the DB. So far it seems to be acting good. But I'll keep an eye on it for while to be more sure.

I wonder, do you think flock command could be used here so that it's not zsh-specific and could be used in bash as well?

@balta2ar that command does not exist on macOS, for example, so it wouldn't really be cross-platform :(.

@mafredri Could this be an option? Or this. It's not looking pretty though...

It kind of beats the purpose of keeping z as a shell script if it relies on additional software being installed :/. This can of course be cleaned up to still work with bash as well, but unfortunately, bash would not enjoy the benefits of locking the DB.

bash would not enjoy the benefits of locking the DB

Sorry, I'm not following... Why?

if it relies on additional software

Yeah, you're right. But on the second hand, feature detection could be performed and it would still work without flock. It would work more reliably with it, though.

Anyway, I'm just thinking of a way to have it merged. Now, if I get it right, this won't work in bash as there is no zmodload in bash, obviously. Right, @mafredri?

Sorry, I'm not following... Why?

As in, we would only use zsystem flock in zsh. When I talked about cleaning it up, I did not mean adding support for the flock-command on systems that have it available. In this scenario, bash would have no way to lock.

But on the second hand, feature detection could be performed and it would still work without flock. It would work more reliably with it, though.

Would be a possibility, only beneficial for bash though, since zsh can use zsystem flock.

[...]. Now, if I get it right, this won't work in bash [...]

Yes, correct, would not work in bash as it's a zsh only feature.

Yes, correct, would not work in bash as it's a zsh only feature

Thanks for clarifications! What I meant is that at the moment this patch renders z unusable in bash if applied as is, while it could be structured to detect whether we're running in zsh or bash and use zmodload only in zsh.

Anyway, looks like there is a working solution and I'm ok with patching z manually. Let's see how @rupa decides to integrate it.

Ping? This makes z unusable for me.

@dfaure-kdab it doesn't seem like rupa is active at the moment, but feel free to use #199 if you're using zsh.

rupa commented

i'm aware of and thinking about this.

I'm definitely not in any hurry to add dependencies, although I'm considering bringin back mktemp if that would help.

It's tough for me to evaluate zsh stuff because: it's so configurable, so many people use it with an opinionated framework, and I don't use it.

Anyone that can post a nice setup that reliably clobbers the db, that would be cool

@rupa I can take a stab at creating a repro for you.

PS. My PR does introduce the mktemp dependency, but no other. Uses feature detection to use flock if available (available in default configurations on systems that support flock). I can't think of many other methods that aren't racey in one form or another.

The culprit of this is the switch to using a background process inside a subshell, coupled with the fact that _z_precmd is called inside the precmd hook, instead of the chpwd one (proposed fix).

I've layed down the basics here: ohmyzsh/ohmyzsh#7094 (comment), but the basic gist is that in zsh, $RANDOM will give the same value on subshells, unless it is referenced or seeded in the parent process (see its documentation). This remarkably doesn't happen in bash.

You can reproduce this issue from a directory other than $HOME, with the following (save your .z file before attempting):

repeat 100 { (_z --add "${PWD:a}" &) }

It should give many "no such file or directory" errors, while this one:

repeat 100 { (_z --add "${PWD:a}" &); : $RANDOM }

shouldn't. Notice that I'm referencing $RANDOM after each subshell creation, thus changing its value in each subshell. We don't need mktemp or other dependencies to ensure a unique value then.

The downside of that is that $RANDOM in zsh is not entirely reliable. I made a basic test to check the uniqueness of a random-number sequence using $RANDOM and after 120+ generated numbers the sequence starts to have repeated numbers. Not to worry though, because that means that there'd have to be more than 120 concurrent _z executions for it cause trouble. Even less worrying given that I haven't been able to reproduce any collisions even after 1000 simultaneous runs.

That said, using a precmd hook means that each time the prompt is drawn _z will be called, with the associated cost in computation and IO that aren't doing anything useful.

So here's my proposed fix: use the chpwd hook instead of the precmd one, which only runs after changing to another directory. This is the easiest, most direct fix, and it also makes sense. It will also have the effect that holding down the enter key will not spam _z with added work.

Finally, there's an edge case even running it in a chwpd hook, which is cd-ing into two directories at once:

mkdir ~/{01,02}
cd ~/01; cd ~/02

This will again use the same $RANDOM number. This will probably never happen, as nobody does this, but adding the reference to $RANDOM at the end of the function will fix it for any use case nonetheless.

TL;DR Proposed change:

_z_chpwd() {
	(_z --add "${PWD:a}" &)
	: $RANDOM
}
chpwd_functions+=(_z_chpwd)

I hope that wasn't too verbose. Cheers!

Ok, so now that #215 is closed because using chpwd is not suitable, the proposed fix is just referencing $RANDOM after each subshell invocation:

_z_precmd() {
	(_z --add "${PWD:a}" &)
	: $RANDOM
}

Let me know if you want me to submit a PR.

@mcornella Please do, I'd like to give it a try.

I've submitted #247

dmd commented

Rupa: This would be worth looking into and dealing with, as use of zsh is going to go way up as Mac people move to Catalina, which is zsh by default (and actively deprecates bash).

I experience that z will clear all visited paths from time to time. I don't know how to reproduce this, have happened three times the last three months.

Using z on Git Bash on Windows 10.