/direnv-cache

Caching for direnv

Primary LanguageShell

Direnv caching

This is an implementation of caching for direnv.

direnv-cache caches the environment into a .env file, and then loads from the .env if it is still valid. This could be helpful when re-entering a directory that has not changed since we last left it; we can restore the environment without fully executing the .envrc.

Contents

Installation

The accompanying file cache.sh should be added into direnv’s library,

  • by dropping into $HOME/.config/direnv/lib
    curl -sSL https://raw.githubusercontent.com/indigoviolet/direnv-cache/main/cache.sh -o $HOME/.config/direnv/lib/05-cache.sh
        
  • or by appending to direnvrc
    curl -sSL https://raw.githubusercontent.com/indigoviolet/direnv-cache/main/cache.sh >> $HOME/.config/direnv/direnvrc
        
  • jq should be available

Usage

Add the use_cache function to the top of your .envrc:

cat envrc_with_cache
use_cache
layout_poetry

If the cache does not exist or is invalid, use_cache will fall through to the rest of the .envrc, and then cache the resulting environment for future use.

Note that direnv reload may no longer do what you expect: it will load from cache if possible. Delete the .env file to force a full reload.

Environment variables

These are mostly for debugging purposes, but may come in handy:

DIRENV_CACHE_IGNORE
if set, the cache will always be ignored.
DIRENV_CACHE_DEBUG
if set, some debugging output will be printed. if >1, more verbose debugging output is available.

Opportunities for improvements

A few additions to the core of direnv could make this better:

  1. Currently we are unable to figure out why the envrc is being evaluated: we can’t distinguish between precmd, direnv reload and chwpd. It would be nice for direnv to hint this, perhaps via an environment variable. Since we only need to test cache validity in the last case, this could improve performance in the first two cases. Our current implementation has to test cache validity in all three cases. (See explanation)
  2. direnv current tests 1 watched file from DIRENV_WATCHES at a time; we could have a higher-level check that tests whether the existing environment is current.
  3. The output of direnv export requires processing to convert to dotenv format. We could support direnv export dotenv.

Benchmarking

See details below. The average improvement isn’t dramatic in my measurements (with layout_poetry), but it depends on how expensive the envrc is.

Motivation

I use direnv for setting up Poetry (layout_poetry) or conda, and in these scenarios, the environment created via direnv doesn’t change all that often, and is amenable to caching.

I also use direnv in Emacs (envrc), and having fast loads would make my editor start up faster.

I could create a .env manually but it’s convenient to define it using direnv’s primitives.

Implementation Notes

Rough model of how direnv works

  1. direnv hook <shell> sets up direnv export json to be called on each prompt (precmd) and directory change (chpwd)
  2. upon chwpd, if we are entering a directory containing an envrc, we execute it in a subshell, compare the env from the subshell to our own, and then apply that diff to our env. (loading).
  3. Note that the env in the subshell that direnv export json is executed in, is carefully restored to the “pre-direnv” state, by reverting DIRENV_DIFF.
  4. Several direnv-specific state tracking env variables are set - ex. DIRENV_FILE (the envrc file), DIRENV_DIR (the directory containing the envrc), DIRENV_WATCHES (name, mtime, existence of all watched files), DIRENV_DIFF (the diff that was applied).
  5. Upon chwpd, if we already have these variables in our environment, and are leaving the DIRENV_DIR, these tracking variables are unset, and the inverse of DIRENV_DIFF is applied. (unloading)
  6. Upon precmd, if DIRENV_WATCHES is stale i.e, the watched files have changed, direnv loads again (direnv current implements this for one file at a time).
  7. direnv watch and friends add to DIRENV_WATCHES, so they act as dependencies for the env state.
direnv hook zsh
_direnv_hook() {
  trap -- '' SIGINT;
  eval "$("/home/linuxbrew/.linuxbrew/Cellar/direnv/2.30.3/bin/direnv" export zsh)";
  trap - SIGINT;
}
typeset -ag precmd_functions;
if [[ -z "${precmd_functions[(r)_direnv_hook]+1}" ]]; then
  precmd_functions=( _direnv_hook ${precmd_functions[@]} )
fi
typeset -ag chpwd_functions;
if [[ -z "${chpwd_functions[(r)_direnv_hook]+1}" ]]; then
  chpwd_functions=( _direnv_hook ${chpwd_functions[@]} )
fi

How the cache works

Caching is only useful when re-entering a directory that hasn’t changed in the interim. In this case, we would like to restore our previous state.

  1. use_cache is the first statement in the envrc, so it can short circuit if loading from cache.

    Here are the scenarios when the envrc is executed:

    (use_cache sees a DIRENV_WATCHES containing only the envrc & allow. files)

    invocation modeDIRENV_WATCHEScache verification needed?cache action
    precmdset, staleno - known to be invalidrebuild
    direnv reloadset, irrelevantno - forced reloadrebuild
    chdir (enter)unset or from a previous RCyes - might be stalerebuild if cache is valid

    Unfortunately, there doesn’t appear to be any way to know which of these invocation modes we are in – since the envrc always executes in a “clean” subshell.

    All we know is that direnv wants to execute the envrc; we can test whether the cache is valid (based on whether the cached DIRENV_WATCHES is stale), and rebuild if it is not, or load from cache if valid.

  2. building the cache: run direnv export json in a clean subshell, and convert that into dotenv format into .env (using jq)
  3. if the cache is valid: load it via dotenv_if_exists, otherwise build it
  4. some extra env switches are provided to help debug things: DIRENV_CACHE_IGNORE, DIRENV_CACHE_DEBUG

deserializing DIRENV_WATCHES

DIRENV_WATCHES is in gzenv format, ie base64-urlencoded + zlib + json

direnv show_dump $DIRENV_WATCHES
echo $DIRENV_WATCHES | python -c "import sys; import zlib; import base64; print(zlib.decompress(base64.urlsafe_b64decode(sys.stdin.read())).decode('utf-8'))" | jq '.'
{ printf "\x1f\x8b\x08\x00\x00\x00\x00\x00" ; echo $DIRENV_WATCHES | basenc --base64url -d ; } | gzip -d | jq '.'

Watching .env

dotenv_if_exists will usually watch_file .env, which modifies DIRENV_WATCHES, but then immediately the DIRENV_WATCHES from the cache will overwrite this, so that .env will not be watched.

Do we even want to watch the cache file? I don’t think so: users shouldn’t be modifying it directly; if deleted, it will get recreated the next time direnv tries to load something.

Note, if we end up wanting to watch .env

Attempting to get the cache file into DIRENV_WATCHES is tricky:

  • DIRENV_WATCHES is captured in the subshell, and won’t contain .env by default. We do need to capture DIRENV_WATCHES, since the .envrc could be registering files to watch.
  • the first problem is mentioned above: dotenv_if_exists will watch_file on the cache file but the resulting DIRENV_WATCHES will be lost when the cache is actually loaded.
  • So we need to watch_file .env after the cache is created and loaded; this generates a new DIRENV_WATCHES containing the current stat of .env. But if we modify .env after this to update the cached value of DIRENV_WATCHES, our cache will appear invalid (since DIRENV_WATCHES is stale), and we will rebuild the cache.
  • The trick could be to first update .env with a DIRENV_WATCHES value that includes itself, and then the env, as below. Here we are appending a second export of DIRENV_WATCHES to .env, which will override the earlier one.
{ direnv watch json .env | jq -r '"export DIRENV_WATCHES=\(.DIRENV_WATCHES|@sh)"' >> .env; eval $(direnv watch zsh .env); }

Benchmarking

Setup

[tool.poetry]
name = "direnv-cache-test"
version = "0.1.0"
description = "Test project for benchmarking direnv-cache."
authors = ["Venky Iyer <indigoviolet@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.8"

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
use_cache
layout_poetry
layout_poetry
python 3.8.1
brew install hyperfine
cp cache.sh ~/.config/direnv/05-cache.sh
icdiff cache.sh ~/.config/direnv/05-cache.sh

Create directories

WITH_CACHE_DIR=/tmp/with_cache WITHOUT_CACHE_DIR=/tmp/without_cache
rm $WITH_CACHE_DIR $WITHOUT_CACHE_DIR -rf
mkdir $WITH_CACHE_DIR $WITHOUT_CACHE_DIR
ln -sf $(realpath pyproject.toml) $WITH_CACHE_DIR/
ln -sf $(realpath tool-versions) $WITH_CACHE_DIR/
( cd $WITH_CACHE_DIR && poetry install )
ln -sf $(realpath envrc_with_cache) $WITH_CACHE_DIR/.envrc
direnv allow $WITH_CACHE_DIR/.envrc

ln -sf $(realpath pyproject.toml) $WITHOUT_CACHE_DIR/
ln -sf $(realpath tool-versions) $WITHOUT_CACHE_DIR/
( cd $WITHOUT_CACHE_DIR && poetry install )
ln -sf $(realpath envrc_without_cache) $WITHOUT_CACHE_DIR/.envrc
direnv allow $WITHOUT_CACHE_DIR/.envrc
:

Results

ensure cache

export DIRENV_CACHE_DEBUG=1
direnv exec "$WITH_CACHE_DIR" bash -c "ls $WITH_CACHE_DIR/.env -al"
:

Measurements

hyperfine -w 10 -L dir "$WITH_CACHE_DIR","$WITHOUT_CACHE_DIR" 'cd {dir}'
Benchmark 1: cd /tmp/with_cache
  Time (mean ± σ):       0.0 ms ±   0.1 ms    [User: 0.1 ms, System: 0.1 ms]
  Range (min … max):     0.0 ms …   1.5 ms    3353 runs

Benchmark 2: cd /tmp/without_cache
  Time (mean ± σ):       0.1 ms ±   0.1 ms    [User: 0.1 ms, System: 0.1 ms]
  Range (min … max):     0.0 ms …   4.6 ms    3140 runs

Summary
  'cd /tmp/with_cache' ran
    1.13 ± 3.65 times faster than 'cd /tmp/without_cache'

Code

Shellcheck

# shellcheck disable=SC2155
# shellcheck disable=SC1090

Main entry point

use_cache() {
    [[ -v DIRENV_CACHE_IGNORE ]] && {
        _debug "Ignoring cache, DIRENV_CACHE_IGNORE is set"
        return
    }
    [[ ${DIRENV_CACHE_DEBUG:-0} -gt 1 ]] && {
        set_x
        set -uo pipefail
    }
    local cache_filename=${1:-.env}
    local cache_file=$(get_cache_file "$cache_filename")

    # if cache exists and nonzero
    if [[ -s "$cache_file" ]]; then
        # Load preemptively
        load_cache "$cache_file"
        # Then verify (and reload if necessary)
        verify_cache "$cache_file"
    else
        _debug "Rebuilding cache: ${cache_file} missing or zero"
        build_and_load_cache "$cache_file"
    fi
    exit $?
}

Get cache file

get_cache_file() {
    # Ensure the cache file is in the same directory as the RC file
    local cache_filename=${1:?"Cache filename is required"}
    local rcfile=$(find_up ".envrc")
    builtin echo -n "${rcfile%%/.*}/$cache_filename"
}

Cache validity

verify_cache () {
    local cache_file=${1:?"Cache file required"}

    # runs direnv current for all .Path in $DIRENV_WATCHES (in parallel)
    # xargs will return 0 only if the command is successful for all inputs
    direnv show_dump "$DIRENV_WATCHES" | jq -r '.[]|.Path' | xargs -n1 -P0 direnv current
    local status=$?
    if [[ $status -gt 0 ]]; then
        _debug "Cache is stale, rebuilding"
        build_and_load_cache "$cache_file"
    fi
}

Build cache

build_cache() {
    local cache_file=${1:?"Cache file required"}
    if [[ -v DIRENV_CACHE_DEBUG ]]; then
        local stderr_file=$(mktemp)
    else
        local stderr_file=/dev/null
    fi

    # we use json/jq because the bash export uses $'' c-strings which are not
    # easy to get rid of with sed
    # DIRENV_LOG_FORMAT='' will turn off direnv logging
    # DIRENV_CACHE_IGNORE=1 so that we can build the cache without using it
    local cache_contents=$(
        set -o pipefail
        env DIRENV_CACHE_IGNORE=1 DIRENV_LOG_FORMAT="" direnv export json 2>"$stderr_file" | jq -r 'to_entries | map("export \(.key)=\(.value|@sh)")[]'
    )

    local status=$?
    if [[ -v DIRENV_CACHE_DEBUG ]]; then
        local stderr_content=$(<"$stderr_file") && rm "$stderr_file"
    else
        local stderr_content=""
    fi
    if [[ $status -eq 0 ]]; then
        _debug "Built cache: ${cache_file} contents: <${cache_contents}> stderr: <$stderr_content>"
        builtin echo -n "$cache_contents" >"$cache_file" || _debug "Cache build failed while writing to $cache_file"
        return
    else
        _debug "Cache build failed: $stderr_content"
        return $status
    fi
}

Load cache

load_cache() {
    local cache_file=${1:?"Cache file required"}
    # we could use dotenv instead, but we don't need `watch_file`, and this is compatible?
    source "$cache_file" || {
        _debug "Cache load failed: $cache_file"
        exit $?
    }
    _debug "Loaded from cache $cache_file"
}

build_and_load

build_and_load_cache() {
    local cache_file=${1:?"Cache file required"}
    build_cache "$cache_file" || {
        _debug "Cache build failed"
        exit $?
    }
    load_cache "$cache_file"
}

Debug printing

_debug() {
    # Return status of this function is always the previous status.
    #
    # Prints $1 if DIRENV_CACHE_DEBUG is set. (Note that you probably have to
    # ~export~ it, not just set it, since all this code runs in a subshell)

    {
        local status=$?
        [[ -o xtrace ]] && {
            shopt -uo xtrace
            local xtrace_was_on=1
        }
    } 2>/dev/null

    local msg=${1:?"Message required"}
    [[ -v DIRENV_CACHE_DEBUG ]] && echo "$msg (status: $status)" >&2

    {
        [[ ${xtrace_was_on:-0} -eq 1 ]] && shopt -so xtrace
        return $status
    } 2>/dev/null
}

Emacs local variables

# Local Variables:
# sh-shell: bash
# End: