dimo414/bash-cache

Use locking to avoid redundant invocations of the backing function

dimo414 opened this issue · 7 comments

Original report by dabest1 (Bitbucket: Dimitriy).


When calling a function in parallel from multiple script executions, the cache is not used; each call redundantly invokes the backing function.

It would be nice to add a feature where the first call to the function would create a lock and prepare the cache, then allow the second call to proceed once the cache is available.
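
To illustrate the race (your_function here stands in for any bc::cache-decorated function):

# Both subshells start while the cache is cold, so each one invokes the
# slow backing function instead of one waiting for the other's result:
( your_function ) &
( your_function ) &
wait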

Original comment by Michael Diamond (Bitbucket: dimo414).


Thanks for the feature request! Note that the cache is used when called by multiple scripts, as long as the shell PID ($$) is not specified as a variable to key off. Depending on your specific use case, you might consider using bc::warm::your_function to ensure the cache is warm before either script calls your_function.
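
For example (a minimal sketch; the sleep stands in for real expensive work):

your_function() { sleep 5; echo "expensive result"; } && bc::cache your_function

bc::warm::your_function   # populate the cache up front
# later callers, even concurrent ones, should now hit the warm cache:
your_function &
your_function &
wait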

Do you have any thoughts as to how such locking should work? In my opinion the existing behavior (cache either exists or doesn't, gated on an atomic filesystem operation) is fairly clean, so I'd be hesitant to implement some sort of locking if it makes things significantly more complex.
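
For reference, the usual shape of that atomic operation is write-then-rename; this is a generic sketch of the technique, not necessarily bash-cache's actual internals:

# Write to a unique temp file, then mv it into the final location.
# rename() is atomic within a filesystem, so readers see either the
# complete cache file or no file at all -- never a partial write.
write_cache() {  # usage: write_cache CACHE_FILE COMMAND [ARGS...]
  local cache_file=$1; shift
  local tmp
  tmp=$(mktemp "${cache_file}.XXXXXX") || return
  "$@" > "$tmp" && mv "$tmp" "$cache_file"
}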

Could you share some more details about your use case?

Original comment by Michael Diamond (Bitbucket: dimo414).


Thanks for the details and pointer; I can see creating a bc::locked_cache function which uses similar semantics.

Original comment by dabest1 (Bitbucket: Dimitriy).


Great!

Original comment by Michael Diamond (Bitbucket: dimo414).


After some more consideration, I think you may prefer to implement this locking yourself. For one, it's difficult to write a proper mutex (one that blocks other callers while the lock is held) in a platform-independent way. For another, you can likely implement a more precise mutex-curl function yourself. Something like this would work:

# $1 is the URL, $2 is the path to save to, $3 is the TTL, e.g. '1 minute'
mutex-curl() {
  if (( $# < 3 )); then
    echo "Usage: mutex-curl URL PATH TTL [CURL_ARGS]" >&2
    return 1
  fi

  local url=$1
  local path=$2
  local ttl=$3
  shift 3

  local fd
  (
    flock "$fd"  # block until no other caller holds the lock
    # If the file exists and is newer than the TTL, use it as-is; find
    # prints the path only when the file is older than "now - TTL".
    if [[ -e "$path" ]] && \
        [[ -z "$(find "$path" -not -newermt "-${ttl}" 2>/dev/null)" ]]; then
      return
    fi
    curl "$url" --output "$path" "$@"
  ) {fd}> "${path}.lock"  # {fd}> opens the lock file and stores its descriptor in fd
}

This lets you control more precisely how long the file should be persisted locally, and you can optionally invoke mutex-curl from a bc::cache-decorated function, e.g.:

process-curled-file() {
  mutex-curl "http://example.com/foo" "/tmp/foo" '10 minutes' || {
    echo "Failed to download foo" >&2
    return 1
  }
  grep data-i-care-about /tmp/foo
} && bc::cache process-curled-file
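
For instance, two concurrent callers of the function above:

# The first caller to grab the lock downloads the file; the second blocks
# in flock, then finds a file newer than the TTL and skips the curl:
process-curled-file &
process-curled-file &
wait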

That said, I'm still thinking about adding bc::locked_cache; however, I'd likely just use flock and simply document that it's a best-effort feature that isn't supported on all platforms.
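
Such a wrapper might look roughly like this (a hypothetical sketch with assumed names and semantics; an eventual bc::locked_cache may well differ):

# Hypothetical: run a bc::cache-decorated function under an flock'd lock
# file so concurrent cold-cache callers serialize, and all but the first
# are served from the cache. Note this serializes cache hits too.
locked() {  # usage: locked CACHED_FUNCTION [ARGS...]
  local func=$1; shift
  local fd
  (
    flock "$fd"     # wait for any in-flight call to finish
    "$func" "$@"    # a cache hit for every caller after the first
  ) {fd}> "/tmp/${func}.lock"
}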


Original comment by dabest1 (Bitbucket: Dimitriy).


Thank you for reviewing this and for the suggestions. I would still love to see a generic bc::locked_cache implemented; this is functionality I normally look for in caching tools, and usually find missing.

Original comment by Michael Diamond (Bitbucket: dimo414).


Agreed, it would be nice to add (and feel free to take a crack at it yourself, if you'd like to send a PR). Is there a reason mutex-curl or something like it doesn't work for your use case?

Original comment by Michael Diamond (Bitbucket: dimo414).


Introduce bc::locked_cache - resolves issue #14.