dimo414/bash-cache

Use locking to avoid redundant invocations of the backing function

dimo414 opened this issue · 7 comments

Original report by dabest1 (Bitbucket: Dimitriy).


When calling a function in parallel from multiple script executions, the cache is not used; each call redundantly invokes the backing function.

It would be nice to add a feature where the first call to the function would create a lock and prepare the cache, then allow the second call to proceed once the cache is available.
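
To illustrate the race (your_function here stands in for any bc::cache-decorated function):

# Both subshells start while the cache is cold, so each one invokes the
# slow backing function instead of one waiting for the other's result:
( your_function ) &
( your_function ) &
wait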

Original comment by Michael Diamond (Bitbucket: dimo414).


Thanks for the feature request! Note that the cache is used when called by multiple scripts, as long as the shell PID ($$) is not specified as a variable to key off. Depending on your specific use case, you might consider using bc::warm::your_function to ensure the cache is warm before either script calls your_function.
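
For example (a minimal sketch; the sleep stands in for real expensive work):

your_function() { sleep 5; echo "expensive result"; } && bc::cache your_function

bc::warm::your_function   # populate the cache up front
# later callers, even concurrent ones, should now hit the warm cache:
your_function &
your_function &
wait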

Do you have any thoughts as to how such locking should work? In my opinion the existing behavior (cache either exists or doesn't, gated on an atomic filesystem operation) is fairly clean, so I'd be hesitant to implement some sort of locking if it makes things significantly more complex.
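
For reference, the usual shape of that atomic operation is write-then-rename; this is a generic sketch of the technique, not necessarily bash-cache's actual internals:

# Write to a unique temp file, then mv it into the final location.
# rename() is atomic within a filesystem, so readers see either the
# complete cache file or no file at all -- never a partial write.
write_cache() {  # usage: write_cache CACHE_FILE COMMAND [ARGS...]
  local cache_file=$1; shift
  local tmp
  tmp=$(mktemp "${cache_file}.XXXXXX") || return
  "$@" > "$tmp" && mv "$tmp" "$cache_file"
}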

Could you share some more details about your use case?

Original comment by Michael Diamond (Bitbucket: dimo414).


Thanks for the details and pointer; I can see creating a bc::locked_cache function which uses similar semantics.

Original comment by dabest1 (Bitbucket: Dimitriy).


Great!

Original comment by Michael Diamond (Bitbucket: dimo414).


After some more consideration, I think you may prefer to implement this locking yourself. For one, it's difficult to write a proper mutex (one that blocks other callers while the lock is held) in a platform-independent way. For another, you can likely implement a more precise mutex-curl function yourself. Something like this would work:

# $1 is the URL, $2 is the path to save to, $3 is the TTL, e.g. '1 minute'
mutex-curl() {
  if (( $# < 3 )); then
    echo "Usage: mutex-curl URL PATH TTL [CURL_ARGS]" >&2
    return 1
  fi

  local url=$1
  local path=$2
  local ttl=$3
  shift 3

  local fd
  (
    flock "$fd"  # block until no other caller holds the lock
    # If the file exists and is newer than the TTL, use it as-is; find
    # prints the path only when the file is older than "now - TTL".
    if [[ -e "$path" ]] && \
        [[ -z "$(find "$path" -not -newermt "-${ttl}" 2>/dev/null)" ]]; then
      return
    fi
    curl "$url" --output "$path" "$@"
  ) {fd}> "${path}.lock"  # {fd}> opens the lock file and stores its descriptor in fd
}

This lets you control more precisely how long the file should be persisted locally, and you can optionally invoke mutex-curl from a bc::cache-decorated function, e.g.:

process-curled-file() {
  mutex-curl "http://example.com/foo" "/tmp/foo" '10 minutes' || {
    echo "Failed to download foo" >&2
    return 1
  }
  grep data-i-care-about /tmp/foo
} && bc::cache process-curled-file
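
For instance, two concurrent callers of the function above:

# The first caller to grab the lock downloads the file; the second blocks
# in flock, then finds a file newer than the TTL and skips the curl:
process-curled-file &
process-curled-file &
wait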

That said, I'm still thinking about adding bc::locked_cache; however, I'd likely just use flock and simply document that it's a best-effort feature that isn't supported on all platforms.
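
Such a wrapper might look roughly like this (a hypothetical sketch with assumed names and semantics; an eventual bc::locked_cache may well differ):

# Hypothetical: run a bc::cache-decorated function under an flock'd lock
# file so concurrent cold-cache callers serialize, and all but the first
# are served from the cache. Note this serializes cache hits too.
locked() {  # usage: locked CACHED_FUNCTION [ARGS...]
  local func=$1; shift
  local fd
  (
    flock "$fd"     # wait for any in-flight call to finish
    "$func" "$@"    # a cache hit for every caller after the first
  ) {fd}> "/tmp/${func}.lock"
}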


Original comment by dabest1 (Bitbucket: Dimitriy).


Thank you for reviewing this and for the suggestions. I would still love to see a generic bc::locked_cache implemented; this is functionality I normally look for in caching tools, and usually find missing.

Original comment by Michael Diamond (Bitbucket: dimo414).


Agreed, it would be nice to add (and feel free to take a crack at it yourself, if you'd like to send a PR). Is there a reason mutex-curl or something like it doesn't work for your use case?

Original comment by Michael Diamond (Bitbucket: dimo414).


Introduce bc::locked_cache - resolves issue #14.