Use locking to avoid redundant invocations of the backing function
dimo414 opened this issue · 7 comments
Original report by dabest1 (Bitbucket: Dimitriy).
When the same function is called in parallel from multiple script executions, the cache is not used.
It would be nice to add a feature where the first call to the function creates a lock and prepares the cache, and subsequent calls are allowed to proceed once the cache is available.
Original comment by Michael Diamond (Bitbucket: dimo414).
Thanks for the feature request! Note that the cache is used when called by multiple scripts, as long as `$$` is not specified as an environment variable to key off. Depending on your specific use case you might consider using `bc::warm::your_function` to ensure the cache is warm before either script calls `your_function`.
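For illustration, here is a minimal sketch of that warming approach (assuming bash-cache is sourced from `bash-cache.sh` and the function is decorated with a plain `bc::cache your_function` call, as in the example further down; the exact `bc::cache` arguments may differ in your setup):

```shell
source bash-cache.sh

your_function() {
  sleep 5            # stand-in for the expensive work being cached
  echo "result"
} && bc::cache your_function

# Ask bash-cache to populate your_function's cache up front, so that scripts
# invoking your_function afterwards should find the cache already warm.
bc::warm::your_function

your_function        # expected to be served from the cache
```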
Do you have any thoughts as to how such locking should work? In my opinion the existing behavior (cache either exists or doesn't, gated on an atomic filesystem operation) is fairly clean, so I'd be hesitant to implement some sort of locking if it makes things significantly more complex.
Could you share some more details about your use case?
Original comment by Michael Diamond (Bitbucket: dimo414).
Thanks for the details and pointer; I can see creating a `bc::locked_cache` function which uses similar semantics.
Original comment by Michael Diamond (Bitbucket: dimo414).
After some more consideration, I think you may prefer to implement this locking yourself. For one, it's difficult to write a proper mutex (one that blocks while the lock is held) in a platform-independent way. For another, you can likely implement a more precise `mutex-curl` function yourself. Something like this would work:
```shell
# $1 is URL, $2 is path to save to, $3 is the TTL, e.g. '1 minute'
mutex-curl() {
  if (( $# < 3 )); then
    echo "Usage: mutex-curl URL PATH TTL [CURL_ARGS]" >&2
    return 1
  fi
  local url=$1
  local path=$2
  local ttl=$3
  shift; shift; shift
  local fd
  (
    # Block until this process holds the lock on "${path}.lock"; the fd is
    # opened (and assigned to $fd) by the {fd}> redirection on the subshell.
    flock "$fd"
    # Reuse the existing file, and skip the download, if it's newer than the TTL.
    if [[ -e "$path" ]] && \
       [[ -z "$(find "$path" -not -newermt "-${ttl}" 2>/dev/null)" ]]; then
      return
    fi
    curl "$url" --output "$path" "$@"
  ) {fd}> "${path}.lock"
}
```
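For example, two concurrent invocations (hypothetical URL and path) behave like this:

```shell
# The first caller takes the lock and downloads; the second blocks on the lock,
# then finds the file newer than the TTL and returns without re-downloading.
mutex-curl 'https://example.com/foo.json' /tmp/foo.json '10 minutes' &
mutex-curl 'https://example.com/foo.json' /tmp/foo.json '10 minutes'
wait
```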
You can then more precisely control how long the file should be persisted locally, and could optionally invoke `mutex-curl` in a `bc::cache`-decorated function, e.g.:
```shell
process-curled-file() {
  mutex-curl "http://example.com/foo" "/tmp/foo" '10 minutes' || {
    echo "Failed to download foo" >&2
    return 1
  }
  grep data-i-care-about /tmp/foo
} && bc::cache process-curled-file
```
That said, I'm still thinking about adding `bc::locked_cache`; however, I'd likely just use `flock` and simply document that it's a best-effort feature not supported on all platforms.
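As a rough sketch of that best-effort idea (this is not part of bash-cache; the wrapper name, lock-file path, and fallback behavior are all assumptions), a call-site wrapper around the `bc::cache`-decorated function above could look something like:

```shell
# Serialize calls when flock is available, so concurrent cold-cache calls don't
# all run the backing function; otherwise fall back to the unsynchronized call
# (e.g. on systems that don't ship flock(1)).
locked-process-curled-file() {
  if command -v flock >/dev/null 2>&1; then
    ( flock 9; process-curled-file "$@" ) 9>/tmp/process-curled-file.lock
  else
    process-curled-file "$@"
  fi
}
```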
Some more resources:
- How can I ensure that only one instance of a script is running at a time?
- `man flock`
- should I use `flock` or `mkdir`?
- `flock` example with dynamic file descriptor
- http://mywiki.wooledge.org/ProcessManagement - "you should seriously consider rewriting the functionality you require in a more powerful language."
Original comment by dabest1 (Bitbucket: Dimitriy).
Thank you for reviewing this and for the suggestions. I would still love to see a generic `bc::locked_cache` implemented. This is something I normally look for in caching tools, and I usually find that this functionality is missing.