Surface action cache hit rate
saraadams commented
Problem
Surface the action cache hit rate, in particular if remote caching is used.
Suggested solution
The following events may help detect these:
- no cache hit: `check cache hit` (category `remote action cache check`) within event `ActionContinuation.execute` (category `general information`), thereafter execution
  - remote execution: potentially `upload missing inputs` (category `Remote execution upload time`) followed by `execute remotely` (category `remote action execution`)
  - remote cache only: TODO
  - disk cache: not included in profile?
- cache hit: `check cache hit` within event `ActionContinuation.execute`, no execution thereafter
`DataProvider` to provide rate and/or absolute numbers (cache checks, successful cache checks)
`SuggestionProvider` to suggest strategies to increase the cache hit rate, e.g. `--incompatible_strict_action_env`
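
To make the two provider roles concrete, here is a minimal Python sketch of the data they might exchange. Everything in it is an assumption for illustration: the names `CacheCheckStats` and `suggest` are hypothetical, and the 50% threshold is arbitrary. Only the flag `--incompatible_strict_action_env` comes from the proposal above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheCheckStats:
    """Hypothetical summary a DataProvider could expose."""

    cache_checks: int  # number of `check cache hit` events seen
    cache_hits: int    # checks that were not followed by an execution

    @property
    def hit_rate(self):
        # With no checks there is no meaningful rate, so report None.
        if self.cache_checks == 0:
            return None
        return self.cache_hits / self.cache_checks


def suggest(stats, threshold=0.5):
    """Return a suggestion string when the hit rate is below `threshold`.

    The threshold is an arbitrary placeholder, not a recommended value.
    """
    rate = stats.hit_rate
    if rate is None or rate >= threshold:
        return None
    return (
        f"Cache hit rate is {rate:.0%} "
        f"({stats.cache_hits}/{stats.cache_checks}); "
        "consider --incompatible_strict_action_env"
    )
```

Keeping both the absolute counts and the derived rate lets a suggestion mention how many checks the rate is based on, which matters when only a handful of actions ran.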
saraadams commented
If latency is high, then having many parallel `check cache hit` actions can help speed up getting remote cache hits (as the jobs are idle due to high latency).
An estimated latency might be extracted by looking for the shortest `check cache hit` entry.
Increasing `--jobs` to above your machine's number of cores could help.
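
The latency estimate could be sketched as follows. This assumes the profile has already been parsed into a list of dicts in the Chrome trace event style, with a `name` field and a `dur` field in microseconds; those field names are an assumption about the profile format, and `estimate_latency_us` is a hypothetical helper.

```python
def estimate_latency_us(events):
    """Estimate remote cache latency as the shortest `check cache hit` duration.

    `events` is assumed to be a list of dicts with `name` and `dur`
    (microseconds) keys. Returns None if no such event exists.
    """
    durations = [
        e["dur"]
        for e in events
        if e.get("name") == "check cache hit" and "dur" in e
    ]
    return min(durations) if durations else None
```

The shortest entry is only a rough lower bound on the round trip: it still includes any server-side lookup time, so it may overestimate the pure network latency.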
saraadams commented
- All actions that weren't in the internal action cache:
  - event with category `action processing`
- All events that check a remote cache (`disk_cache` or `remote_cache`)
  - event with category `remote action cache check`
- Cache hits also have a related event of category `remote output download`, but not any events indicating remote execution (e.g. of category `remote action execution`)
- Cache misses should have a related event for execution (local or remote)
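
The heuristic in this comment can be sketched in Python as below. The category strings are the ones named above. Everything else is an assumption: that events are dicts with `cat`, `tid`, `ts` and `dur` keys (Chrome trace event style), and that "related" means "on the same thread and inside the time span of the `action processing` event". The function name `classify_cache_checks` is hypothetical.

```python
def classify_cache_checks(events):
    """Return (hits, misses) for actions that checked a remote cache.

    Assumes each event is a dict with `cat`, `tid`, `ts` and `dur` keys,
    and that related events share a thread with, and fall inside the time
    span of, their `action processing` event.
    """
    # Ignore events without timing information (e.g. metadata entries).
    timed = [e for e in events if "ts" in e and "tid" in e]

    def contains(outer, inner):
        outer_end = outer["ts"] + outer.get("dur", 0)
        inner_end = inner["ts"] + inner.get("dur", 0)
        return (
            outer["tid"] == inner["tid"]
            and outer["ts"] <= inner["ts"]
            and inner_end <= outer_end
        )

    hits = misses = 0
    for action in timed:
        if action.get("cat") != "action processing":
            continue
        related = {
            e.get("cat")
            for e in timed
            if e is not action and contains(action, e)
        }
        if "remote action cache check" not in related:
            continue  # this action never consulted the cache
        downloaded = "remote output download" in related
        executed_remotely = "remote action execution" in related
        if downloaded and not executed_remotely:
            hits += 1
        else:
            misses += 1
    return hits, misses
```

This version compares every action against every event, which is quadratic; a real implementation would likely sort events per thread first. It also treats any check without a download as a miss, which covers local execution after a failed lookup.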