"Cache not found" but "Cache already exists"
3flex opened this issue · 32 comments
See the "Cache Gradle dependency cache" step for each job, where it says:
Cache not found for input keys: Windows-gradle-...
But then in each "Post Cache Gradle dependency cache" step, it says:
[warning]Cache already exists. Scope: refs/heads/gh_actions, Key: Windows-gradle-...
So the cache wasn't found (which was expected for this build), but none of the jobs were able to save the cache either? I saw in a comment on another issue that multiple jobs in a matrix with the same cache key will race, and the first to finish will save its results to the cache. That's OK for what I'm doing, but it hasn't happened here: none of the jobs saved the result as expected.
It looks like some of the runners failed with a tar error:
C:\windows\System32\tar.exe -cz -f d:\a\_temp\2872e9c9-c5e0-4d11-b84e-7581c522ad03\cache.tgz -C d:\a\detekt\detekt\$HOME\.gradle\caches\modules-2 .
tar.exe: could not chdir to 'd:\a\detekt\detekt\$HOME\.gradle\caches\modules-2'
[warning]Tar failed with error: The process 'C:\windows\System32\tar.exe' failed with exit code 1. Ensure BSD tar is installed and on the PATH.
This failure occurred after that runner "reserved" the cache. The issue now is that the cache is stuck in the "reserved" state, which is preventing other runners from uploading to it. The "reserved" state is automatically cleared once a day; however, we'll need to fix the tar issue or you'll continue to see this warning.
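For reference, the tar failure above comes from the literal $HOME in the cache path: Windows runners don't expand that variable, so tar tries to chdir into a directory that doesn't exist. A minimal sketch that sidesteps this by using ~, which the cache action resolves on every platform (the key here is only illustrative):

- uses: actions/cache@v1
  with:
    # ~ is expanded by the cache action on Linux, macOS, and Windows
    path: ~/.gradle/caches/modules-2
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle') }}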
@joshmgross Was there any fix for having multiple tar commands on the path?
I'm running into same issue using actions/cache@v1.
Run actions/cache@v1
with:
path: toolchain
key: toolchain-b5bb1ca45a199ba1026d31f7995b3f04bbae726b0390b2f6b2c67bdf7d98183e
Cache not found for input keys: toolchain-b5bb1ca45a199ba1026d31f7995b3f04bbae726b0390b2f6b2c67bdf7d98183e.
Post job cleanup.
[warning]Cache already exists. Scope: refs/heads/master, Key: toolchain-b5bb1ca45a199ba1026d31f7995b3f04bbae726b0390b2f6b2c67bdf7d98183e, Version: (null)
I think I triggered this bug by setting the path to a single file instead of a folder.
I'm getting this error too. Was experimenting with this feature and a mistake caused an action fail with a tar error. Now it seems there's no way to fix it. Is there at least a workaround?
@hiranya911 Use a different key.
Per https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy, you have other choices:
- Wait 7 days, and don't use that cache entry.
- Put some garbage (2 GiB) into another cache entry so that GitHub evicts old cache entries.
@zhangyoufu I've already tried several different keys. For each key I get a "not found" at initialization and an "already exists" at the post cleanup.
Hi @hiranya911, could you please link me to the repo if it's public? If you're getting a tar error each time, the cache will keep getting stuck in the bad state.
I'm not getting a tar error. That only happened once. This is the repo: https://github.com/firebase/firebase-admin-python/actions
Thanks, I'm taking a look. Can you also please enable debugging and trigger another run?
Here's one of the recent builds with the issue: https://github.com/firebase/firebase-admin-python/commit/a9c14a05d3d45a7a88a585120079cdc1d8af00f8/checks?check_suite_id=401125647
And here's the run with the tar error: https://github.com/firebase/firebase-admin-python/commit/3125a039dde2cc41ac79e616d4fd63ffbccb773e/checks?check_suite_id=401104866
@hiranya911 Thanks for providing the requested info. I took a look and both issues are related to how runners "reserve" the cache.
- The run using firebase-cli as the key worked successfully. Essentially, one of the runners of your matrix strategy reserves the right to upload the cache. If you look at the post cache step in build (3.6), you'll see that step succeeded in creating the cache. The others fail with the message that the cache already exists because build (3.6) created the cache.
- In the run with the tar error, the cache gets into a bad state because it already reserved the cache but failed to create the tar. The workaround is to either use a different key or wait the approximately 24 hours for the reservation to expire. We reserve the cache this way to prevent having many runners (when a matrix strategy is used) all tarring and uploading the identical cache. We are working on a fix to reduce the time a cache stays reserved, which should help alleviate this issue.
Unfortunately, there's not a good way right now to distinguish these cases, except to look at the post cache step in each runner to see if one succeeded. Hope this helps :)
Thanks @dhadka for looking into it. You're indeed right. I finally managed to get a cache hit with the firebase-cli key: https://github.com/firebase/firebase-admin-python/runs/390362670
It looks like the Linux-Firebase-CLI key is currently in the broken reserved state, so it won't work until 24 hours have passed.
It also looks like content under $HOME is indeed cacheable:
Post job cleanup.
/bin/tar -cz -f /home/runner/work/_temp/0b50a961-b830-4502-b18b-7ef90a419b14/cache.tgz -C /home/runner/.cache .
Cache saved successfully
I think I triggered this bug by setting the path to a single file instead of a folder.
Same.
@joshmgross Why is this an enhancement instead of a bug?
@thisismydesign Updated to bug, as it's an issue with how we clean up reserved caches.
Issue
Caches are reserved before creating the tar file to upload. This prevents unnecessary time spent creating a cache file when the cache is already being created by another runner.
When creating the cache fails, the cache for a given key is stuck in a reserved state for up to a day, resulting in "Cache not found" on restore and the "Cache already exists" warning on save.
These failures could be network issues while uploading the cache, or tar failures such as an invalid directory, specifying a single file (tracked with #33), or permissions errors.
Possible Fixes
- Allow the client (the cache action) to un-reserve a cache when a failure occurs before the cache is completed
- Shorten the timespan to clean up reserved caches that are not completed
Workaround
Changing your key will allow you to create a new cache.
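A low-effort way to apply that workaround is to embed a manually bumped version segment in the key, so a stuck reservation can be escaped by incrementing it. A hedged sketch (path, key, and lockfile name are placeholders):

- uses: actions/cache@v2
  with:
    path: path/to/dependencies
    # bump "v2" to "v3" to abandon a key stuck in the reserved state
    key: ${{ runner.os }}-deps-v2-${{ hashFiles('**/lockfile') }}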
We are also having this issue. It adds an extra 5 minutes of build time, which is painful.
I've had this occur after a job failed to save the cache entry because there was no directory at the specified path. Following jobs reported "cache not found" for the input key in the restore step, and "cache already exists" in the post step.
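One hedged way to avoid that particular failure mode is to create the directory up front so the post step always has something to tar, even on a run that produces nothing (path and key below are illustrative):

- run: mkdir -p build/output   # guarantees the cached path exists when the post step runs
- uses: actions/cache@v2
  with:
    path: build/output
    key: ${{ runner.os }}-output-${{ github.sha }}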
+1
I get this in one run (for macOS):
Post Run actions/cache@v2
Cache saved successfully
Post job cleanup.
/usr/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /Users/runner/work/stone/stone --files-from manifest.txt
Cache saved successfully
Then I get this when the next workflow starts:
Run actions/cache@v2
Cache not found for input keys: x-macOS-cache
Why isn't it working? My config file looks like this:
#
# Name.
#
name: build
#
# Triggers.
#
on:
pull_request:
paths-ignore:
- '**.md'
push:
branches:
- build
paths-ignore:
- '**.md'
#
# Jobs.
#
jobs:
linux:
runs-on: ubuntu-16.04
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 14.x
- uses: actions/cache@v2
with:
path: /usr/share
key: z-${{ runner.os }}-cache
- run: sudo apt-get update
- run: sudo apt-get install build-essential
- run: sudo apt-get install graphicsmagick
- run: sudo apt-get install texlive-xetex
- run: sudo apt-get install imagemagick
- run: sudo apt-get install libreoffice
- run: sudo apt-get install fontforge
- run: sudo apt-get install ffmpeg
- run: npm install
- run: npm test
windows:
runs-on: windows-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 14.x
- uses: actions/cache@v2
with:
path: C:\ProgramData
key: x-${{ runner.os }}-cache
- run: choco install libreoffice-fresh
- run: choco install graphicsmagick
- run: choco install imagemagick
- run: choco install fontforge
- run: choco install ffmpeg
- run: choco install miktex
- run: npm install
- run: npm test
macos:
runs-on: macos-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 14.x
- uses: actions/cache@v2
with:
path: /usr/local
key: x-${{ runner.os }}-cache
- run: brew cask install java
- run: brew cask install mactex
- run: brew cask install libreoffice
- run: brew install graphicsmagick
- run: brew install imagemagick
- run: brew install fontforge
- run: brew install ffmpeg
- run: npm install
- run: npm test
Any help would be greatly appreciated; the build times are upwards of 20 minutes on every push in my case, and I only started learning about this today. I'd like to have some caching.
@lancejpollard Is this a public repo? If so, can you please link to the workflow?
Please also enable debug logging, re-run the workflow, and paste the logs here for the cache creation and restore steps. Thanks!
@lancejpollard Thanks for the link!
It looks like you're hitting errors when trying to create / save the cache. For MacOS (link):
Post job cleanup.
/usr/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /Users/runner/work/stone/stone --files-from manifest.txt
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: ../../../../../usr/local/miniconda/pkgs/zlib-1.2.11-h1de35cc_3jfy28ko5: Couldn't visit directory: Unknown error: -1
tar: Error exit delayed from previous errors.
[warning]Tar failed with error: The process '/usr/bin/tar' failed with exit code 1
And on Linux (link):
/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/stone/stone --files-from manifest.txt
[warning]Cache size of ~16116 MB (16898410712 B) is over the 5GB limit, not saving cache.
When it hits these errors, you then experience the "cache not found" but "cache already exists" issue because the cache was created but the upload never completed. You can work around this by changing the key, but from the looks of it you'll just hit these errors again.
My recommendation is to look at caching the subset of folders you need instead of /usr/local and /usr/share. The hosted runners come with a lot of preinstalled software, so those folders can already be quite large and have different permissions.
This is also a scenario where self-hosted runners are a good fit, as you can set up a VM with all of the software you need pre-installed.
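As a rough illustration of the "cache a subset" suggestion, caching only the package-manager directory the workflow actually populates keeps the archive small and avoids the permission issues above; something along these lines (key and path are examples, not tested against this repo):

- uses: actions/cache@v2
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-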
The cache is scoped to the current branch and the main branch.
So the trick is to run the CI on pushes to the main branch as well. Once it has run on the main branch, the cache is shared with the PR branch. So after some research, the fix was simply to set the build to run on pull requests and on pushes to the main branch:
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
I have a similar issue. On my system it seems that caches are bound to an OS. We have implemented the following pipeline using matrix build parallelization:
linux+java11 -\ /-> linux+java11
linux+java8 -> linux+java8 -> linux+java8
win+java8 -/ \-> win+java8
win+java11 -/ \-> win+java11
The jobs on the right should all restore a cache created by the middle job.
Anyhow, what's interesting is that the Linux jobs on the right get a cache hit, while the Windows builds on the right cannot find the cache from the middle job. Is this intentional, should it be considered a different bug, or is it simply related to this issue?
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.
I imagine this is still relevant..
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.
Can anyone who has faced this recently share the workflow? I need to reproduce it to investigate. Logs of the workflow run would also work.
I have a similar issue. On my system it seems that caches are bound to an OS. We have implemented the following pipeline using matrix build parallelization:
linux+java11 -\ /-> linux+java11
linux+java8 -> linux+java8 -> linux+java8
win+java8 -/ \-> win+java8
win+java11 -/ \-> win+java11
The jobs on the right should all restore a cache created by the middle job. Anyhow, what's interesting is that the Linux jobs on the right get a cache hit, while the Windows builds on the right cannot find the cache from the middle job. Is this intentional, should it be considered a different bug, or is it simply related to this issue?
@maybeec This feels different. You are pointing out that linux and macos caches are compatible while windows caches are not. Work is going on in that area if you'd like to track it: #984
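Until cross-OS restore lands, one way to avoid the miss on the Windows side is to make the OS explicit in the key, so each platform builds and restores its own cache instead of expecting a Windows job to hit a Linux-created entry. A sketch under those assumptions (the Maven path and key are illustrative):

- uses: actions/cache@v3
  with:
    path: ~/.m2/repository
    # runner.os keeps Linux and Windows caches separate on purpose
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-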
Can anyone who has faced this recently share the workflow? I need to reproduce it to investigate. Logs of the workflow run would also work.
https://github.com/MODFLOW-USGS/modflow6/actions/runs/3624583228/jobs/6111766806#step:6:152
That run uses a fork of actions/cache offering finer-grained control, but presumably encounters the same issue.
@w-bonelli Was the cache already present? I don't see a "Cache not found" warning. If it was already present, then this message is correct.
Update: I also see that the workflow is using the save-only action, and this message is possible with the save-only action when a cache already exists with the same key.
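For reference, with the official restore/save split the same warning can be avoided by saving under a key that is not expected to exist yet, for example one that includes the run id, and relying on a prefix match at restore time. A hedged sketch (the linked workflow uses a fork with its own inputs, so this is illustrative only):

- uses: actions/cache/restore@v3
  with:
    path: path/to/artifacts
    key: ${{ runner.os }}-artifacts-${{ github.run_id }}
    restore-keys: |
      ${{ runner.os }}-artifacts-
- uses: actions/cache/save@v3
  with:
    path: path/to/artifacts
    # unique per run, so the save never collides with an existing cache
    key: ${{ runner.os }}-artifacts-${{ github.run_id }}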
Closing this, as the issue seems to be resolved with newer versions. I am not able to reproduce it.
Is it no longer possible for the cache to get stuck in a reserved state after a failed upload?
Would you mind helping me better understand what's going on when there is a "Cache not found" but then the cache can't save?