"Cache not found" but "Cache already exists"
3flex opened this issue · 32 comments
See the "Cache Gradle dependency cache" step for each job, where it says:
Cache not found for input keys: Windows-gradle-...
But then in each "Post Cache Gradle dependency cache" step, it says:
[warning]Cache already exists. Scope: refs/heads/gh_actions, Key: Windows-gradle-...
So the cache wasn't found (which was expected for this build), but none of the jobs were able to save the cache either? I saw in a comment on another issue that multiple jobs in a matrix with the same cache key will race, and the first to finish will save its results to the cache. That's OK for what I'm doing, but it hasn't happened here: none of the jobs saved the result as expected.
It looks like some of the runners failed with a tar error:
C:\windows\System32\tar.exe -cz -f d:\a\_temp\2872e9c9-c5e0-4d11-b84e-7581c522ad03\cache.tgz -C d:\a\detekt\detekt\$HOME\.gradle\caches\modules-2 .
tar.exe: could not chdir to 'd:\a\detekt\detekt\$HOME\.gradle\caches\modules-2'
[warning]Tar failed with error: The process 'C:\windows\System32\tar.exe' failed with exit code 1. Ensure BSD tar is installed and on the PATH.
This failure occurred after that runner "reserved" the cache. The issue now is that the cache is stuck in the "reserved" state, which is preventing other runners from uploading to it. The "reserved" state is automatically cleared once a day; however, we'll need to fix the tar issue or you'll continue to see this warning.
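For reference, the tar failure above comes from the literal $HOME in the cache path: Windows runners don't expand that variable, so tar tries to chdir into a directory that doesn't exist. A minimal sketch that sidesteps this by using ~, which the cache action resolves on every platform (the key here is only illustrative):

- uses: actions/cache@v1
  with:
    # ~ is expanded by the cache action on Linux, macOS, and Windows
    path: ~/.gradle/caches/modules-2
    key: ${{ runner.os }}-gradle-${{ hashFiles('**/*.gradle') }}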
@joshmgross Was there any fix for having multiple tar commands on the path?
I'm running into same issue using actions/cache@v1.
Run actions/cache@v1
with:
path: toolchain
key: toolchain-b5bb1ca45a199ba1026d31f7995b3f04bbae726b0390b2f6b2c67bdf7d98183e
Cache not found for input keys: toolchain-b5bb1ca45a199ba1026d31f7995b3f04bbae726b0390b2f6b2c67bdf7d98183e.
Post job cleanup.
[warning]Cache already exists. Scope: refs/heads/master, Key: toolchain-b5bb1ca45a199ba1026d31f7995b3f04bbae726b0390b2f6b2c67bdf7d98183e, Version: (null)
I think I triggered this bug by setting the path to a single file instead of a folder.
I'm getting this error too. Was experimenting with this feature and a mistake caused an action fail with a tar error. Now it seems there's no way to fix it. Is there at least a workaround?
@hiranya911 Use a different key.
Per https://help.github.com/en/actions/automating-your-workflow-with-github-actions/caching-dependencies-to-speed-up-workflows#usage-limits-and-eviction-policy, you have other choices:
- Wait 7 days, and don't use that cache entry.
- Put some garbage (2 GiB) into another cache entry so that GitHub evicts old cache entries.
@zhangyoufu I've already tried several different keys. For each key I get a "not found" at initialization and an "already exists" at the post cleanup.
Hi @hiranya911, could you please link me to the repo if it's public? If you're getting a tar error each time, the cache will keep getting stuck in the bad state.
I'm not getting a tar error. That only happened once. This is the repo: https://github.com/firebase/firebase-admin-python/actions
Thanks, I'm taking a look. Can you also please enable debugging and trigger another run?
Here's one of the recent builds with the issue: https://github.com/firebase/firebase-admin-python/commit/a9c14a05d3d45a7a88a585120079cdc1d8af00f8/checks?check_suite_id=401125647
And here's the run with the tar error: https://github.com/firebase/firebase-admin-python/commit/3125a039dde2cc41ac79e616d4fd63ffbccb773e/checks?check_suite_id=401104866
@hiranya911 Thanks for providing the requested info. I took a look and both issues are related to how runners "reserve" the cache.
- The run using firebase-cli as the key worked successfully. Essentially, one of the runners of your matrix strategy reserves the right to upload the cache. If you look at the post cache step in build (3.6), you'll see that step succeeded in creating the cache. The others fail with the message that the cache already exists because build (3.6) created the cache.
- In the run with the tar error, the cache gets into a bad state because it already reserved the cache but failed to create the tar. The workaround is to either use a different key or wait the approximately 24 hours for the reservation to expire. We reserve the cache this way to prevent having many runners (when a matrix strategy is used) all tarring and uploading the identical cache. We are working on a fix to reduce the time a cache stays reserved, which should help alleviate this issue.
Unfortunately, there's not a good way right now to distinguish these cases, except to look at the post cache step in each runner to see if one succeeded. Hope this helps :)
Thanks @dhadka for looking into it. You're indeed right. I finally managed to get a cache hit with the firebase-cli key: https://github.com/firebase/firebase-admin-python/runs/390362670
It looks like the Linux-Firebase-CLI key is currently in the broken reserved state, so it won't work until 24 hours have passed.
It also looks like content under $HOME is indeed cacheable:
Post job cleanup.
/bin/tar -cz -f /home/runner/work/_temp/0b50a961-b830-4502-b18b-7ef90a419b14/cache.tgz -C /home/runner/.cache .
Cache saved successfully
I think I triggered this bug by setting the path to a single file instead of a folder.
Same.
@joshmgross Why is this an enhancement instead of a bug?
@thisismydesign Updated to bug, as it's an issue with how we clean up reserved caches.
Issue
Caches are reserved before creating the tar file to upload. This prevents unnecessary time spent creating a cache file when the cache is already being created by another runner.
When creating the cache fails, the cache for a given key is stuck in a reserved state for up to a day, resulting in "Cache not found" on restore and the "Cache already exists" warning on save.
These failures could be network issues while uploading the cache, or tar failures such as an invalid directory, specifying a single file (tracked with #33), or permissions errors.
Possible Fixes
- Allow the client (the cache action) to un-reserve a cache when a failure occurs before the cache is completed
- Shorten the timespan to clean up reserved caches that are not completed
Workaround
Changing your key will allow you to create a new cache.
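A low-effort way to apply that workaround is to embed a manually bumped version segment in the key, so a stuck reservation can be escaped by incrementing it. A hedged sketch (path, key, and lockfile name are placeholders):

- uses: actions/cache@v2
  with:
    path: path/to/dependencies
    # bump "v2" to "v3" to abandon a key stuck in the reserved state
    key: ${{ runner.os }}-deps-v2-${{ hashFiles('**/lockfile') }}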
We are also having this issue. It adds an extra 5 minutes of build time, which is painful.
I've had this occur after a job failed to save the cache entry because there was no directory at the specified path. Following jobs reported "cache not found" for the input key in the restore step, and "cache already exists" in the post step.
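One hedged way to avoid that particular failure mode is to create the directory up front so the post step always has something to tar, even on a run that produces nothing (path and key below are illustrative):

- run: mkdir -p build/output   # guarantees the cached path exists when the post step runs
- uses: actions/cache@v2
  with:
    path: build/output
    key: ${{ runner.os }}-output-${{ github.sha }}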
+1
I get this in one run (for macOS):
Post Run actions/cache@v2
Cache saved successfully
Post job cleanup.
/usr/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /Users/runner/work/stone/stone --files-from manifest.txt
Cache saved successfully
Then I get this when the next workflow starts:
Run actions/cache@v2
Cache not found for input keys: x-macOS-cache
Why isn't it working? My config file looks like this:
#
# Name.
#
name: build
#
# Triggers.
#
on:
pull_request:
paths-ignore:
- '**.md'
push:
branches:
- build
paths-ignore:
- '**.md'
#
# Jobs.
#
jobs:
linux:
runs-on: ubuntu-16.04
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 14.x
- uses: actions/cache@v2
with:
path: /usr/share
key: z-${{ runner.os }}-cache
- run: sudo apt-get update
- run: sudo apt-get install build-essential
- run: sudo apt-get install graphicsmagick
- run: sudo apt-get install texlive-xetex
- run: sudo apt-get install imagemagick
- run: sudo apt-get install libreoffice
- run: sudo apt-get install fontforge
- run: sudo apt-get install ffmpeg
- run: npm install
- run: npm test
windows:
runs-on: windows-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 14.x
- uses: actions/cache@v2
with:
path: C:\ProgramData
key: x-${{ runner.os }}-cache
- run: choco install libreoffice-fresh
- run: choco install graphicsmagick
- run: choco install imagemagick
- run: choco install fontforge
- run: choco install ffmpeg
- run: choco install miktex
- run: npm install
- run: npm test
macos:
runs-on: macos-latest
steps:
- uses: actions/checkout@v2
- uses: actions/setup-node@v1
with:
node-version: 14.x
- uses: actions/cache@v2
with:
path: /usr/local
key: x-${{ runner.os }}-cache
- run: brew cask install java
- run: brew cask install mactex
- run: brew cask install libreoffice
- run: brew install graphicsmagick
- run: brew install imagemagick
- run: brew install fontforge
- run: brew install ffmpeg
- run: npm install
- run: npm test
Any help would be greatly appreciated; the build times are upwards of 20 minutes on every push in my case, and I only started learning about this today. I'd like to have some caching.
@lancejpollard Is this a public repo? If so, can you please link to the workflow?
Please also enable debug logging, re-run the workflow, and paste the logs here for the cache creation and restore steps. Thanks!
@lancejpollard Thanks for the link!
It looks like you're hitting errors when trying to create / save the cache. For MacOS (link):
Post job cleanup.
/usr/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /Users/runner/work/stone/stone --files-from manifest.txt
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Permission denied
tar: ../../../../../usr/local/miniconda/pkgs/zlib-1.2.11-h1de35cc_3jfy28ko5: Couldn't visit directory: Unknown error: -1
tar: Error exit delayed from previous errors.
[warning]Tar failed with error: The process '/usr/bin/tar' failed with exit code 1
And on Linux (link):
/bin/tar --use-compress-program zstd -T0 -cf cache.tzst -P -C /home/runner/work/stone/stone --files-from manifest.txt
[warning]Cache size of ~16116 MB (16898410712 B) is over the 5GB limit, not saving cache.
When it hits these errors, you then experience the "cache not found" but "cache already exists" issue because the cache was created but the upload never completed. You can work around this by changing the key, but from the looks of it you'll just hit these errors again.
My recommendation is to look at caching the subset of folders you need instead of /usr/local and /usr/share. The hosted runners come with a lot of preinstalled software, so those folders can already be quite large and have different permissions.
This is also a scenario where self-hosted runners are a good fit, as you can set up a VM with all of the software you need pre-installed.
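As a rough illustration of the "cache a subset" suggestion, caching only the package-manager directory the workflow actually populates keeps the archive small and avoids the permission issues above; something along these lines (key and path are examples, not tested against this repo):

- uses: actions/cache@v2
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-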
The cache is scoped to the current branch and the main branch.
So the trick is to run the CI on pushes to the main branch as well. Once it has run on the main branch, the cache is shared with the PR branch. So after some research, the fix was simply to set the build to run on pull requests and on pushes to the main branch:
on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
I have a similar issue. On my system it seems that caches are bound to an OS. We have implemented the following pipeline using matrix build parallelization:
linux+java11 -\ /-> linux+java11
linux+java8 -> linux+java8 -> linux+java8
win+java8 -/ \-> win+java8
win+java11 -/ \-> win+java11
The jobs on the right should all restore a cache created by the middle job.
Anyhow, what's interesting is that the Linux jobs on the right get a cache hit, while the Windows builds on the right cannot find the cache from the middle job. Is this intentional, should it be considered a different bug, or is it simply related to this issue?
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.
I imagine this is still relevant..
This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.
Can anyone who has faced this recently share the workflow? I need to reproduce it to investigate. Logs of the workflow run would also work.
I have a similar issue. On my system it seems that caches are bound to an OS. We have implemented the following pipeline using matrix build parallelization:
linux+java11 -\ /-> linux+java11
linux+java8 -> linux+java8 -> linux+java8
win+java8 -/ \-> win+java8
win+java11 -/ \-> win+java11
The jobs on the right should all restore a cache created by the middle job. Anyhow, what's interesting is that the Linux jobs on the right get a cache hit, while the Windows builds on the right cannot find the cache from the middle job. Is this intentional, should it be considered a different bug, or is it simply related to this issue?
@maybeec This feels different. You are pointing out that linux and macos caches are compatible while windows caches are not. Work is going on in that area if you'd like to track it: #984
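Until cross-OS restore lands, one way to avoid the miss on the Windows side is to make the OS explicit in the key, so each platform builds and restores its own cache instead of expecting a Windows job to hit a Linux-created entry. A sketch under those assumptions (the Maven path and key are illustrative):

- uses: actions/cache@v3
  with:
    path: ~/.m2/repository
    # runner.os keeps Linux and Windows caches separate on purpose
    key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
    restore-keys: |
      ${{ runner.os }}-maven-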
Can anyone who has faced this recently share the workflow? I need to reproduce it to investigate. Logs of the workflow run would also work.
https://github.com/MODFLOW-USGS/modflow6/actions/runs/3624583228/jobs/6111766806#step:6:152
That run uses a fork of actions/cache offering finer-grained control, but presumably encounters the same issue.
@w-bonelli Was the cache already present? I don't see a "Cache not found" warning. If it was already present, then this message is correct.
Update: I also see that the workflow is using the save-only action, and this message is possible with the save-only action when a cache already exists with the same key.
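For reference, with the official restore/save split the same warning can be avoided by saving under a key that is not expected to exist yet, for example one that includes the run id, and relying on a prefix match at restore time. A hedged sketch (the linked workflow uses a fork with its own inputs, so this is illustrative only):

- uses: actions/cache/restore@v3
  with:
    path: path/to/artifacts
    key: ${{ runner.os }}-artifacts-${{ github.run_id }}
    restore-keys: |
      ${{ runner.os }}-artifacts-
- uses: actions/cache/save@v3
  with:
    path: path/to/artifacts
    # unique per run, so the save never collides with an existing cache
    key: ${{ runner.os }}-artifacts-${{ github.run_id }}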
Closing this, as the issue seems to be resolved with newer versions. I am not able to reproduce it.
Is it no longer possible for the cache to get stuck in a reserved state after a failed upload?
Would you mind helping me better understand what's going on when there is a "Cache not found" but then the cache can't save?