actions/cache

Cached data restored from Rust's Cargo build is corrupted

zonyitoo opened this issue · 19 comments

rust-lang/cargo#8603

It happens only on the macOS executor.

It seems that the file target/debug/deps/libserde_derive-797b01cb80d42716.dylib restored from the cache is corrupted.

@zonyitoo Thanks for reporting this, I'm forking the affected repo and taking a look.

I created a fork and modified the workflow file to create and extract the tar directly, without involving the cache. In this run (https://github.com/dhadka/shadowsocks-rust/runs/1009714459?check_suite_focus=true), the checksums before tarring are:

4a439f8844eb8a5bfdc0796c513a22ec3ca5f957  target/debug/deps/libserde-fa40c6e5c43e4135.rlib
4a744bc191cab742bf98d1c47dcea8a28dd95ebb  target/debug/deps/libserde-fa40c6e5c43e4135.rmeta
0efbbaaafce7e28b22688211be95c9844b09d817  target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib
shasum: target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib.dSYM: Is a directory
f669512de5ba0f0ec6de842875dd06e5a3053a51  target/debug/deps/libserde_json-3b48c24f7701f3e5.rlib
1f88ab7e34cf9e3de44962194800949f22c1cc90  target/debug/deps/libserde_json-3b48c24f7701f3e5.rmeta
0428af240e95a1d33810f66974ab8b0fb75bd941  target/debug/deps/libserde_urlencoded-afceb2078e8a3a5d.rlib
37911f8ee8ffb414c970987aaf6da067764bf584  target/debug/deps/libserde_urlencoded-afceb2078e8a3a5d.rmeta

and after extracting is:

4a439f8844eb8a5bfdc0796c513a22ec3ca5f957  target/debug/deps/libserde-fa40c6e5c43e4135.rlib
4a744bc191cab742bf98d1c47dcea8a28dd95ebb  target/debug/deps/libserde-fa40c6e5c43e4135.rmeta
92ffcb5248d3161bc52d10a05321a0a00b61be9d  target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib
shasum: target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib.dSYM: Is a directory
f669512de5ba0f0ec6de842875dd06e5a3053a51  target/debug/deps/libserde_json-3b48c24f7701f3e5.rlib
1f88ab7e34cf9e3de44962194800949f22c1cc90  target/debug/deps/libserde_json-3b48c24f7701f3e5.rmeta
0428af240e95a1d33810f66974ab8b0fb75bd941  target/debug/deps/libserde_urlencoded-afceb2078e8a3a5d.rlib
37911f8ee8ffb414c970987aaf6da067764bf584  target/debug/deps/libserde_urlencoded-afceb2078e8a3a5d.rmeta

The checksums are identical for every file except target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib.

This one is weird. There are actually a number of files with changing checksums:

> diff before.txt after.txt
2020-08-21T00:07:33.6603450Z 93c93
2020-08-21T00:07:33.6604300Z < 0a8d035d746880a35f6458a2d56e91402b6844c6  target/debug/deps/libasync_trait-46034945c551b66f.dylib
2020-08-21T00:07:33.6604970Z ---
2020-08-21T00:07:33.6605470Z > 9b4fd8d4bbd184fe52fdf441aa0a8cf0701b923f  target/debug/deps/libasync_trait-46034945c551b66f.dylib
2020-08-21T00:07:33.6606200Z 281c281
2020-08-21T00:07:33.6606850Z < 85223cd91ca4cb052a467da7bc9842b472ec0197  target/debug/deps/libpest_derive-3dd8ce8471a1df77.dylib
2020-08-21T00:07:33.6607390Z ---
2020-08-21T00:07:33.6607930Z > 393880d51427615390064cef6e570cd86e9fcc64  target/debug/deps/libpest_derive-3dd8ce8471a1df77.dylib
2020-08-21T00:07:33.6608100Z 288c288
2020-08-21T00:07:33.6608790Z < 472ef6dc8aa5beb910c536a9177ffab609245b51  target/debug/deps/libpin_project_internal-6aa6638b8b3f52d8.dylib
2020-08-21T00:07:33.6609310Z ---
2020-08-21T00:07:33.6609830Z > df762c9c4884535727d29f7c8ab00555cb60f44e  target/debug/deps/libpin_project_internal-6aa6638b8b3f52d8.dylib
2020-08-21T00:07:33.6610020Z 342c342
2020-08-21T00:07:33.6610550Z < 0efbbaaafce7e28b22688211be95c9844b09d817  target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib
2020-08-21T00:07:33.6622910Z ---
2020-08-21T00:07:33.6623650Z > 92ffcb5248d3161bc52d10a05321a0a00b61be9d  target/debug/deps/libserde_derive-38c0c362558a2b3b.dylib
2020-08-21T00:07:33.6623850Z 370c370
2020-08-21T00:07:33.6624390Z < 6abd28e535ec7d4572909f63c39c4cb2c3cc4b88  target/debug/deps/libstrum_macros-9dd7edaa2db48451.dylib
2020-08-21T00:07:33.6624850Z ---
2020-08-21T00:07:33.6625360Z > 61b20180358144b5f69ce3d619a5ee6c71e8568e  target/debug/deps/libstrum_macros-9dd7edaa2db48451.dylib
2020-08-21T00:07:33.6625520Z 383c383
2020-08-21T00:07:33.6626050Z < a088fe97eff6d6755601af4ba0eef3f12c5bb85a  target/debug/deps/libthiserror_impl-dd9fa074191b370f.dylib
2020-08-21T00:07:33.6626510Z ---
2020-08-21T00:07:33.6627010Z > 10c8b5ffee0103d38a6071c73b96f9f778b1a6db  target/debug/deps/libthiserror_impl-dd9fa074191b370f.dylib
2020-08-21T00:07:33.6627170Z 421c421
2020-08-21T00:07:33.6627690Z < 491d9943964852a31507a0ef77526819f8b8abdf  target/debug/deps/libtracing_attributes-cd34f14b409bc4fa.dylib
2020-08-21T00:07:33.6628140Z ---
2020-08-21T00:07:33.6628660Z > 6c5364ac6e8e5b0e7579efc212116ade231fe321  target/debug/deps/libtracing_attributes-cd34f14b409bc4fa.dylib
2020-08-21T00:07:33.6628820Z 532c532
2020-08-21T00:07:33.6629340Z < 1f53cc3f6bcffab9558b2dc6fd99914e270b50f2  target/debug/deps/socks5-fff3030d83fcc324
2020-08-21T00:07:33.6629790Z ---
2020-08-21T00:07:33.6630260Z > 4ddf8a4b7a562390d2430ab44e40492ead1e319c  target/debug/deps/socks5-fff3030d83fcc324
2020-08-21T00:07:33.6630430Z 535c535
2020-08-21T00:07:33.6630570Z < e620cc67daeb9e84c04d242c92ac2306d6418919  target/debug/deps/sslocal
2020-08-21T00:07:33.6631000Z ---
2020-08-21T00:07:33.6631160Z > 426172e6a87359bc17b9df23d0f32d0c8fc07190  target/debug/deps/sslocal
2020-08-21T00:07:33.6631290Z 539c539
2020-08-21T00:07:33.6631430Z < 174d7e9e549d19e408103f6d1bd20bcc1668ca5f  target/debug/deps/ssmanager
2020-08-21T00:07:33.6631820Z ---
2020-08-21T00:07:33.6631980Z > 84653919466eefe24eff3e8b420a2f3f0b865716  target/debug/deps/ssmanager
2020-08-21T00:07:33.6632110Z 543c543
2020-08-21T00:07:33.6632250Z < 8789f5059660c34e2030394ad51cf3152b99e8f7  target/debug/deps/ssserver
2020-08-21T00:07:33.6632680Z ---
2020-08-21T00:07:33.6632840Z > add01b89907ecec12d8511e639a1fdacf02778b9  target/debug/deps/ssserver
2020-08-21T00:07:33.6632970Z 547c547
2020-08-21T00:07:33.6633100Z < 3c5325fa5b00068f6a47129aee2c3b2d02ea7afc  target/debug/deps/ssurl
2020-08-21T00:07:33.6633620Z ---
2020-08-21T00:07:33.6633800Z > 30030fee9da2e4d44b08bccf0ea19a1048486a46  target/debug/deps/ssurl
2020-08-21T00:07:33.6633930Z 586c586
2020-08-21T00:07:33.6634510Z < 38e07e448d36c5021d7341c6169ecc51c1d0b2e7  target/debug/deps/tunnel-c4a3e569019c4d4a
2020-08-21T00:07:33.6634980Z ---
2020-08-21T00:07:33.6635830Z > 7f72af6414c176f2a4cd2e8dd0b25041d47541cc  target/debug/deps/tunnel-c4a3e569019c4d4a
2020-08-21T00:07:33.6636150Z 590c590
2020-08-21T00:07:33.6636990Z < 011711872a3df1655ea9c974b84458bfc8a77005  target/debug/deps/udp-9efad90e80ae91ef
2020-08-21T00:07:33.6637410Z ---
2020-08-21T00:07:33.6637920Z > eb01dcd8b640100e2a1ba4d7192cfd5160564ca9  target/debug/deps/udp-9efad90e80ae91ef

You can see these results here - https://github.com/dhadka/shadowsocks-rust/runs/1010226166?check_suite_focus=true

What makes it really weird: if I add sleep 10 after cargo test and before creating the tar, it works fine. (Change, Run Output)

I'm stumped. Depending on what sequence of commands I run before creating the tar, the problem seems to magically disappear. I was also thinking that perhaps the Actions runner or Cargo was causing the files to be modified after the build/test step ends, so I looked at the last modified time as well as open file handles (lsof) and don't see anything suggesting the files are being modified during or after creation of the tar.

@zonyitoo Since all the files impacted seem to be in target, I'm wondering if you can exclude that from the cache. Do you still see a speedup benefit?

dae commented

target contains all the Rust build products - without it compilation would have to start from scratch

Our repo seems to be hitting the same issue - it's been cropping up sporadically for the last couple of months, and we've been having to bust the cache each time: https://github.com/ankitects/anki/runs/1010643070

@zonyitoo Forcing the Mac runner to use GNU tar instead of BSD tar seems to fix the issue. Please consider using this as a workaround until we can devise a more permanent solution. Installing GNU tar adds about 10-15 seconds, but the savings from skipping the build greatly outweigh this.

Code change: https://github.com/dhadka/shadowsocks-rust/pull/1
Test run: https://github.com/dhadka/shadowsocks-rust/pull/1/checks
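For readers who want a rough idea of what such a workaround step could look like, here is a minimal sketch. It is not copied from the linked PR. It assumes Homebrew is available on the runner, that `brew install gnu-tar` has already run, and that the runner provides the `GITHUB_PATH` file for adding PATH entries to later steps. The function name `use_gnu_tar` is hypothetical.

```shell
# Sketch of a macOS step body that makes GNU tar the default `tar`
# for later steps. `use_gnu_tar` is a hypothetical helper name.

# $1 (optional): directory holding the unprefixed GNU tar binary.
# Defaults to the gnubin directory of Homebrew's gnu-tar formula.
use_gnu_tar() {
  local gnubin="${1:-$(brew --prefix)/opt/gnu-tar/libexec/gnubin}"
  if [ ! -x "$gnubin/tar" ]; then
    echo "GNU tar not found in $gnubin; run 'brew install gnu-tar' first" >&2
    return 1
  fi
  # Each line appended to the file named by $GITHUB_PATH is added to
  # the PATH of subsequent steps, including the cache step.
  echo "$gnubin" >> "$GITHUB_PATH"
}

# Example, on a macOS runner only:
#   brew install gnu-tar
#   use_gnu_tar
```

The step has to run before the cache step so that both restore and save see the same tar.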

CC @dae

dae commented

Thanks, will give that a go and let you know if it comes back.

(Ignore the failure below, I used the wrong package name)

Thanks!

I believe Pipeline Caching uses GNU tar because of various discrepancies FWIW

Hello @johnterickson, @dhadka,
We got a request to bake gnu-tar into the macOS images: actions/runner-images#1534
If that is done, do you plan to modify actions/cache to use gnu-tar by default on macOS?

@maxim-lobanov Yes, we have such a plan.

This issue is hopefully coming to an end soon. GNU tar was added to the macOS image build process on Dec 15th, and the 20210123.2 image for macOS 10.15 has finally reached 100% rollout. I also made a PR to the @actions/cache npm package (that this action uses) to use gtar on macOS when available.

I don't have experience with self-hosted runners, but I assume you would also need to install GNU tar on them.

You can already opt to use gtar instead of tar for caching if you just modify your path as suggested in the homebrew formula: https://github.com/Homebrew/homebrew-core/blob/02d54ba4ed23f63e8eeb46de89059d89d9cd6098/Formula/gnu-tar.rb#L66

(This works because cache chooses which tar binary to use with an equivalent of the command which tar.)
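To see which tar a PATH lookup would pick and whether it is GNU tar, something like the following sketch may help. The helper name `tar_flavor` is hypothetical. It relies only on GNU tar printing "GNU tar" in its `--version` output, which bsdtar does not.

```shell
# Sketch: report whether the tar found on PATH (or the binary given
# as $1) identifies itself as GNU tar. `tar_flavor` is a hypothetical name.
tar_flavor() {
  local tar_bin="${1:-$(command -v tar)}"
  if "$tar_bin" --version 2>/dev/null | grep -q 'GNU tar'; then
    echo "gnu"
  else
    echo "other"
  fi
}

# Example:
#   command -v tar   # shows the path a PATH lookup resolves to
#   tar_flavor       # prints "gnu" or "other"
```

Running this in a step right before the cache step shows whether the PATH change took effect.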

anp commented

I also made a PR to the @actions/cache npm package (that this action uses) to use gtar on macOS when available.

Are you referring to actions/toolkit#701? Is this issue resolved now that your PR has been merged and included in a release?

The npm package has been released, but this action (and any other action that uses that package internally) has to start using the new version too.
This should be updated to 1.0.6:

"@actions/cache": "^1.0.5",

Then npm install will update the lockfile, npm run build will update the code packaged for distribution, and finally the action has to be released/tagged.

I assume this repo is managed by the same team and we just have to be patient. Can you confirm or deny @dhadka ?

If you need this to work now, this is the easiest solution (gnu-tar is already installed on GitHub-hosted runners; it just needs to be added to the PATH):

You can already opt to use gtar instead of tar for caching if you just modify your path as suggested in the homebrew formula: https://github.com/Homebrew/homebrew-core/blob/02d54ba4ed23f63e8eeb46de89059d89d9cd6098/Formula/gnu-tar.rb#L66

Also, if any invalid caches were saved and your workflow can't recover from starting with an invalid cache, you have to change the cache key. (Or wait 1 week without loading the cache, so that it is evicted.)

@Cyberbeni PR to update version - #525

This seems to have been incidentally fixed in cache@v2, which appears to always use gtar.

This issue is stale because it has been open for 200 days with no activity. Leave a comment to avoid closing this issue in 5 days.

This issue was closed because it has been inactive for 5 days since being marked as stale.