GoogleCloudPlatform/bank-of-anthos

Cloud Build trigger for ledger broken

bourgeoisor opened this issue · 3 comments

The Cloud Build trigger for the ledger seems stuck at the following:

Step #1 - "build-and-push-images": Checking cache...
Step #1 - "build-and-push-images":  - e2e-tests: Not found. Building
Step #1 - "build-and-push-images":  - balancereader: Error checking cache.
Step #1 - "build-and-push-images": getting hash for artifact "balancereader": getting dependencies for "balancereader": could not fetch dependencies for workspace /workspace: initial Jib dependency refresh failed: failed to get Jib dependencies: running [/workspace/mvnw jib:_skaffold-fail-if-jib-out-of-date -Djib.requiredVersion=1.4.0 --projects src/ledger/balancereader --also-make jib:_skaffold-files-v2 --quiet --batch-mode]

It sounds like an existing cache is conflicting.

I have tried:

  • Setting the $_CACHE env to something else (as a cache-buster)
  • Replacing the $_CACHE variable by a hard-coded cache-buster
  • Looking to see if there was a cache hiding in $HOME/.skaffold
  • Running skaffold build multiple times in a row

None of those methods seem to have any effect. (see various tests in this branch)
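
One further check, offered as a sketch rather than something tried in this thread: re-run the Maven invocation from the error message without `--quiet`, so that whatever makes the Jib dependency refresh fail gets printed. The `jib_refresh_cmd` helper and the `WORKSPACE` variable are hypothetical names; the goals and flags are copied from the log above.

```shell
#!/bin/sh
# Hypothetical helper: prints the Maven command from the Skaffold error,
# minus --quiet, so the underlying failure is visible when it is run.
# WORKSPACE defaults to /workspace (the path shown in the Cloud Build log).
jib_refresh_cmd() {
  module="$1"
  echo "${WORKSPACE:-/workspace}/mvnw" \
    "jib:_skaffold-fail-if-jib-out-of-date" \
    "-Djib.requiredVersion=1.4.0" \
    "--projects" "$module" "--also-make" \
    "jib:_skaffold-files-v2" "--batch-mode"
}

# Print the command for the failing module; review it, then run it by hand.
jib_refresh_cmd src/ledger/balancereader
```

Running the printed command inside the same build step, or locally with `WORKSPACE` pointing at the checkout, should show the Maven output that Skaffold hides.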

The same skaffold build command works fine on my machine:

Generating tags...
 - e2e-tests -> gcr.io/obourgeois-sandbox-10/bank-of-anthos/e2e-tests:v0.5.1-717-g594d53d
 - balancereader -> gcr.io/obourgeois-sandbox-10/bank-of-anthos/balancereader:latest
Checking cache...
 - e2e-tests: Not found. Building
 - balancereader: Not found. Building <--- not found means no errors! yay!
Starting build...
Building [balancereader]...
[. . . and then it actually builds! . . .]

Note that PR pipelines work as expected. What is broken is the individual service pipelines that are triggered on new commits to main. Looking quickly at the list of commits, you can see that those 3 ledger pipelines have never passed since they were implemented, so this is not a regression; they just never worked. The other service pipelines (frontend, contacts, userservice), on the other hand, work fine!

@aablsk any potential insight is deeply appreciated!

I think I figured out the culprit. It looks like there's been a regression in Skaffold v2.2.0+. I bumped back down to v2.1.0 and the build works fine.
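
For anyone wanting to reproduce the downgrade, here is a minimal sketch of pinning the Skaffold binary to v2.1.0. It assumes a linux-amd64 build environment and the release URL layout used in the Skaffold install documentation; the `skaffold_url` helper is a hypothetical name, and this is not necessarily how the repository's pipeline selects its Skaffold version.

```shell
#!/bin/sh
# Hypothetical helper: builds the download URL for a pinned Skaffold release.
# Assumes the linux-amd64 binary and the documented release bucket layout.
skaffold_url() {
  version="$1"
  echo "https://storage.googleapis.com/skaffold/releases/${version}/skaffold-linux-amd64"
}

# Print the URL for the last version reported to work in this thread.
skaffold_url v2.1.0

# To install, something along these lines (not run here):
#   curl -Lo skaffold "$(skaffold_url v2.1.0)" && chmod +x skaffold
```

Pinning an explicit version rather than `latest` keeps the trigger from silently picking up a newer release.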

Closing as fixed! Will check with the Skaffold team.

aablsk commented

Dear @bourgeoisor, sorry for the late reply; I'm currently pretty swamped with onboarding and getting my new apartment set up.
I think this problem was related to this issue (which includes the workaround I was using before 2.1.0): GoogleContainerTools/skaffold#7409

I think it would be valuable for the skaffold team to be aware of this potential regression. (I recommend letting @aaron-prindle know)