pulumi/pulumi-kubernetes-operator

Stack lastUpdate state not updated when a transient sync error is encountered

Closed this issue · 3 comments

What happened?

When an already synchronized stack encounters a transient synchronization problem, it will mark the stack as failed. However, after the sync issue is resolved the lastUpdate state is not updated even if the stack is in a known good state.

In our case, we had a synchronized stack deployed to one of our kubernetes clusters. We experienced a brief connectivity issue where the Pulumi operator was unable to reach the stack repository server. When the operator attempted to synchronize the stack, it entered this code path and marked the stack as failed as expected.

https://github.com/pulumi/pulumi-kubernetes-operator/blob/master/pkg/controller/stack/stack_controller.go#L454

After the connectivity issue was resolved, the stack synchronization successfully completed. However, when the stack is already synchronized, it will enter this code path.

https://github.com/pulumi/pulumi-kubernetes-operator/blob/master/pkg/controller/stack/stack_controller.go#L627-L632

That results in the Ready condition being marked as True. Unfortunately, it does not also update the lastUpdate state. The consequence is that the stack will still be marked as failed from a metrics perspective even though it is in a known good (synchronized) state.

Expected Behavior

After transient synchronization problems are resolved, the stack state should return to succeeded.

Steps to reproduce

  1. Create a stack that is managed by the Pulumi operator.
  2. Wait for that stack to synchronize and reach the succeeded state.
  3. Inhibit connectivity to the stack repository (something like blocking traffic to the repository server with iptables).
  4. Wait for that stack to fail synchronization and reach the failed state.
  5. Resolve the connectivity issue.
  6. Wait for stack to synchronize again.

After those steps, the stack will still be marked with a failed state.

Output of pulumi about

N/A

Additional context

No response

Contributing

Ensuring that the lastUpdate state is succeeded when commit IDs match should result in the expected behavior.

#429

@kenji-fukasawa Thanks for reporting this issue, detailed code analysis and a PR. We look forward to working with you to get this merged in.

Hey guys. Just wanted to check in on this one. Looks like the PR is passing the acceptance tests, but wanted to make sure there's not anything else you need from me.

@kenji-fukasawa Sorry for dropping the ball on this, and thanks for submitting a PR. I have approved and merged your PR. Thanks for you contribution!