ig job stays in Completed state for a long time
f0rmiga opened this issue · 1 comment
f0rmiga commented
I experienced an ig job that got into the Completed state and stayed there for a few minutes. Eventually, the controllers picked up its output and moved the cluster state forward. While it was in this Completed state, the KubeCF cluster was broken, with a few pods deleted. E.g., the following is the pod list in an HA deployment. Notice the missing api, uaa, diego-cell, and router replicas.
NAME                                     READY   STATUS      RESTARTS   AGE
api-0                                    15/15   Running     5          24m
auctioneer-0                             4/4     Running     1          28m
bosh-dns-755d6b884b-cwqgw                1/1     Running     0          13m
bosh-dns-755d6b884b-h92mh                1/1     Running     0          13m
cc-worker-0                              4/4     Running     2          27m
cf-apps-dns-564fc5cf4d-jzbcv             1/1     Running     0          14m
cf-apps-dns-564fc5cf4d-qnw46             1/1     Running     0          14m
credhub-0                                6/6     Running     0          27m
credhub-1                                6/6     Running     0          29m
database-0                               2/2     Running     0          13m
database-seeder-8f24862205dd7db3-46p5n   0/2     Completed   0          118m
diego-api-0                              6/6     Running     2          28m
diego-cell-0                             7/7     Running     2          22m
diego-cell-1                             7/7     Running     1          25m
doppler-0                                4/4     Running     0          27m
doppler-1                                4/4     Running     0          27m
doppler-2                                4/4     Running     0          28m
ig-a01395ca9859fa55-rv65v                0/22    Completed   0          13m
log-api-0                                7/7     Running     0          27m
log-cache-0                              8/8     Running     0          28m
nats-0                                   4/4     Running     0          28m
nats-1                                   4/4     Running     0          28m
router-0                                 5/5     Running     0          27m
routing-api-0                            4/4     Running     2          27m
scheduler-0                              10/10   Running     6          27m
tcp-router-0                             5/5     Running     0          28m
uaa-0                                    7/7     Running     0          25m
The following is a dump of the cf-operator and quarks-job controller logs:
cf-gitbot commented
We have created an issue in Pivotal Tracker to manage this:
https://www.pivotaltracker.com/story/show/175255805
The labels on this GitHub issue will be updated when the story is started.