Stalled ElasticSearch Upgrade
cehoffman opened this issue · 0 comments
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
The Elasticsearch upgrade from 6.2.4 to 6.3.0 stalled with the last two data and ingest pods not upgraded. The 3 masters upgraded, then pods 4, 3, and 2 of the data and ingest pool upgraded. Pods 1 and 0 did not upgrade, and the UpdateVersion loop in the navigator controller stopped.
What you expected to happen:
All pods upgraded.
How to reproduce it (as minimally and precisely as possible):
Create a 5 member data and ingest pool and a 3 member master pool at 6.2.4 with navigator 0.1.0, then upgrade the cluster to 6.3.0.
Anything else we need to know?:
It appears there was a mixup in how pilot updates the version of Elasticsearch. See https://gist.github.com/34927d24d0056967aba99c2f5a29ba7e
The d-0 and d-1 pilots indicate they are running Elasticsearch 6.3.0, but their pods never changed images. Seconds prior to this gist capture (while the upgrade was in the stalled state), pilots d-0 and d-3 indicated they had version 6.2.4. That would be correct for d-0, but d-3 was using the 6.3.0 image.
It appears there is misalignment in updating or detecting the elasticsearch version of the pilot record.
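To make the suspected misalignment concrete, here is a minimal sketch of the check I would expect to hold: the version in each pilot record should match the tag of the image the pod is actually running. The `PodState` type, the helper names, the `example/elasticsearch` image name, and the sample data are all hypothetical and only illustrate the state described above. This is not navigator's actual logic, and it assumes tag-based image references rather than digests.

```python
from typing import Dict, List, NamedTuple


class PodState(NamedTuple):
    image: str             # container image the pod is actually running
    reported_version: str  # version recorded in the pilot status


def image_tag(image: str) -> str:
    """Return the tag of an image reference, or '' if it has none."""
    # Look only at the last path component so that a registry port
    # (e.g. registry:5000/es) is not mistaken for a tag.
    last = image.rsplit("/", 1)[-1]
    return last.split(":", 1)[1] if ":" in last else ""


def find_mismatches(pods: Dict[str, PodState]) -> List[str]:
    """Names of pods whose pilot-reported version differs from the image tag."""
    return sorted(
        name
        for name, state in pods.items()
        if image_tag(state.image) != state.reported_version
    )


# Illustrative data only: d-0 and d-1 report 6.3.0 while still on the old image.
pods = {
    "es-logging-d-0": PodState("example/elasticsearch:6.2.4", "6.3.0"),
    "es-logging-d-1": PodState("example/elasticsearch:6.2.4", "6.3.0"),
    "es-logging-d-2": PodState("example/elasticsearch:6.3.0", "6.3.0"),
}
print(find_mismatches(pods))
```

If the controller trusts the reported version alone, pods flagged by a check like this would look already upgraded and be skipped, which would be consistent with the stall.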
The events in the describe summary for the cluster are:
Events:
Type     Reason            Age                From                  Message
----     ------            ----               ----                  -------
Normal   UpdateVersion     40m (x2 over 1h)   navigator-controller  Updating replica es-logging-master-1 to version 6.3.0
Warning  ErrUpdateVersion  37m (x8 over 37m)  navigator-controller  Pilot "es-logging-master-1" has not finished updating to version "6.3.0"
Normal   UpdateVersion     31m (x3 over 1h)   navigator-controller  Updating replica es-logging-master-2 to version 6.3.0
Normal   UpdateVersion     29m                navigator-controller  Updated node pool "master" to version "6.3.0"
Normal   UpdateVersion     24m                navigator-controller  Updating replica es-logging-d-2 to version 6.3.0
There were a number of failures during the master upgrade because setting ES_JAVA_OPTS doesn't work as an override in the container image with the 0.1.0 pilot. I had to switch to setting the min/max heap in the jvm.options file.
Environment:
- Kubernetes version (use kubectl version): 1.9.6
- Cloud provider or hardware configuration: Azure
- Install tools: Helm
- Others: