oracle/weblogic-kubernetes-operator

livenessProbe.sh not returning an error despite WebLogic Server process not found

jkramplify opened this issue · 3 comments

Hi,

WebLogic Kubernetes Operator version: 3.2.3

We noticed that the weblogic-server container in one of our pods was not running. On investigation, we found that the WebLogic Server process had been terminated because of a memory issue.

<Dec 13, 2022 4:06:29 PM HKT> <SEVERE> <domain> <m3> <Unexpected error while monitoring server>
java.io.IOException: Cannot allocate memory

We expected the container to restart because of this, but it didn't. On further checking, we found that the liveness probe does not return an error, which is why the container was not restarted.

[oracle@domain-m3 scripts]$ bash livenessProbe.sh
[oracle@domain-m3 scripts]$ $?
bash: 0: command not found
[oracle@domain-m3 scripts]$

Can someone explain why the liveness probe is behaving this way?

A Node Manager process monitors the health of the WebLogic Server. When the server terminates unexpectedly, Node Manager should update the state file for that server. livenessProbe.sh expects this state file to exist ($DOMAIN_HOME/servers//data/nodemanager/.state). Without more information, such as the contents of the state file and any log output from the WebLogic Server and Node Manager, it's hard to say exactly why the liveness probe reported a false result.
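To make the failure mode concrete, here is a minimal sketch of a liveness check that does not rely on the state file alone. It is not the operator's livenessProbe.sh. The function name `check_server_alive`, the idea of cross-checking a PID file, and the `FAILED` marker it greps for are all assumptions made for illustration. The point is only that a probe trusting a stale state file can keep exiting 0 after the process is gone, whereas an independent process check would catch that.

```shell
#!/bin/bash
# Hypothetical sketch only; this is NOT the operator's livenessProbe.sh.
# Usage: check_server_alive STATE_FILE PID_FILE
# Returns 0 when the server looks alive, 1 otherwise.
check_server_alive() {
  local state_file="$1"   # Node Manager state file (path supplied by the caller)
  local pid_file="$2"     # file holding the server PID (assumed to exist for this sketch)
  local pid

  # No state file: there is nothing to base a healthy verdict on.
  if [ ! -f "$state_file" ]; then
    echo "state file missing: $state_file" >&2
    return 1
  fi

  # Assumption: a failed server is recorded with a state containing "FAILED".
  if grep -q "FAILED" "$state_file"; then
    echo "state file reports a failed server" >&2
    return 1
  fi

  # Independent check: the state file may be stale if nothing updated it.
  if [ ! -f "$pid_file" ]; then
    echo "pid file missing: $pid_file" >&2
    return 1
  fi
  pid=$(cat "$pid_file")

  # kill -0 sends no signal; it only tests that the PID exists and can be
  # signalled by the current user.
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "process $pid not found" >&2
    return 1
  fi

  return 0
}
```

When checking a probe by hand, print the exit status with `echo $?` right after the command. Typing `$?` on its own makes bash try to execute the status as a command, which is why the transcript above shows `bash: 0: command not found`; that message still indicates the probe exited with status 0.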

@jkramplify The latest WebLogic Kubernetes Operator release is 3.4.4. You may want to check whether it contains a fix for your issue. Per our support statement, we only support the latest minor release of a major line, e.g. 3.4.x.

@lennyphan Thank you for the reply. For now, we will try 4.0.0 in a dev environment.