oracle/weblogic-kubernetes-operator

fluentd config map weblogic-fluentd-configmap is not domain scoped

belfo opened this issue · 8 comments

belfo commented

Hello,

I noticed that when the operator creates the fluentd config map, its name is generic: weblogic-fluentd-configmap

When there is more than one domain in a namespace, this causes a problem, because the different domains continuously delete and recreate the config map.
At one point a domain pod failed to start because the config map was not present.
Note: I don't use this config map. I create a different one with my own configuration and reference it when starting fluentd, but by default this map is required for the pod to start.

Since it is defined inside the domain, its name should include the domain name, as other config maps do, for example
$(DOMAIN_UID)-weblogic-fluentd-configmap. Or, at the very least, could the operator avoid deleting and recreating the map when it already exists, so that the pod does not fail to start?

Regards
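A minimal sketch of the naming point above, not the operator's code. The helper names are hypothetical, and the domain UIDs are taken from the log excerpts later in this thread. It only shows that a shared name gives every domain in a namespace the same ConfigMap, while a name prefixed with the domain UID gives each domain its own.

```python
# Hypothetical helpers for illustration; only the ConfigMap names come from this thread.
GENERIC_NAME = "weblogic-fluentd-configmap"


def shared_configmap_name(domain_uid: str) -> str:
    """Generic naming: the domain UID is ignored, so all domains get one name."""
    return GENERIC_NAME


def scoped_configmap_name(domain_uid: str) -> str:
    """Domain-scoped naming: prefix the generic name with the domain UID."""
    return f"{domain_uid}-{GENERIC_NAME}"


# Domain UIDs as they appear in the operator log excerpts in this thread.
domains = ["admin-int", "pdp-host", "ecas-sync", "ecas-host"]

# With the generic name, four domains compete for a single ConfigMap.
shared = {shared_configmap_name(uid) for uid in domains}
# With the scoped name, each domain owns a distinct ConfigMap.
scoped = {scoped_configmap_name(uid) for uid in domains}

print(len(shared), "shared name(s):", sorted(shared))
print(len(scoped), "scoped name(s):", sorted(scoped))
```

Because ConfigMap names only need to be unique within a namespace, the prefix is enough to stop domains in the same namespace from overwriting each other's map.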

Which version of the operator are you using?

Thank you. The problem is being investigated and tracked using internal JIRA OWLS-106046. As you indicated, it looks like the problem is likely due to an Operator bug.

@rjeberhard FYI

belfo commented

I noticed it on v3.4.4, but I see the same on version 4.0.4:

{"timestamp":"2023-01-20T09:56:41.452287162Z","thread":39,"fiber":"engine-operator-thread-4-fiber-5 NOT_COMPLETE","namespace":"admin-ns","domainUID":"admin-int","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1674208601452,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-01-20T09:56:41.743719944Z","thread":43,"fiber":"engine-operator-thread-6-fiber-3 NOT_COMPLETE","namespace":"admin-ns","domainUID":"pdp-host","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1674208601743,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-01-20T09:56:41.85966846Z","thread":49,"fiber":"engine-operator-thread-10-fiber-4 NOT_COMPLETE","namespace":"admin-ns","domainUID":"ecas-sync","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1674208601859,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-01-20T09:56:41.90746743Z","thread":22,"fiber":"engine-operator-thread-1-fiber-6 NOT_COMPLETE","namespace":"admin-ns","domainUID":"ecas-host","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1674208601907,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}

and so on
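To see which domains trigger the message and how often, the operator log can be tallied per namespace and domain UID. This is a rough sketch: the sample records below are abridged from the lines above (only the fields used are kept), and the function names are hypothetical. It uses `json.JSONDecoder.raw_decode` so that records separated by spaces rather than newlines are also handled.

```python
import json
from collections import Counter

# Abridged sample: only the fields used below are kept from the log lines in this thread.
LOG = (
    '{"namespace":"admin-ns","domainUID":"admin-int","message":"Fluentd configmap replaced."}'
    ' {"namespace":"admin-ns","domainUID":"pdp-host","message":"Fluentd configmap replaced."}\n'
    '{"namespace":"admin-ns","domainUID":"ecas-sync","message":"Fluentd configmap replaced."}\n'
)

TARGET_MESSAGE = "Fluentd configmap replaced."


def iter_records(text):
    """Yield JSON objects from text, whether separated by newlines or spaces."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


def count_replacements(text):
    """Count 'replaced' messages per (namespace, domainUID) pair."""
    counts = Counter()
    for rec in iter_records(text):
        if rec.get("message") == TARGET_MESSAGE:
            counts[(rec.get("namespace"), rec.get("domainUID"))] += 1
    return counts


for (namespace, uid), n in sorted(count_replacements(LOG).items()):
    print(f"{namespace}/{uid}: {n}")
```

Running the same tally over a longer capture would show whether each domain logs the replacement repeatedly or only once per start.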

@belfo, I've published operator 3.4.5 with the fix. Thank you very much for reporting this issue!

Note: as is consistent with how we normally handle bugs like this, you should not see the pods for your WebLogic instances roll (be replaced) immediately after updating the operator. Instead, whenever new pods are created then the new pods will have the corrected ConfigMap name. This is done so that other customers who haven't hit this bug (e.g. because they only have one domain in a namespace) won't see a surprise roll when they upgrade the operator.

belfo commented

Hello @rjeberhard,

I moved to operator 4.0.5 and recreated all domains.
I still see the operator logging:

{"timestamp":"2023-03-17T15:31:48.188163587Z","thread":47,"fiber":"engine-operator-thread-10-fiber-21639 NOT_COMPLETE","namespace":"uumdsdev","domainUID":"ecas-sync","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1679067108188,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-03-17T15:31:48.189467745Z","thread":36,"fiber":"engine-operator-thread-3-fiber-21638 NOT_COMPLETE","namespace":"uumdsdev","domainUID":"sample-cs","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1679067108189,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-03-17T15:31:48.225187312Z","thread":22,"fiber":"engine-operator-thread-1-fiber-21640 NOT_COMPLETE","namespace":"uumdsdev","domainUID":"admin-ext","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1679067108225,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-03-17T15:31:48.226359355Z","thread":38,"fiber":"engine-operator-thread-4-fiber-21643 NOT_COMPLETE","namespace":"uumdsdev","domainUID":"ecas-host","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1679067108226,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}
{"timestamp":"2023-03-17T15:31:48.233458232Z","thread":46,"fiber":"engine-operator-thread-9-fiber-21641 NOT_COMPLETE","namespace":"uumdsdev","domainUID":"admin-int","level":"INFO","class":"oracle.kubernetes.operator.helpers.ConfigMapHelper$ReplaceFluentdConfigMapResponseStep","method":"onSuccess","timeInMillis":1679067108233,"message":"Fluentd configmap replaced.","exception":"","code":"","headers":{},"body":""}

Something is still not correct.
The config map name is OK.
Example: admin-ext-weblogic-fluentd-configmap

I'm not understanding the issue with the sample name. We added the "domain UID" value as a prefix to the previous FluentD config map name, which should make it unique in the namespace.

belfo commented

Indeed, the prefix is there; admin-ext is the domain name.
But I still get this log message from the operator saying that it is replacing the config map.
So the operator thinks it needs to replace the config map, even though the map has a unique name.

Unless it is related to the admin server and managed servers? Does each one have its own fluentd?
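For illustration only: the thread does not establish why the replace is still logged, and this is not the operator's code. The sketch shows one way a reconcile step can decide between creating, skipping, and replacing a config map by comparing the existing data with the desired data. The function name and the `fluentd.conf` key are hypothetical.

```python
def reconcile_action(existing, desired):
    """Return the action a reconcile step would take for one config map.

    existing: dict with the current config map data, or None if the map is absent.
    desired:  dict with the data the domain expects.
    """
    if existing is None:
        return "create"
    if existing == desired:
        # Nothing changed, so there is nothing to replace and nothing to log.
        return "skip"
    return "replace"


# Hypothetical data; the key name is an assumption for this example.
desired = {"fluentd.conf": "<match **>\n  @type stdout\n</match>\n"}

print(reconcile_action(None, desired))                    # map missing
print(reconcile_action(dict(desired), desired))           # map already up to date
print(reconcile_action({"fluentd.conf": "old"}, desired)) # map differs
```

With a check like this, a "replaced" message would appear only when the data actually differs, which is the behavior the question above is asking about.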

belfo commented

@rjeberhard
Should I create a separate ticket?
It doesn't prevent the containers from starting, but it's polluting the logs.