Operator crashes if another elasticsearch cluster exists in the k8s cluster
ferozsalam opened this issue · 1 comment
Under the configuration I am currently running, there would ideally be two Elasticsearch clusters: one managed by the UPMC operator, and one not using it. The two clusters would run in different namespaces.
However, when I attempt to run these two clusters in parallel, the operator crashes with:
panic: runtime error: slice bounds out of range
github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processDataPodEvent(0xc0001ac160, 0xc00084a3f8, 0x10b0148, 0x4)
/home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:285 +0x205
github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processPodEvent(0xc0001ac160, 0xc00084a3f8, 0x0, 0x0)
/home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:275 +0x137
github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchDataPodEvents.func1(0xc0003940c0, 0xc0001ac160, 0xc000184120, 0xc000394000, 0xc0003501a0)
/home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:109 +0x1ed
created by github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchDataPodEvents
/home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:105 +0x78
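The stack trace suggests the panic happens while processDataPodEvent derives information from a pod it was not expecting to see. One way a slice bounds panic like this can arise is slicing a pod name that does not match an assumed naming convention; the sketch below only illustrates that failure mode, with a made-up name format and helper, and is not the operator's actual code:

package main

import (
	"fmt"
	"strings"
)

// clusterNameFromPodName is a hypothetical helper showing how fixed-index
// slicing of a pod name can panic when the name has an unexpected shape.
func clusterNameFromPodName(podName string) string {
	const prefix = "es-data-" // assumed convention: es-data-<clusterName>-<ordinal>
	idx := strings.LastIndex(podName, "-")
	// If the pod name doesn't follow the convention, idx can end up smaller
	// than len(prefix) and the slice expression panics with
	// "slice bounds out of range".
	return podName[len(prefix):idx]
}

func main() {
	fmt.Println(clusterNameFromPodName("es-data-mycluster-0")) // "mycluster"
	fmt.Println(clusterNameFromPodName("data-0"))              // panics: slice bounds out of range
}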
Edited after further investigation:
After some further digging into the issue, it seems to be caused by the operator attempting to act on pods in the cluster (but outside of the operator's namespace) that are labelled data or master. Relabelling the pods in the second ELK cluster causes them to skip both cases in the check (in processor.go):
case "data":
return p.processDataPodEvent(c)
case "master":
return p.processMasterPodEvent(c)
}
The neatest solution to me would be for the operator to only act on pods that it created, or that are in the same namespace as the operator, since I could see these labels being used by a number of different deployments.
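A rough sketch of what that guard might look like is below. The types, the operatorNamespace field, and the "role" label key are illustrative assumptions, not the operator's actual API:

package processor

// Illustrative stand-ins; the operator's real types and field names differ.
type Pod struct {
	Namespace string
	Labels    map[string]string
}

type Processor struct {
	operatorNamespace string // hypothetical: the namespace the operator manages
}

func (p *Processor) processDataPodEvent(pod Pod) error   { return nil } // stub
func (p *Processor) processMasterPodEvent(pod Pod) error { return nil } // stub

// processPodEvent skips pods outside the operator's namespace before
// dispatching on the role label, so "data"/"master" pods belonging to an
// unrelated Elasticsearch deployment are never touched.
func (p *Processor) processPodEvent(pod Pod) error {
	if pod.Namespace != p.operatorNamespace {
		// Pod belongs to another deployment (e.g. a second ELK cluster).
		return nil
	}
	switch pod.Labels["role"] {
	case "data":
		return p.processDataPodEvent(pod)
	case "master":
		return p.processMasterPodEvent(pod)
	}
	return nil
}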
This issue is similar to #267.