upmc-enterprises/elasticsearch-operator

Operator crashes if another elasticsearch cluster exists in the k8s cluster

ferozsalam opened this issue · 1 comment

In my current setup, I would like to run two elasticsearch clusters in the same k8s cluster: one managed by the UPMC operator, and one not. The two clusters would run in different namespaces.

However, when I attempt to run these two clusters in parallel, the operator crashes with:

panic: runtime error: slice bounds out of range

github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processDataPodEvent(0xc0001ac160, 0xc00084a3f8, 0x10b0148, 0x4)
        /home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:285 +0x205
github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).processPodEvent(0xc0001ac160, 0xc00084a3f8, 0x0, 0x0)
        /home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:275 +0x137
github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchDataPodEvents.func1(0xc0003940c0, 0xc0001ac160, 0xc000184120, 0xc000394000, 0xc0003501a0)
        /home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:109 +0x1ed
created by github.com/upmc-enterprises/elasticsearch-operator/pkg/processor.(*Processor).WatchDataPodEvents
        /home/steve/godev/src/github.com/upmc-enterprises/elasticsearch-operator/pkg/processor/processor.go:105 +0x78

Edited after further investigation:

After some further digging, the issue seems to be caused by the operator attempting to act on any pod in the k8s cluster that is labelled data or master, including pods outside the operator's namespace. Relabelling the pods in the second ELK cluster makes them skip both cases in this check (in processor.go):

    case "data":
        return p.processDataPodEvent(c)
    case "master":
        return p.processMasterPodEvent(c)
    }

The neatest solution to me would be to only act on pods created by the operator, or in the same namespace as the operator, as I could see these labels being used by a number of different deployments.
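
To make the suggestion concrete, here is a rough sketch of the kind of guard I have in mind. It is not taken from the operator's code: podInfo, the managed-by label, its value, and shouldProcess are all hypothetical names for illustration, and the real fix would need to use whatever labels or owner information the operator actually sets on its pods.

```go
package main

import "fmt"

// podInfo is a hypothetical, trimmed-down stand-in for the pod metadata
// the operator receives with each watch event.
type podInfo struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// Assumed ownership label; the real operator may mark its pods differently.
const (
	managedByLabel = "managed-by"
	managedByValue = "elasticsearch-operator"
)

// createdByOperator reports whether the pod carries the assumed ownership label.
// Reading from a nil map is safe in Go and yields the empty string.
func createdByOperator(p podInfo) bool {
	return p.Labels[managedByLabel] == managedByValue
}

// shouldProcess accepts a pod only if it was created by the operator or
// lives in the operator's own namespace. Anything else is ignored before
// the role switch ("data" / "master") is ever reached.
func shouldProcess(p podInfo, operatorNamespace string) bool {
	return createdByOperator(p) || p.Namespace == operatorNamespace
}

func main() {
	foreign := podInfo{
		Name:      "es-data-0",
		Namespace: "other-elk",
		Labels:    map[string]string{"role": "data"},
	}
	owned := podInfo{
		Name:      "es-data-example-0",
		Namespace: "operator-ns",
		Labels:    map[string]string{"role": "data", managedByLabel: managedByValue},
	}

	fmt.Println(shouldProcess(foreign, "operator-ns"))
	fmt.Println(shouldProcess(owned, "operator-ns"))
}
```

With a check like this in front of the switch, a pod from the second cluster that merely shares the data or master label would be skipped instead of being handed to processDataPodEvent.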

This issue is similar to #267

TBBle commented

Possibly fixed by #303 in 0.4.0.