SumoLogic/sumologic-otel-collector

[k8sprocessor] Handle resource deletion on DeletedFinalStateUnknown

Closed this issue · 1 comments

The k8sprocessor uses a cache to store cluster resources and to update its model when they change. In a degraded state, this cache can become disconnected from the apiserver and miss deletion watch events. The cache eventually reconciles and notifies the processor of the deletion by delivering a cache.DeletedFinalStateUnknown tombstone, which carries the resource's last known state. The processor mishandles this notification.
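For illustration, here is a minimal sketch of how a delete handler could unwrap the tombstone before using the object. It is not the processor's actual code: Endpoints and DeletedFinalStateUnknown below are local stand-ins for *v1.Endpoints and client-go's cache.DeletedFinalStateUnknown (which has Key and Obj fields), and endpointsFromDeleteEvent is a hypothetical helper name.

```go
package main

import "fmt"

// Endpoints is a stand-in for *v1.Endpoints (assumption, for illustration only).
type Endpoints struct {
	Namespace string
	Name      string
}

// DeletedFinalStateUnknown mirrors the shape of client-go's
// cache.DeletedFinalStateUnknown: a key plus the last known object.
type DeletedFinalStateUnknown struct {
	Key string
	Obj interface{}
}

// endpointsFromDeleteEvent (hypothetical helper) accepts whatever the informer
// passes to a delete handler. It unwraps a tombstone if present and uses
// checked type assertions so an unexpected type never panics.
func endpointsFromDeleteEvent(obj interface{}) (*Endpoints, bool) {
	if tombstone, ok := obj.(DeletedFinalStateUnknown); ok {
		// The watch missed the delete; fall back to the last known state.
		obj = tombstone.Obj
	}
	ep, ok := obj.(*Endpoints)
	return ep, ok && ep != nil
}

func main() {
	live := &Endpoints{Namespace: "default", Name: "svc"}
	stale := DeletedFinalStateUnknown{Key: "default/svc", Obj: live}

	for _, event := range []interface{}{live, stale, "unexpected"} {
		if ep, ok := endpointsFromDeleteEvent(event); ok {
			fmt.Printf("delete %s/%s\n", ep.Namespace, ep.Name)
		} else {
			fmt.Printf("skipping unexpected type %T\n", event)
		}
	}
}
```

The same unwrap-then-assert pattern would presumably apply to the Pod handler and to each owner resource type.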

Uncovered in #1267.

It is unclear to me whether this particular issue is a primary contributing factor to the memory consumption problems observed in #1267.

FWIW, I can reliably reproduce this error for Pods, but I can't seem to hit the similar case for the (owning) resources. If it is possible, I'm pretty sure it'd cause a panic.

EDIT: confirmed. I just needed to be a bit more patient.

panic: interface conversion: interface {} is cache.DeletedFinalStateUnknown, not *v1.Endpoints

goroutine 75 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).genericEndpointOp(0x721d664?, {0x669e460?, 0xc003c3a3e0?}, 0xc002da7100?)
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor@v0.0.0-00010101000000-000000000000/kube/owner.go:417 +0x267
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).deleteEndpoint(0xc0029b6c60, {0x669e460, 0xc003c3a3e0})
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor@v0.0.0-00010101000000-000000000000/kube/owner.go:435 +0x1ca
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).addOwnerInformer.func3({0x669e460?, 0xc003c3a3e0?})
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor@v0.0.0-00010101000000-000000000000/kube/owner.go:320 +0x25
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).addOwnerInformer.(*OwnerCache).deferredDelete.func4.1()
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor@v0.0.0-00010101000000-000000000000/kube/owner.go:298 +0x22
github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.(*OwnerCache).deleteLoop(0xc0029b6c60, 0x0?, 0x0?)
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor@v0.0.0-00010101000000-000000000000/kube/owner.go:514 +0x110
created by github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor/kube.newOwnerProvider in goroutine 1
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/k8sprocessor@v0.0.0-00010101000000-000000000000/kube/owner.go:121 +0x1d3