OOM in containerized dfget in Kubernetes
Ⅰ. Issue Description
We frequently see the dfget process being OOM-killed when it runs in a container orchestrated by Kubernetes.
Ⅱ. Describe what happened
When dfget is executed within a Kubernetes pod, Prometheus reports high memory usage for the container in the container_memory_working_set_bytes metric. From what we have seen, the reported value can easily exceed hundreds of megabytes, and it sometimes creeps into gigabytes.
This metric is reported by cAdvisor and equals total memory usage minus inactive file pages. See here for its definition.
container_memory_working_set_bytes excludes the inactive page cache, and it is what the OOM killer uses for calculating oom_score.
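To make the "usage minus inactive files" arithmetic concrete, here is a minimal sketch of how that value can be derived from a container's cgroup counters. It assumes the standard cgroup layout mounted at /sys/fs/cgroup inside the container (memory.current for cgroup v2, memory/memory.usage_in_bytes for cgroup v1), and, to our understanding of cAdvisor, that the hierarchical total_inactive_file counter is used on v1 and inactive_file on v2, with the result floored at zero. The function names are hypothetical and not part of dfget or cAdvisor.

```python
# Sketch: working set = memory usage - inactive file-backed pages, floored at 0.
# Assumption: cgroup v2 exposes memory.current + memory.stat (inactive_file);
# cgroup v1 exposes memory/memory.usage_in_bytes + memory/memory.stat
# (total_inactive_file). Function names are illustrative only.

from pathlib import Path


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the 'key value' lines of a cgroup memory.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def working_set_bytes(usage: int, stats: dict[str, int]) -> int:
    """Usage minus inactive file cache, never negative."""
    # Prefer the hierarchical v1 counter; fall back to the v2 name.
    inactive = stats.get("total_inactive_file", stats.get("inactive_file", 0))
    return max(usage - inactive, 0)


def read_working_set(cgroup_dir: str = "/sys/fs/cgroup") -> int:
    """Read the current container's working set from its cgroup files."""
    root = Path(cgroup_dir)
    if (root / "memory.current").exists():  # cgroup v2
        usage = int((root / "memory.current").read_text())
        stat = (root / "memory.stat").read_text()
    else:  # cgroup v1
        mem = root / "memory"
        usage = int((mem / "memory.usage_in_bytes").read_text())
        stat = (mem / "memory.stat").read_text()
    return working_set_bytes(usage, parse_memory_stat(stat))
```

Running read_working_set() inside the dfget container while a download is in progress should give a number comparable to what Prometheus reports, which helps separate real anonymous memory growth from reclaimable page cache.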
Ⅲ. Describe what you expected to happen
I expect stable memory usage (as reported by the container_memory_working_set_bytes metric) while a file is being downloaded. I did not observe a significant memory spike when downloading the same file with wget.
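One way to put the dfget vs. wget comparison on equal footing is to run each download inside the same container and record the peak working set while it runs. The helper below is a hypothetical sketch, not a dfget feature: read_ws is any zero-argument callable returning the container's working set in bytes (for example, one that reads memory.current and memory.stat from the cgroup filesystem), and the sampling interval is an arbitrary choice.

```python
# Hypothetical helper: run a command to completion while sampling the
# container's working set, and report the highest value observed.
# `read_ws` is an assumed callable returning bytes; it is not a dfget API.

import subprocess
import time
from typing import Callable, Sequence


def peak_working_set(cmd: Sequence[str],
                     read_ws: Callable[[], int],
                     interval: float = 0.5) -> tuple[int, int]:
    """Run cmd; return (exit_code, peak_working_set_bytes)."""
    peak = read_ws()  # baseline before the download starts
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        peak = max(peak, read_ws())
        time.sleep(interval)
    peak = max(peak, read_ws())  # one final sample after the process exits
    return proc.returncode, peak
```

For example, peak_working_set(["wget", "-O", "/tmp/file", url], read_ws) followed by the same call with your usual dfget invocation gives two peaks that can be compared directly. Sampling at a fixed interval can miss short spikes, so a smaller interval gives a tighter bound at the cost of more reads.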
Ⅳ. How to reproduce it (as minimally and precisely as possible)
- Create a Kubernetes Deployment with the dragonflyoss/dfclient:1.0.6 image
- Execute the dfget process within the pod
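The two reproduction steps can be pinned down with a concrete manifest. The sketch below emits a one-replica Deployment as JSON, which kubectl accepts on stdin. The Deployment name, labels, memory limit, and the sleep command used to keep the pod running are all illustrative assumptions (it assumes the image ships a sleep binary); only the image tag comes from this report.

```python
# Hypothetical reproduction manifest for the dfclient image.
# NAME, labels, MEMORY_LIMIT and the sleep command are assumptions.

import json

NAME = "dfget-oom-repro"   # hypothetical Deployment name
MEMORY_LIMIT = "512Mi"     # hypothetical limit; match your real setup


def repro_deployment(image: str = "dragonflyoss/dfclient:1.0.6") -> dict:
    labels = {"app": NAME}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": NAME, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "dfclient",
                        "image": image,
                        # Override the entrypoint so the pod stays up for exec.
                        "command": ["sleep", "86400"],
                        "resources": {"limits": {"memory": MEMORY_LIMIT}},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(repro_deployment(), indent=2))
```

Apply it with `python3 repro.py | kubectl apply -f -`, then start the download with `kubectl exec deploy/dfget-oom-repro -- dfget` followed by your usual dfget arguments, and watch container_memory_working_set_bytes for that pod.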
Ⅴ. Anything else we need to know?
Ⅵ. Environment:
- dragonfly version: v1.0.6
- Host OS (e.g. from /etc/os-release): CentOS Linux 8
- Kubernetes Version: v1.21.1
- Install tools:
- Others: