[ASoC 2022] Metrics visualization and health scoring model for jobs
hoaresky commented
Background
Currently, the KubeDL dashboard displays basic information such as jobs, logs, and events, and lets users manipulate objects through built-in buttons. However, the dashboard could help users dig out more insights by visualizing core metrics such as resource utilization and I/O tracing. System metrics are usually collected and exposed in the Prometheus format, which makes Prometheus a good entry point.
Goals to be achieved
- Implement data/metrics visualization leveraging Prometheus.
- Based on job information and metrics data, design a job health model that quantifies how healthy a job is at runtime.
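To make the two goals more concrete, here is a minimal sketch in Python. It is only an illustration, not a proposed design: the `JobSample` fields, the weights (50/30/20), and the thresholds are all hypothetical, and the real model is what this project is meant to figure out. The only external fact it relies on is Prometheus' `/api/v1/query_range` HTTP endpoint and its `query`, `start`, `end`, and `step` parameters.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, start, end, step="30s"):
    """Build a URL for Prometheus' /api/v1/query_range HTTP endpoint.

    The dashboard backend could fetch this URL and plot the returned series.
    """
    params = urlencode({"query": promql, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


@dataclass
class JobSample:
    """Hypothetical per-job inputs; field names are assumptions, not KubeDL's."""
    utilization: float      # mean resource utilization in [0, 1]
    restart_count: int      # pod restarts observed for the job
    pending_seconds: float  # time the job spent waiting to be scheduled


def health_score(sample, max_restarts=5, max_pending_seconds=600.0):
    """Return a score in [0, 100]; higher means healthier.

    Weights and thresholds are placeholders chosen for illustration only.
    """
    # Clamp utilization so bad input cannot push the score out of range.
    utilization = min(max(sample.utilization, 0.0), 1.0)
    # Fewer restarts and less pending time both map to values closer to 1.
    restart_factor = 1.0 - min(sample.restart_count / max_restarts, 1.0)
    pending_factor = 1.0 - min(sample.pending_seconds / max_pending_seconds, 1.0)
    return 50 * utilization + 30 * restart_factor + 20 * pending_factor
```

With these placeholder weights, a fully utilized job with no restarts and no pending time scores 100.0, and a job at the worst end of every input scores 0.0. A weighted sum is used here only because it is easy to explain in a dashboard; other formulations may fit better.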
Additional context
This issue is part of our #249.
Difficulty: Normal
Mentor: Xuelin Hong (@hoaresky)