CROSSJACK is a system metrics data collection and visualisation service for SCARF, implementing the jobstats platform. This repository contains implementation-specific files and documentation; for the full source code and documentation, please see the jobstats repository.
The following diagram (V1) shows the node setup with the exporters. Nodes labeled with the prefix cn... are CPU nodes, and nodes labeled with the prefix gn... are GPU nodes. Each exporter has its own service file and covers the following metrics:
- node_exporter - overall node metrics
- cgroup_exporter - job specific metrics
- nvidia_exporter - GPU metrics for NVIDIA hardware
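
As a rough illustration of how the exporter list above could map onto nodes, the following Python sketch builds Prometheus-style `host:port` targets from node names. It rests on several assumptions not stated in this repository: that every node runs node_exporter and cgroup_exporter, that only gn-prefixed nodes run nvidia_exporter, and that the ports in `EXPORTER_PORTS` match the deployed service files (9100 is the node_exporter default; the other two values are placeholders to verify). The node names `cn001` and `gn001` and both function names are hypothetical.

```python
# Assumed ports: 9100 is the node_exporter default. The cgroup_exporter and
# nvidia_exporter values are placeholders; check the actual service files.
EXPORTER_PORTS = {
    "node_exporter": 9100,
    "cgroup_exporter": 9306,
    "nvidia_exporter": 9445,
}


def exporters_for(node):
    """Return the exporters assumed to run on a node, based on its name prefix."""
    if node.startswith("gn"):
        # Assumption: GPU nodes run the GPU exporter in addition to the others.
        return ["node_exporter", "cgroup_exporter", "nvidia_exporter"]
    if node.startswith("cn"):
        return ["node_exporter", "cgroup_exporter"]
    raise ValueError(f"unrecognised node prefix: {node!r}")


def scrape_targets(nodes):
    """Group 'host:port' scrape targets by exporter name."""
    targets = {}
    for node in nodes:
        for exporter in exporters_for(node):
            port = EXPORTER_PORTS[exporter]
            targets.setdefault(exporter, []).append(f"{node}:{port}")
    return targets


if __name__ == "__main__":
    # Hypothetical node names, used only to show the shape of the result.
    for exporter, hosts in scrape_targets(["cn001", "gn001"]).items():
        print(exporter, hosts)
```

Each resulting list could be pasted into the `targets` field of a `static_configs` entry in the Prometheus configuration, one scrape job per exporter.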
Prometheus RPM - https://packagecloud.io/prometheus-rpm/release/packages/el/9/prometheus2-2.53.0-1.el9.x86_64.rpm?distro_version_id=240