olcf/olcf-test-harness

Feature Request: Include either test ID or job ID for node health entries

hagertnl opened this issue · 1 comment

Currently, node health entries log machine, node, and test as tags, and status and message as fields, to node_health. This makes it difficult to re-associate a result with a specific test or job. I believe the test ID should be added as a field to node_health.

In general, additions to node_health should be carefully considered, as node_health has many more entries than the events or metrics databases.

Thinking further, I think we should use the test ID, not the job ID. The test ID is an indexed "tag" in the events measurement, so this would allow for easy joining. Also, while the harness is currently used heavily in HPC environments, I think we still need to retain support for workloads that do not use a scheduler, such as local workstations, where a test would not have a job ID.