This script is intended to be used with Lustre jobstats and lustre_exporter. It sits between Prometheus and lustre_exporter and modifies the metrics returned for each scrape request, adding user and account labels so that jobs are no longer identified by $SLURM_JOB_ID alone.
Metrics from nodes using procname_uid are also modified: the numerical UID is extracted, converted to a username and stored in the user label. The application name is also extracted and made available in the application label.
A MySQL connection to the slurmdb database is used to look up the user and account for a given $SLURM_JOB_ID.
A connection to an LDAP server is used to convert a numerical UID to a username.
Here is an example of the modified metrics with the additional tags:
lustre_job_read_bytes_total{component="ost",jobid="18526388",target="lustre04-OST0007",fs="lustre04",user="user1",account="an_account"} 0
lustre_job_read_samples_total{component="ost",jobid="chmod.3021723",target="lustre04-OST0004",fs="lustre04",application="chmod",user="user2"} 0
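The rewriting shown in these two sample lines can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `relabel_line`, `extra_labels`, `job_lookup` and `uid_lookup` are hypothetical names, and the two lookup callables stand in for the slurmdb and LDAP queries. It also assumes that a purely numeric jobid is a Slurm job ID and that anything else has the procname_uid form `<application>.<uid>`.

```python
import re

# Matches the jobid="..." label inside a Prometheus label set.
JOBID_RE = re.compile(r'jobid="([^"]*)"')


def extra_labels(jobid, job_lookup, uid_lookup):
    """Return the labels to add for one jobid value.

    job_lookup(jobid) -> (user, account) or None  (stand-in for slurmdb)
    uid_lookup(uid)   -> username or None         (stand-in for LDAP)
    Both callables are hypothetical placeholders.
    """
    if jobid.isdigit():
        # Assumption: a numeric jobid is a $SLURM_JOB_ID.
        found = job_lookup(jobid)
        if found is None:
            return {}
        user, account = found
        return {"user": user, "account": account}
    # Assumption: otherwise the jobid is "<application>.<uid>";
    # split on the last dot so dots in the application name survive.
    application, sep, uid = jobid.rpartition(".")
    if not sep or not uid.isdigit():
        return {}
    labels = {"application": application}
    user = uid_lookup(int(uid))
    if user is not None:
        labels["user"] = user
    return labels


def relabel_line(line, job_lookup, uid_lookup):
    """Append the extra labels to one line of exporter output."""
    if line.startswith("#"):
        return line  # HELP/TYPE comments pass through unchanged
    match = JOBID_RE.search(line)
    close = line.rfind("}")
    if match is None or close == -1:
        return line  # metric without a jobid label
    labels = extra_labels(match.group(1), job_lookup, uid_lookup)
    added = "".join(f',{key}="{value}"' for key, value in labels.items())
    return line[:close] + added + line[close:]
```

Passing the lookups in as callables keeps the text handling separate from the database and directory access, so results can be cached or the lookups replaced in tests.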
This allows native Prometheus queries on the new tags, such as summing the IOPS of a single user across many jobs in one query.
Prometheus can now combine the information from multiple jobs and sum it per user. For example, to get the IOPS per user:
topk(20, sum by (user) (rate(lustre_job_stats_total{instance=~"lustre-mds.*"}[5m])))
(Negative bandwidth means reading from the filesystem in this graph)
The relabel feature is used to redirect the scrape request to the local lustre_exporter_slurm script instead of polling the MDS/OSS directly.
relabel_configs:
  - source_labels: [__address__]
    target_label: __metrics_path__
    regex: '(.*):(.*)'
    replacement: '/$1'
  - source_labels: [__address__]
    target_label: instance
  - source_labels: [__address__]
    regex: '(.*):(.*)'
    replacement: '127.0.0.1:8080'
    target_label: __address__
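To make the effect of these three rules concrete, the following sketch simulates them in Python for a single target. It is only an illustration of the relabeling logic, not part of the script; `apply_relabel` is a hypothetical helper, and the target address `lustre04-oss1:9169` is an assumed example of what the scrape config might list.

```python
import re


def apply_relabel(address):
    """Simulate the three relabel rules for one target address."""
    labels = {"__address__": address}
    # Prometheus anchors relabel regexes, hence fullmatch.
    match = re.fullmatch(r"(.*):(.*)", address)
    # Rule 1: the host part of the address becomes the metrics path.
    if match:
        labels["__metrics_path__"] = "/" + match.group(1)
    # Rule 2: keep the original address as the instance label.
    labels["instance"] = address
    # Rule 3: send the actual scrape to the local script.
    if match:
        labels["__address__"] = "127.0.0.1:8080"
    return labels


if __name__ == "__main__":
    for name, value in apply_relabel("lustre04-oss1:9169").items():
        print(f"{name} = {value}")
```

With that assumed target, Prometheus would request `127.0.0.1:8080/lustre04-oss1` while still recording the original address in the instance label.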
A manual test can be done before modifying the Prometheus config:
curl 127.0.0.1:8080/lustre04-oss1
The output of this curl command should contain the new tags; this is what Prometheus will index.
The script uses the hostname given at the end of the URL above to send an HTTP request to that Lustre server on port 9169, where lustre_exporter is running.
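The mapping from the incoming request path to the upstream exporter URL could look like the sketch below. `upstream_url` is a hypothetical helper, and the `/metrics` path is an assumption about how lustre_exporter is exposed; only port 9169 comes from the description above.

```python
def upstream_url(request_path, port=9169, metrics_path="/metrics"):
    """Build the lustre_exporter URL for an incoming request path.

    '/lustre04-oss1' -> 'http://lustre04-oss1:9169/metrics'
    The metrics_path default is an assumption, not taken from the script.
    """
    host = request_path.strip("/")
    # Accept a single path segment only, so the path cannot be
    # used to reach arbitrary URLs on the upstream server.
    if not host or "/" in host:
        raise ValueError(f"unexpected request path: {request_path!r}")
    return f"http://{host}:{port}{metrics_path}"
```

The response fetched from that URL is then what gets rewritten with the additional tags before being returned to Prometheus.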
An example of the expected config is available in config.ini.dist. For the MySQL user in the slurmdb, this can be a read-only user with access only to the job_table table.