A better "valid use case" section for Pushgateway
iNishant opened this issue · 2 comments
Currently, the docs at https://prometheus.io/docs/practices/pushing/#should-i-be-using-the-pushgateway say:
"Usually, the only valid use case for the Pushgateway is for capturing the outcome of a service-level batch job. A "service-level" batch job is one which is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label to decouple the lifecycle of specific machines or instances from the pushed metrics."
It's hard (at least for me, and maybe for others) to infer from the above paragraph a common use case of the Pushgateway: pushing metrics from a machine as they change, because the machine is configured to be deleted once the job completes and so will not be available for scraping afterwards. Also, the example job, "a batch job that deletes a number of users for an entire service", doesn't feel like the best example, because it's easy to implement this job so that it runs on a machine that is not deleted after the job, where the metrics can be scraped normally (all machine/instance labels can be ignored).
For example, an alternative "valid use case" could be:
Imagine you have an ML model training job for which your system spawns a container. Your system is configured to delete the container after the job completes (to save on cost/resources). Now imagine Prometheus scraping this container: it's possible that some metrics, or their latest values, are never scraped because the container disappears once the job completes.
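To make the scenario concrete, here is a rough sketch of how such a job could push its latest values before the container goes away. It uses only the Python standard library and the Pushgateway's HTTP push endpoint (`PUT /metrics/job/<job>`). The address `http://pushgateway:9091`, the job name `ml_training`, the metric names, and all helper function names are made up for illustration. In practice the official client libraries provide push helpers (for example `push_to_gateway` in the Python client), which would probably be preferable to hand-rolled requests.

```python
import urllib.request
from typing import Mapping
from urllib.parse import quote


def pushgateway_url(base_url: str, job: str) -> str:
    """Build the push URL for a job.

    The grouping key holds only the job name: no machine or instance
    label, as the docs recommend for service-level batch jobs.
    Assumption: the job name contains no "/" character.
    """
    return f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"


def render_metrics(metrics: Mapping[str, float]) -> bytes:
    """Render the values as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    # The text format requires a trailing newline after the last sample.
    return ("\n".join(lines) + "\n").encode("utf-8")


def push(base_url: str, job: str, metrics: Mapping[str, float],
         timeout: float = 5.0) -> int:
    """PUT the metrics to the Pushgateway, replacing the job's previous group."""
    request = urllib.request.Request(
        pushgateway_url(base_url, job),
        data=render_metrics(metrics),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical usage, called as the last step of the training job
# (and optionally after each epoch) before the container is deleted:
#
#   push("http://pushgateway:9091", "ml_training",
#        {"training_last_loss": 0.25, "training_epochs_completed": 3})
```

Since the pushed group outlives the container, Prometheus can still scrape the final values from the Pushgateway after the job has finished.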
Do folks feel the same?