sipb/homeworld

Prometheus can fail to start if it didn't exit cleanly

cryslith opened this issue · 2 comments

See prometheus-junkyard/tsdb#178

We could either pass the --storage.tsdb.no-lockfile flag to disable locking entirely (which would risk data corruption if we somehow ran two Prometheus instances at once on the same supervisor node), or just agree to remove the lockfile manually whenever this occurs.

For reference, the lock file to remove is /var/lib/prometheus/data/lock.

This seems to happen pretty often when I reboot the cluster. Is there any reason we might not want to disable the lockfile? (If Prometheus is entirely managed by systemd, there shouldn't be a case where two instances run at the same time, right?)

I can't think of a reason we would end up with two instances running at the same time, so go for it.