Prometheus can fail to start if it didn't exit cleanly
cryslith opened this issue · 2 comments
See prometheus-junkyard/tsdb#178
We could either pass the --storage.tsdb.no-lockfile flag to disable locking entirely (but that would lead to data corruption if we somehow ran two prometheus instances at once on the same supervisor node), or we can just remove the lockfile manually whenever this occurs.
For reference, the lock file to remove is /var/lib/prometheus/data/lock.
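If we go the manual-removal route, a small guard could make it a bit safer. This is only a rough sketch, not something that exists in our setup: it assumes `pgrep` is available on the node and that the process is named exactly `prometheus`. The helper names (`prometheus_running`, `remove_stale_lock`) are made up for illustration; only the lock path comes from this issue.

```python
import subprocess
from pathlib import Path

# Lock file path as given in this issue.
DEFAULT_LOCK = Path("/var/lib/prometheus/data/lock")


def prometheus_running() -> bool:
    """Return True if pgrep finds a process named exactly 'prometheus'."""
    # Assumption: pgrep is installed and the binary is called 'prometheus'.
    result = subprocess.run(["pgrep", "-x", "prometheus"], capture_output=True)
    return result.returncode == 0


def remove_stale_lock(lock_path: Path = DEFAULT_LOCK,
                      is_running=prometheus_running) -> bool:
    """Delete lock_path if it exists and Prometheus is not running.

    Returns True if the file was removed, False otherwise.
    """
    if is_running():
        # A live instance may own the lock, so leave it alone.
        return False
    try:
        lock_path.unlink()
    except FileNotFoundError:
        # Nothing to clean up.
        return False
    return True


if __name__ == "__main__":
    print("removed" if remove_stale_lock() else "nothing removed")
```

This would need to run with enough permissions to delete the file, and before the service is started. The process-name check is a heuristic, so it does not give the same guarantee as the lockfile itself.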
This seems to happen pretty often when I reboot the cluster. Is there any reason why we might not want to disable the lockfile? (If prometheus is entirely managed by systemd, there shouldn't be a case where two instances run at the same time, right?)
I can't think of a reason that we would end up having two instances at the same time, so go for it.