Inconsistent state
I'd like to apologize in advance if I don't provide enough information on this bug.
So far, I haven't been able to find any logs for it.
I have started 3 PanteraS M + S (master + slave) instances with no problem.
I can run a few dockerized applications on Marathon.
After a few weeks of usage, the hard drive gets more and more full.
Is there any kind of log we need to flush?
Also, I have tried some deployments and they got stuck: no containers get started, and there is not a single log entry in Mesos.
I wonder if this happened because a server got disconnected/reconnected from the network and the PaaS went into an inconsistent state.
Restarting all 3 servers by stopping PanteraS, removing the PanteraS containers, and running rm -rf /tmp/mesos/* did the trick, but it is not a good long-term solution: it requires restarting all the services at once.
Is there another way to get around this bug?
Regarding cleanups and proper config (all points are very important):
- Make sure that your Docker log-driver is NOT buffering all logs,
but sending them to syslog instead (see the verification sketch after this list).
To do that, edit /etc/default/docker
and add a parameter like:
DOCKER_OPTS="--log-driver=syslog ${DOCKER_OPTS}"
- Make sure that Mesos does basic cleanup: --gc_delay=1days
(you can check the mesos-slave process with the ps command; it should contain that option).
- Make sure you have a cron job on the native host that cleans up Docker images, something like (a sample crontab entry follows this list):
A=$(docker images -q -f dangling=true); [ "$A" ] && docker rmi $A
- Make sure that your apps inside containers log to a volume bind-mounted from the native host,
so the container volume (aufs) does not grow over time (see the example after this list).
- If you have experienced orphaned volumes, you might think about this cleanup tool:
https://github.com/cloudnautique/vol-cleanup
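A quick way to verify the first two points (only a sketch; the exact docker info output depends on your Docker version, and <container> is a placeholder):
# Daemon-wide log driver (expecting "syslog" instead of the default json-file):
docker info | grep -i 'logging driver'
# Or per container:
docker inspect --format '{{.HostConfig.LogConfig.Type}}' <container>
# Confirm the mesos-slave process was started with the gc_delay option:
ps aux | grep '[m]esos-slave' | grep -o 'gc_delay=[^ ]*'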
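The image cleanup can be scheduled from a cron entry on the host; the file name and schedule below are arbitrary assumptions:
# /etc/cron.d/docker-image-cleanup -- remove dangling images every night at 03:00
0 3 * * * root A=$(docker images -q -f dangling=true); [ "$A" ] && docker rmi $A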
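And a minimal illustration of the bind-mounted log volume; image name and paths are hypothetical, the point is that logs land on the host filesystem instead of inside the container's aufs layer:
# Hypothetical app that writes its logs to /app/logs inside the container
docker run -d -v /var/log/myapp:/app/logs myapp-image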
Also, I have tried some deployments and they got stuck: no containers get started, and there is not a single log entry in Mesos.
This happens when Mesos has no more resources (CPU/mem/disk).
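One way to check whether that is the case (a sketch, assuming the Mesos master listens on the default port 5050 and jq is installed) is to compare used vs. total resources reported by the master:
curl -s http://localhost:5050/metrics/snapshot | jq '{
  cpus_used:  ."master/cpus_used",  cpus_total: ."master/cpus_total",
  mem_used:   ."master/mem_used",   mem_total:  ."master/mem_total",
  disk_used:  ."master/disk_used",  disk_total: ."master/disk_total"
}'
If "used" is at or near "total" for CPU, memory, or disk, new deployments will just sit and wait for offers.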
Restarting all 3 servers by stopping PanteraS, removing the PanteraS containers, and running rm -rf /tmp/mesos/* did the trick, but it is not a good long-term solution: it requires restarting all the services at once.
This "hard reset" is only need on total disasters, or upgrades :)
your definitely should survive normal work without that.
Thanks for all your recommendations. I will try all of them asap.