sandialabs/scot

Redis & Mongo Entering FATAL State

peasead opened this issue · 7 comments

I know that Docker is in Beta, but I'm having an issue with Redis and Mongo.

2016-01-11 04:01:01,800 INFO exited: redis (exit status 0; not expected)
2016-01-11 04:01:01,803 CRIT reaped unknown pid 112)
2016-01-11 04:01:01,857 INFO exited: mongo (exit status 100; not expected)
2016-01-11 04:01:03,861 INFO spawned: 'mongo' with pid 116
2016-01-11 04:01:03,862 INFO spawned: 'redis' with pid 117
2016-01-11 04:01:03,904 INFO exited: redis (exit status 0; not expected)
2016-01-11 04:01:03,907 CRIT reaped unknown pid 118)
2016-01-11 04:01:03,993 INFO exited: mongo (exit status 100; not expected)
2016-01-11 04:01:06,915 INFO spawned: 'redis' with pid 153
2016-01-11 04:01:06,923 INFO exited: redis (exit status 0; not expected)
2016-01-11 04:01:06,924 INFO gave up: redis entered FATAL state, too many start retries too quickly
2016-01-11 04:01:06,925 CRIT reaped unknown pid 154)
2016-01-11 04:01:07,197 INFO spawned: 'mongo' with pid 155
2016-01-11 04:01:07,297 INFO exited: mongo (exit status 100; not expected)
2016-01-11 04:01:07,379 INFO gave up: mongo entered FATAL state, too many start retries too quickly

I looked through the Dockerfile and I see that supervisor is installed, but it appears to be missing from the ubuntu_installer.sh file that the Dockerfile calls. I don't know a lot about Supervisor, but it is supposed to be a process manager, so maybe it could be the culprit?
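To see at a glance which programs failed and with which exit status, supervisord output like the log above can be tallied with a short script. This is only a sketch: the `summarize` helper and both regular expressions are illustrative names written for this thread, and they assume the log lines look exactly like the ones pasted above.

```python
import re
from collections import Counter

# Matches supervisord lines such as:
#   2016-01-11 04:01:01,857 INFO exited: mongo (exit status 100; not expected)
EXITED = re.compile(
    r"INFO exited: (?P<name>\S+) \(exit status (?P<status>\d+); not expected\)"
)
# Matches supervisord lines such as:
#   2016-01-11 04:01:06,924 INFO gave up: redis entered FATAL state, ...
FATAL = re.compile(r"INFO gave up: (?P<name>\S+) entered FATAL state")


def summarize(log_text):
    """Return (Counter of (program, exit status), set of programs that went FATAL)."""
    exits = Counter()
    fatal = set()
    for line in log_text.splitlines():
        match = EXITED.search(line)
        if match:
            exits[(match["name"], int(match["status"]))] += 1
            continue
        match = FATAL.search(line)
        if match:
            fatal.add(match["name"])
    return exits, fatal


# A few lines copied from the log above.
SAMPLE = """\
2016-01-11 04:01:03,993 INFO exited: mongo (exit status 100; not expected)
2016-01-11 04:01:06,923 INFO exited: redis (exit status 0; not expected)
2016-01-11 04:01:06,924 INFO gave up: redis entered FATAL state, too many start retries too quickly
2016-01-11 04:01:07,297 INFO exited: mongo (exit status 100; not expected)
2016-01-11 04:01:07,379 INFO gave up: mongo entered FATAL state, too many start retries too quickly
"""

if __name__ == "__main__":
    exits, fatal = summarize(SAMPLE)
    for (name, status), count in sorted(exits.items()):
        print(f"{name}: exit status {status} seen {count} time(s)")
    print("FATAL:", ", ".join(sorted(fatal)))
```

The tally does not explain the failures by itself, but it separates programs that exit cleanly (status 0) from those that exit with an error, which usually point to different causes.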

Hi Pease, supervisor is installed in the docker image here: https://github.com/sandialabs/scot/blob/master/Dockerfile#L18

There might have been changes to the main SCOT install script referenced here: https://github.com/sandialabs/scot/blob/master/Dockerfile#L32

I haven't been following those changes. The new way to design dockerized apps is to use docker-compose and have all services be inside their own containers. Once the new version of SCOT is out I can take a look at implementing that.

The benefit is that you could then upgrade to a new version of SCOT without having to migrate your data out and back into the new container. This might call for a redesign of the way SCOT references its databases, etc. I believe @toddbruner is already working on that.

Sorry for the inconvenience.

I am, however, a little confused as to why it isn't working for you, because the demo running in AWS is the Docker version of SCOT and it seems to be working fine. I think you might be using a Docker environment that doesn't have enough resources to start all the services.

Thanks for the response, sandywater.

I don't think that it's a resource issue, but maybe I don't understand how VMWare handles this particular Docker image. See screengrab from Atomic Host:
8 GB RAM
4 CPU
[image: screengrab from Atomic Host]

When was the last time that docker run sandialabs/scot was used in the AWS instance? I'm just wondering if there have been some changes to redis or mongo (or some other dependencies) between then and now?

Sorry to be a pain. I had a teammate try this as well, and he ran into the same issue.

I'll just throw in a "me too". I experienced a very similar problem on a RHEL 6.6 server, although there may be additional dependency issues at play here that the OP didn't have.

[root@xxx ~]# docker run sandialabs/scot
/usr/lib/python2.7/dist-packages/supervisor/options.py:295: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2015-05-08 20:13:59,241 CRIT Supervisor running as root (no user in config file)
2015-05-08 20:13:59,241 WARN Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2015-05-08 20:13:59,285 INFO RPC interface 'supervisor' initialized
2015-05-08 20:13:59,285 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2015-05-08 20:13:59,286 INFO supervisord started with pid 1
2015-05-08 20:14:00,289 INFO spawned: 'mongo' with pid 8
2015-05-08 20:14:00,293 INFO spawned: 'redis' with pid 9
2015-05-08 20:14:00,296 INFO spawned: 'apache2' with pid 10
2015-05-08 20:14:00,298 INFO spawned: 'cron' with pid 11
2015-05-08 20:14:00,300 INFO spawned: 'scot3' with pid 12
2015-05-08 20:14:00,303 INFO spawned: 'activemq' with pid 13
2015-05-08 20:14:00,303 INFO exited: mongo (exit status 127; not expected)
2015-05-08 20:14:00,304 INFO exited: redis (exit status 127; not expected)
2015-05-08 20:14:00,312 INFO exited: activemq (exit status 127; not expected)
2015-05-08 20:14:01,401 INFO spawned: 'mongo' with pid 70
2015-05-08 20:14:01,405 INFO spawned: 'redis' with pid 71
2015-05-08 20:14:01,406 INFO success: apache2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-05-08 20:14:01,406 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-05-08 20:14:01,406 INFO success: scot3 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2015-05-08 20:14:01,408 INFO spawned: 'activemq' with pid 72
2015-05-08 20:14:01,410 INFO exited: mongo (exit status 127; not expected)
2015-05-08 20:14:01,412 INFO exited: redis (exit status 127; not expected)
2015-05-08 20:14:01,418 INFO exited: activemq (exit status 127; not expected)
2015-05-08 20:14:03,423 INFO spawned: 'mongo' with pid 101
2015-05-08 20:14:03,426 INFO spawned: 'redis' with pid 102
2015-05-08 20:14:03,429 INFO spawned: 'activemq' with pid 103
2015-05-08 20:14:03,431 INFO exited: mongo (exit status 127; not expected)
2015-05-08 20:14:03,433 INFO exited: redis (exit status 127; not expected)
2015-05-08 20:14:03,440 INFO exited: activemq (exit status 127; not expected)
2015-05-08 20:14:06,683 INFO spawned: 'mongo' with pid 180 
2015-05-08 20:14:06,687 INFO spawned: 'redis' with pid 181
2015-05-08 20:14:06,688 INFO spawned: 'activemq' with pid 182
2015-05-08 20:14:06,692 INFO exited: mongo (exit status 127; not expected)
2015-05-08 20:14:06,697 INFO gave up: mongo entered FATAL state, too many start retries too quickly
2015-05-08 20:14:06,698 INFO exited: redis (exit status 127; not expected)
2015-05-08 20:14:06,698 INFO gave up: redis entered FATAL state, too many start retries too quickly
2015-05-08 20:14:06,699 INFO exited: activemq (exit status 127; not expected)
2015-05-08 20:14:07,700 INFO gave up: activemq entered FATAL state, too many start retries too quickly
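Exit status 127 is the value a POSIX shell conventionally returns when it cannot find the command it was asked to run. One possible reading of the log above is therefore that the start commands for mongo, redis, and activemq (or something those commands invoke) are missing on this host. That is an assumption, not something the log proves. The sketch below only reproduces the convention; the `exit_status` helper and the command name `no_such_command_for_demo` are made up for illustration.

```python
import subprocess


def exit_status(command):
    """Run `command` through the system shell and return its exit status."""
    completed = subprocess.run(
        command,
        shell=True,                  # let /bin/sh look the command up on PATH
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,   # hide the shell's "not found" message
    )
    return completed.returncode


if __name__ == "__main__":
    # A command that does not exist: the shell reports 127.
    print(exit_status("no_such_command_for_demo"))
    # A command that exists and succeeds: the shell reports 0.
    print(exit_status("true"))
```

Running the same check inside the container against the exact `command=` values from the supervisor config would show whether those binaries are actually present.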

Any update here?

As a workaround, I've spent the last few days rebuilding the Dockerfile, and ubuntu_installer.sh doesn't seem to handle Mongo properly (it won't initialize after 100 seconds). Any help getting the Dockerfile from this repo working correctly would be greatly appreciated.

I can get on a Hangouts session or something if that would help.

Sorry, we are focused on completing the next release which is an extensive rewrite. I think our docker image is flawed but we are faced with the choice of trying to fix it or concentrating on the next release.

Once the new release is out (hopefully in a month or so) we will see what we can do about Docker images. For now, your best bet would be to use ubuntu_install.sh on an Ubuntu VM or a regular server.

Any Docker experts out there who want to help "Docker-ize" SCOT? We would welcome the help because we are stretched a little thin and cannot devote the resources to this task.

It looks like the program name in your supervisor config file is 'mongo'. Change it to 'mongo_daemon' or something else.

e.g. in /etc/supervisor/conf.d/mongo.conf
change
[program:mongo]
to
[program:mongo_daemon]
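For anyone who wants to script the rename suggested above rather than edit the file by hand, here is a minimal sketch using Python's standard `configparser`. The `rename_program` helper and the sample config contents (including the `command=` line) are hypothetical; the real `/etc/supervisor/conf.d/mongo.conf` in the image may look different, and whether the rename resolves the FATAL state has not been confirmed in this thread.

```python
import configparser
import io


def rename_program(conf_text, old, new):
    """Return conf_text with the [program:old] section renamed to [program:new]."""
    # RawConfigParser does no interpolation, so supervisor-style
    # %(ENV_X)s values pass through untouched.
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)

    old_section = f"program:{old}"
    new_section = f"program:{new}"
    if not parser.has_section(old_section):
        raise KeyError(old_section)

    # Copy every option to the new section, then drop the old one.
    parser.add_section(new_section)
    for key, value in parser.items(old_section):
        parser.set(new_section, key, value)
    parser.remove_section(old_section)

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical file contents, for illustration only.
SAMPLE_CONF = """\
[program:mongo]
command=/usr/bin/mongod --config /etc/mongod.conf
autorestart=true
"""

if __name__ == "__main__":
    print(rename_program(SAMPLE_CONF, "mongo", "mongo_daemon"))
```

Note that `configparser` drops comments and rewrites `key=value` as `key = value`, so for a one-line change a manual edit is just as reasonable. Supervisor has to reload its configuration (for example by restarting the container) before the new name takes effect.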