thefactory/cloudformation-mesos

Too many open files in system

Closed · 1 comment

When launching the stack on a c3.xlarge, the master instance reliably fails with "Too many open files in system". When I manage to get a root shell on the instance quickly enough to run lsof, I see a huge number of entries for anon_inode and pipe:

...
java      1380 1518       root *426u     0000                0,9         0       8324 anon_inode
java      1380 1518       root *427r     FIFO                0,8       0t0    1946863 pipe
java      1380 1518       root *428w     FIFO                0,8       0t0    1946863 pipe
java      1380 1518       root *429r     FIFO                0,8       0t0    2529206 pipe
java      1380 1518       root *430w     FIFO                0,8       0t0    2529206 pipe
java      1380 1518       root *431u     0000                0,9         0       8324 anon_inode
java      1380 1518       root *432r     FIFO                0,8       0t0    2522706 pipe
java      1380 1518       root *433w     FIFO                0,8       0t0    2522706 pipe
java      1380 1518       root *434u     0000                0,9         0       8324 anon_inode
java      1380 1518       root *435r     FIFO                0,8       0t0    2522707 pipe
java      1380 1518       root *436w     FIFO                0,8       0t0    2522707 pipe
java      1380 1518       root *437u     0000                0,9         0       8324 anon_inode
java      1380 1518       root *438r     FIFO                0,8       0t0    2522708 pipe
java      1380 1518       root *439w     FIFO                0,8       0t0    2522708 pipe
java      1380 1518       root *440u     0000                0,9         0       8324 anon_inode
java      1380 1518       root *441r     FIFO                0,8       0t0    2522709 pipe
java      1380 1518       root *442w     FIFO                0,8       0t0    2522709 pipe
java      1380 1518       root *443u     0000                0,9         0       8324 anon_inode
...
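
For anyone trying to reproduce this, a tally per descriptor type is easier to read than raw lsof output. The following is a minimal sketch, assuming a Linux host with /proc mounted and enough privileges to read the target process's fd directory. The helper name `count_fd_kinds` is made up for illustration and is not part of this repository.

```python
import os
from collections import Counter


def count_fd_kinds(pid):
    """Tally the open descriptors of `pid` by kind (pipe, anon_inode, socket, file)."""
    fd_dir = f"/proc/{pid}/fd"
    kinds = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were iterating
        # Non-file descriptors read like "pipe:[1946863]" or "anon_inode:[eventpoll]".
        if ":" in target and not target.startswith("/"):
            kinds[target.split(":", 1)[0]] += 1
        else:
            kinds["file"] += 1
    return kinds
```

Calling `count_fd_kinds(1380)` as root against the java process from the listing above would show whether pipes and anon_inode entries dominate, and running it repeatedly would show how fast they grow.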

I have confirmed that ulimit -n returns 64000, which should be sufficient. Increasing it to 100000 does not improve matters.
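
One thing worth checking, assuming the 64000 figure above is the per-process descriptor limit: "Too many open files in system" is the message for ENFILE, the system-wide file table limit, whereas hitting the per-process limit gives EMFILE ("Too many open files"). Raising the per-process limit alone may therefore not change anything. The sketch below reads both limits on Linux. The function names are hypothetical, and `/proc/sys/fs/file-nr` is assumed to have its usual three-field layout (allocated, unused, maximum).

```python
import resource


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: allocated handles, unused handles, system-wide max."""
    allocated, unused, maximum = (int(x) for x in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def open_file_limits(file_nr_path="/proc/sys/fs/file-nr"):
    """Return the per-process (RLIMIT_NOFILE) and system-wide open file limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    with open(file_nr_path) as f:
        system = parse_file_nr(f.read())
    return {"per_process_soft": soft, "per_process_hard": hard, **system}
```

If `allocated` is close to `max` while the error occurs, the system-wide table is what is being exhausted, which would be consistent with a process leaking descriptors rather than with a limit that is simply set too low.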

The slave instances launch fine with no such issues.

I have recreated the stack multiple times with the same result.

It appears that this was because the instances did not have a public hostname, which the init scripts were expecting.
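
To check whether an instance is affected, it can be asked for its public hostname through the EC2 instance metadata service. The sketch below is only an illustration and not how the init scripts in this repository work: the function names are hypothetical, the fallback to `local-hostname` is an assumption about what a fix could look like, and it uses plain unauthenticated requests, so it will not work where IMDSv2 session tokens are enforced.

```python
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service.
METADATA_URL = "http://169.254.169.254/latest/meta-data/"


def fetch_metadata(key, timeout=2):
    """Return the metadata value for `key`, or None if it is missing or unreachable."""
    try:
        with urllib.request.urlopen(METADATA_URL + key, timeout=timeout) as resp:
            value = resp.read().decode().strip()
    except (urllib.error.URLError, OSError):
        return None
    return value or None


def resolve_hostname(fetch=fetch_metadata):
    """Prefer the public hostname; fall back to the private one if it is absent."""
    return fetch("public-hostname") or fetch("local-hostname")
```

On an instance without a public hostname, `fetch_metadata("public-hostname")` returns None here, and `resolve_hostname()` falls back to the private name instead of failing.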