ETS ** Too many db tables ** error
long-tran opened this issue · 9 comments
Hi man, I've recently run into this problem in my production environment:
=CRASH REPORT==== 7-Nov-2016::09:00:05 ===
crasher:
initial call: ranch_conns_sup:init/7
pid: <0.8347.1>
registered_name: []
exception exit: {system_limit,
[{ets,new,[pdu_storage_by_sequence_number,[set]],[]},
{'Elixir.SMPPEX.PduStorage',init,1,
[{file,"lib/smppex/pdu_storage.ex"},{line,43}]},
{gen_server,init_it,6,
[{file,"gen_server.erl"},{line,328}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]}
in function ranch_conns_sup:terminate/3 (src/ranch_conns_sup.erl, line 224)
ancestors: [<0.8346.1>,<0.8345.1>]
messages: []
links: []
dictionary: [{<0.8348.1>,true}]
trap_exit: true
status: running
heap_size: 610
stack_size: 27
reductions: 261
neighbours:
.....
[error] ** Too many db tables **
It seems to have something to do with the `pdu_storage`. Is there any potential misconfiguration in the SMPPEX code?
Thanks,
Long
Hello!
Thanks for the feedback.
There are two main reasons that may cause the problem:
- there is something in your code that creates many `ets` tables, so that the creation of the next MC session fails when the system limit is exhausted;
- all of the `ets` tables are consumed by `SMPPEX` itself, in which case there should be many MC sessions left unstopped for some reason.
So there are several questions I would like to ask to make the situation clearer:
- What is the number of simultaneous client connections your server has when the crash occurs? Have you specified a custom `max_connections` transport option when starting the MC?
- What are the names of the `ets` tables that pollute the `ets` space when the crash occurs? (This info can be obtained by running `:ets.i()`; see also the sketch below.)
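For example, here is a rough sketch (plain `:ets` introspection, nothing SMPPEX-specific) of how to see how close the VM is to the limit and which table names dominate, from any attached shell:

```elixir
# Current table count vs. the system limit
IO.puts("#{length(:ets.all())} / #{:erlang.system_info(:ets_limit)} ets tables")

# Top 10 most frequent table names: a leak usually shows up
# as thousands of tables sharing one name
:ets.all()
|> Enum.map(&:ets.info(&1, :name))
|> Enum.reduce(%{}, fn name, acc -> Map.update(acc, name, 1, &(&1 + 1)) end)
|> Enum.sort_by(fn {_name, count} -> -count end)
|> Enum.take(10)
```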
Closing due to no reply.
@savonarola Hi, we just ran into the same issue. My `max_connections` is at 600 (while the ETS table limit should be around 1400 by default), and I had a health checker opening (and closing) a socket every 10 seconds.
12:47:58.117 [info] mc_conn #PID<0.1832.0>, socket closed, stopping
12:48:08.117 [info] mc_conn #PID<0.1838.0>, socket closed, stopping
12:48:08.117 [info] mc_conn #PID<0.1841.0>, socket closed, stopping
12:48:18.117 [info] mc_conn #PID<0.1844.0>, socket closed, stopping
12:48:18.117 [info] mc_conn #PID<0.1847.0>, socket closed, stopping
12:48:28.117 [info] mc_conn #PID<0.1850.0>, socket closed, stopping
12:48:28.117 [info] mc_conn #PID<0.1853.0>, socket closed, stopping
12:48:38.117 [info] mc_conn #PID<0.1856.0>, socket closed, stopping
...
After a few hours of this, though, any time the health checker opens a socket we encounter this issue:
16:54:48.121 [error] Ranch listener #Reference<0.0.2.571> connection process start failure; SMPPEX.Session:start_link/4 returned: {:error, {{:badmatch, {:error, {:system_limit, [{:ets, :new, [:pdu_storage_by_sequence_number, [:set]], []}, {SMPPEX.PduStorage, :init, 1, [file: 'lib/smppex/pdu_storage.ex', line: 43]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 328]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}}, [{SMPPEX.MC, :init, 1, [file: 'lib/smppex/mc.ex', line: 386]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 328]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}
So it seems that the ETS table is not getting cleaned up properly when a Ranch socket is closed.
Do note that we have no active connections to the instance, except for the health-check opening and closing the socket (so this is not a case of it being over-saturated with traffic).
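For what it's worth, the growth is easy to confirm from an attached `iex` shell; a rough sketch (the 60-second window is arbitrary):

```elixir
# Snapshot the table count, let the health checker cycle a few times,
# then diff: a steadily growing count confirms the leak.
before = length(:ets.all())
Process.sleep(60_000)
IO.puts("ets tables grew by #{length(:ets.all()) - before}")
```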
Hello!
Trying to reproduce the issue.
Hello!
I have reproduced the issue; the reason was that the peer closing the socket is not considered an abnormal case, so the MC session stopped with `:normal`, leaving the child `PduStorage` alive and keeping its `ets` table.
I have added the necessary cleanup.
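For reference, a minimal sketch of the underlying semantics (hypothetical module, not the actual SMPPEX code): a `:normal` exit is not propagated over links to processes that do not trap exits, so a linked child GenServer, and the `ets` table it owns, outlives its parent unless it is stopped explicitly.

```elixir
defmodule LeakyChild do
  use GenServer

  def init(_) do
    # The table is owned by this process and is destroyed only with it.
    {:ok, :ets.new(:leaky_table, [:set])}
  end
end

test = self()

spawn(fn ->
  # Linked to this (parent) process, like PduStorage under a session.
  {:ok, child} = GenServer.start_link(LeakyChild, nil)
  send(test, {:child, child})
  # The parent's function body ends here, so it exits with :normal,
  # as the MC session did on socket close; the link does not kill
  # the child, because :normal exit signals are ignored.
end)

child =
  receive do
    {:child, pid} -> pid
  end

Process.sleep(50)
Process.alive?(child) # => true: the child and its ets table leaked
```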
@savonarola as always, thank you for the swift fix! 🍻