funbox/smppex

ETS ** Too many db tables ** error

long-tran opened this issue · 9 comments

Hi, I've recently run into this problem in my production environment:

CRASH REPORT==== 7-Nov-2016::09:00:05 ===
  crasher:
    initial call: ranch_conns_sup:init/7
    pid: <0.8347.1>
    registered_name: []
    exception exit: {system_limit,
                        [{ets,new,[pdu_storage_by_sequence_number,[set]],[]},
                         {'Elixir.SMPPEX.PduStorage',init,1,
                             [{file,"lib/smppex/pdu_storage.ex"},{line,43}]},
                         {gen_server,init_it,6,
                             [{file,"gen_server.erl"},{line,328}]},
                         {proc_lib,init_p_do_apply,3,
                             [{file,"proc_lib.erl"},{line,247}]}]}
      in function  ranch_conns_sup:terminate/3 (src/ranch_conns_sup.erl, line 224)
    ancestors: [<0.8346.1>,<0.8345.1>]
    messages: []
    links: []
    dictionary: [{<0.8348.1>,true}]
    trap_exit: true
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 261
  neighbours: 
.....
[error] * Too many db tables

It seems to have something to do with the pdu_storage. Is there a potential misconfiguration in the SMPPEX code?

Thanks,
Long

Hello!

Thanks for the feedback.

There are two likely causes of this problem:

  • something in your code creates many ETS tables, so that the creation of the next MC session fails once the system limit is exhausted;
  • all of the ETS tables are consumed by SMPPEX itself; in that case there should be many MC sessions that, for some reason, were never stopped.

So there are a couple of questions I would like to ask to clarify the situation:

  • How many simultaneous client connections does your server have when the crash occurs? Have you specified a custom max_connections transport option when starting the MC?
  • What are the names of the ETS tables that fill up the ETS space when the crash occurs? (This info can be obtained by running :ets.i(); the snippet below gives a more compact summary.)
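For the second question, a snippet like the one below, run in an attached IEx shell, gives a more compact picture than the full :ets.i() dump. It uses only standard :ets and :erlang calls; nothing here is SMPPEX-specific.

  # How many ETS tables exist vs. the node-wide limit.
  IO.puts("ETS tables in use: #{length(:ets.all())}")
  IO.puts("ETS table limit:   #{:erlang.system_info(:ets_limit)}")

  # Count the live tables by name; a large number of tables named
  # :pdu_storage_by_sequence_number would point at leaked MC sessions
  # rather than at your own code.
  :ets.all()
  |> Enum.map(&:ets.info(&1, :name))
  |> Enum.reduce(%{}, fn name, acc -> Map.update(acc, name, 1, &(&1 + 1)) end)
  |> Enum.sort_by(fn {_name, count} -> -count end)
  |> Enum.take(10)
  |> IO.inspect(label: "most frequent table names")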

Closing due to no reply.

@savonarola Hi, we just ran into the same issue. My max_connections is set to 600 (while the ETS table limit should be around 1400 by default), and I have a health checker that opens (and closes) a socket every 10 seconds.

12:47:58.117 [info]  mc_conn #PID<0.1832.0>, socket closed, stopping

12:48:08.117 [info]  mc_conn #PID<0.1838.0>, socket closed, stopping

12:48:08.117 [info]  mc_conn #PID<0.1841.0>, socket closed, stopping

12:48:18.117 [info]  mc_conn #PID<0.1844.0>, socket closed, stopping

12:48:18.117 [info]  mc_conn #PID<0.1847.0>, socket closed, stopping

12:48:28.117 [info]  mc_conn #PID<0.1850.0>, socket closed, stopping

12:48:28.117 [info]  mc_conn #PID<0.1853.0>, socket closed, stopping

12:48:38.117 [info]  mc_conn #PID<0.1856.0>, socket closed, stopping

...

After a few hours of this, though, any time the health checker opens a socket, we encounter this error:

16:54:48.121 [error] Ranch listener #Reference<0.0.2.571> connection process start failure; SMPPEX.Session:start_link/4 returned: {:error, {{:badmatch, {:error, {:system_limit, [{:ets, :new, [:pdu_storage_by_sequence_number, [:set]], []}, {SMPPEX.PduStorage, :init, 1, [file: 'lib/smppex/pdu_storage.ex', line: 43]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 328]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}}, [{SMPPEX.MC, :init, 1, [file: 'lib/smppex/mc.ex', line: 386]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 328]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 247]}]}}

So it seems that the ETS table is not getting cleaned up properly when a Ranch socket is closed.

Do note that we have no active connections to the instance, except for the health-check opening and closing the socket (so this is not a case of it being over-saturated with traffic).
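For reference, the health check does nothing SMPP-specific; a plain TCP open/close loop like the one below mimics it and shows the table count climbing over time (localhost and port 2775 are just placeholders for wherever the MC listens):

  # Repeatedly open and immediately close a TCP connection to the MC port.
  # Each accepted connection starts an MC session, and every session that
  # is not cleaned up on close leaves one PduStorage ETS table behind.
  for _ <- 1..2000 do
    {:ok, socket} = :gen_tcp.connect('localhost', 2775, [active: false])
    :ok = :gen_tcp.close(socket)
  end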

Hello!

Trying to reproduce the issue.

Hello!

I have reproduced the issue; the reason was that the peer closing the socket is not considered an abnormal case, so the MC session stopped with reason :normal, leaving the child PduStorage process alive and keeping its ETS table.

I have added the necessary cleanup.
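The gist of the cleanup is to make sure the storage process, and therefore its ETS table, goes away together with the session even when the session exits with reason :normal. Below is a minimal sketch of that idea; MySession and MyPduStorage are hypothetical names, not the actual library modules.

  defmodule MyPduStorage do
    use GenServer

    def start_link, do: GenServer.start_link(__MODULE__, [])

    def init(_) do
      # The table is owned by this process and is destroyed when it exits.
      {:ok, :ets.new(:pdu_storage_by_sequence_number, [:set])}
    end
  end

  defmodule MySession do
    use GenServer

    def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

    def init(opts) do
      # Trap exits so terminate/2 is also called when a linked process
      # (e.g. the transport) sends us an exit signal.
      Process.flag(:trap_exit, true)
      {:ok, storage} = MyPduStorage.start_link()
      {:ok, %{storage: storage, opts: opts}}
    end

    def terminate(_reason, %{storage: storage}) do
      # Stopping with reason :normal does not take a linked child down
      # (a :normal exit signal is ignored by a process that does not trap
      # exits), so stop the storage explicitly; its ETS table is freed
      # when its owner process exits.
      GenServer.stop(storage)
      :ok
    end
  end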

@savonarola as always, thank you for the swift fix! 🍻