erlangbureau/jamdb_oracle

Connection pool is exhausted, disconnects hang forever.

Opened this issue · 17 comments

We are using 0.5.0 of the library, via Ecto.

We have an application that makes a query at the beginning of an HTTP request.

For whatever reason, making a number of requests in quick succession (not how our users are doing this, but it was how I could repro an error state) can cause connections in the connection pool to get stuck in the disconnection process.

I think this is because of https://github.com/erlangbureau/jamdb_oracle/blob/master/src/jamdb_oracle.erl#L25

I would like to lobby that we change the call timeout to be either configurable or not infinity. If the process for disconnection takes too long, I would prefer to "let it crash". My application will reconnect, and the Oracle DBAs downstream can handle a couple of ungraceful shutdowns, I'd argue.

I can make this PR, but if I do, it would likely be that I'd set the timeout to be 5000 milliseconds.

Hi,

You can specify timeout in config:

config :friends, Friends.Repo,
database: "friends",
username: "user",
password: "pass",
hostname: "localhost",
timeout: 500

Value will be passed on to connect fun:
https://github.com/erlangbureau/jamdb_oracle/blob/master/lib/jamdb_oracle.ex#L70
https://github.com/erlangbureau/jamdb_oracle/blob/master/src/jamdb_oracle_conn.erl#L33

Try to find problem line by adding erlang:display to disconnect fun in jamdb_oracle_conn.erl

-spec disconnect(state()) -> {ok, [env()]}.
disconnect(State) ->
   erlang:display(100),
    disconnect(State, 1).

-spec disconnect(state(), timeout()) -> {ok, [env()]}.
disconnect(#oraclient{socket=Socket, env=Env, passwd=Passwd}, 0) ->
   erlang:display(101),
    sock_close(Socket),
   erlang:display(102),
    exit(Passwd, ok),
   erlang:display(103),
    {ok, Env};
disconnect(#oraclient{conn_state=connected, socket=Socket, env=Env, passwd=Passwd} = State, _Tout) ->
   erlang:display(104),
    _ = send_req(close, State),
   erlang:display(105),
    sock_close(Socket),
   erlang:display(106),
    exit(Passwd, ok),
   erlang:display(107),
    {ok, Env};
disconnect(#oraclient{env=Env}, _Tout) ->
   erlang:display(108),
    {ok, Env}.

Infinite timeouts are a bug in Erlang. You should not use them in your library when you have user provided timeouts already passed in.

In our case Timeout is mostly used in gen_tcp and has less effect in gen_server
https://www.erlang.org/doc/man/gen_tcp.html#recv-3
https://www.erlang.org/doc/man/gen_server.html#call-2

In gen_server:call default timeout is 5000
Here timeout will be 5000

- call_infinity(Pid, stop).
+ gen_server:call(Pid, stop).

Btw, you can change timeout anytime calling internal command TIMEOUT
https://github.com/erlangbureau/jamdb_oracle/blob/master/src/jamdb_oracle_conn.erl#L124

Ecto.Adapters.SQL.query(YourApp.Repo, "timeout", [5000])

I’m trying to tell you that your library isn’t using it right. I have connection pool processes that are hanging in the disconnection function because of your timeout on infinity

I know this is the case because the only change I made to fix this is the PR I’ve submitted

Is this enough?

- call_infinity(Pid, stop).
+ gen_server:call(Pid, stop).

You need to be able to pass in the user provided timeout, as well.

Is timeout in config not working properly?

config :friends, Friends.Repo,
database: "friends",
username: "user",
password: "pass",
hostname: "localhost",
timeout: 500

Please try stage branch
commit

Why make all of these commits when I have a PR already open for this?

I'd like to close the issue with disconnect.
Can you give me the working example?
Disconnect on L82 doesn't work for me.
https://github.com/erlangbureau/jamdb_oracle/blob/master/lib/jamdb_oracle.ex#L82

MyRepo.disconnect_all(1000)
DBConnection.disconnect_all(pid, 1000)
L82 is never reached.

Jamdb.Oracle.disconnect(nil, pid)
** (FunctionClauseError) no function clause matching in DBConnection.ConnectionPool.handle_call/3

Modified function is not working

  def disconnect(_err, s) do
    query(s, 'CLOSE')
    :ok 
  end

This is working
Ecto.Adapters.SQL.query(YourApp.Repo, "close", [])

My PR is to close the issue

I used my actual working code to verify the change

@vstavskyi Your example about the disconnect/2 function doesn't make sense to me. That code doesn't exist in master or in my PR.

Your changes to propagate the connect timeout still doesn't respect the individual user's timeout parameter. I have updated my PR so that it can merge with master.