rsyslog/librelp

potential hang during session reconnect

rgerhards opened this issue · 0 comments

When librelp does a session reconnect and the remote peer aborts quickly, librelp may hang. This can occasionally be seen in the receiver-abort.sh test. Relevant part from the test log with comments is:

send: doing retry 5
send.c: socket has an error 111
send: error 'error opening connection to remote peer', object 'conn to srvr 127.0.0.1:31514'
send.c: after TryReestablish, sess state 9
send: after retry ret 10014
08:50:39 DUMMYSERVER: listening on port 31514
send: doing retry 6
send.c: relpTcpConnectTLSInit: lib: 0
send.c: frame to send: '1 open 92 relp_version=0
relp_software=librelp,1.4.0.master,http://librelp.adiscon.com
commands=syslog
'
08:50:40 DUMMYSERVER: got connection request
send.c: relpTcpSend: send data: 1 open 92 relp_version=0
relp_software=librelp,1.4.0.master,http://librelp.adiscon.com
commands=syslog

08:50:40 DUMMYSERVER: done receive
send.c: relpTcpSend: sock 3, lenbuf 103, send returned 103 [errno 115]
08:50:40 DUMMYSERVER: shutting down

remote peer aborts connection

send.c: Warning: exceeding window size, max 129, curr 128
send.c: ADD sess 0x1a91490 unacked 129, sessState 5
send.c: sendbuf added to unacked list
send.c: relpTcpRcv: read 0 bytes from sock 3
send.c: relpTcpRcv return. relptcp [0x1a96df0], iRet 0, lenRcvd 0, pLenBuf 0
send: error 'server closed relp session, session broken', object 'conn to srvr 127.0.0.1:31514'

librelp detects problem....

send.c: server closed relp session 0x1a91490, session broken
send.c: relpSessWaitState returns 0
send.c: relp session 0x1a91490 reestablished, now resending 129 unacked frames

...but does not reflect the state correctly in upper layers. Now the session is stalled (maybe only for some time [timeout], but it's not right in the first place).

send.c: resending frame '19880 syslog 8 00019879
'
send.c: relpTcpSend: send data: 2 syslog 8 00019879

send.c: relpTcpSend: sock 3, lenbuf 20, send returned 20 [errno 115]
send.c: sendbuf NOT added to unacked list
send.c: resending frame '19881 syslog 8 00019880
'
08:50:42 DUMMYSERVER: listening on port 31514
Starting Receiver...
got receive pid 4712
program started up, pidfile receive.pid contains 4712
Receiver running
receiver 4712 stopped
We have 19879 lines in librelp.out.log