broken pipe issue
Opened this issue · 6 comments
Hi
I have setup with 1 yanus proxy and 2 janus servers behind it. It works good for all demos and test calls with 5-10 participants.
Now I'm testing it, trying to get stable 3 rooms with 20 members in each - but this doesn't work for now.
I see such logs in janys-proxy log file:
BrokenPipeError: [Errno 32] Broken pipe
2022-04-21 12:47:50,795 [1690018] [januscloud.proxy.core.frontend_session] [ERROR] - Asynchronous event ({'janus': 'event', 'session_id': 7324896184483668, 'plugindata': {'plugin': 'janus.plugin.videoroom
', 'data': {'videoroom': 'event', 'room': 1743988738, 'leaving': 8968357616118914}}, 'sender': 5720134667431144}) transport failed on session (id:7324896184483668)
Traceback (most recent call last):
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/frontend_session.py", line 36, in notify_event
self.ts.send_message(event)
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/transport/ws.py", line 132, in send_message
self.send(self._msg_encoder.encode(message), binary=False)
File "/opt/janus-cloud/lib/python3.8/site-packages/ws4py/websocket.py", line 303, in send
self._write(m)
File "/opt/janus-cloud/lib/python3.8/site-packages/ws4py/websocket.py", line 285, in _write
self.sock.sendall(b)
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/transport/ws.py", line 67, in sendall
_sendall(data)
File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socket3.py", line 534, in sendall
return _socketcommon._sendall(self, data_memory, flags)
File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socketcommon.py", line 392, in _sendall
timeleft = __send_chunk(socket, chunk, flags, timeleft, end)
File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socketcommon.py", line 321, in __send_chunk
data_sent += socket.send(chunk, flags)
File "/opt/janus-cloud/lib/python3.8/site-packages/gevent/_socket3.py", line 515, in send
return self._sock.send(data, flags)
BrokenPipeError: [Errno 32] Broken pipe
Any idea why this happen and how to fix?
from your log, the reason is the websocket pipe between janus proxy and your client is broken which results into exception when we send mesage to it. janus-proxy has catch this exception and ignore it.
For some high work load case, it's possible that the low level socket may be already broken but the high level business is not yet notified about that and still send some data to this socket. So, you may get the above log.
I would lower the log level about that in the recent version
I think this log is not the real reason make you failed, please check the cpu workload of janus-proxy
@jamken Yes, I was observing cpu load as well, but didn't saw significant load.
Also, about errors, I had such one too:
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection wi
th ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection wi
th ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection wi
th ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection with ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection with ('127.0.0.1', 46960)
2022-04-20 18:10:28,440 [3858813] [januscloud.transport.ws] [ERROR] - Failed to handle received msg on websocket server connection with ('127.0.0.1', 46960): Already closed: websocket server connection with ('127.0.0.1', 46960)
2022-04-20 18:10:28,441 [3858813] [januscloud.proxy.core.frontend_session] [DEBUG] - an asynchronous event messge ({'janus': 'hangup', 'session_id': 7522993418610696, 'reason': 'DTLS timeout', 'sender': 661122662160242}) is sent back asynchronous for session "7522993418610696"
2022-04-20 18:10:28,442 [3858813] [januscloud.proxy.core.frontend_session] [DEBUG] - an asynchronous event messge ({'janus': 'hangup', 'session_id': 1882511552519992, 'reason': 'DTLS timeout', 'sender': 8338611274264920}) is sent back asynchronous for session "1882511552519992"
2022-04-20 18:10:28,443 [3858813] [januscloud.proxy.core.frontend_session] [DEBUG] - an asynchronous event messge ({'janus': 'hangup', 'session_id': 1256170478876418, 'reason': 'DTLS timeout', 'sender': 4842149081266732}) is sent back asynchronous for session "1256170478876418"
2022-04-20 18:10:28,444 [3858813] [januscloud.proxy.core.request] [DEBUG] - Request ({'janus': 'trickle', 'candidate': {'candidate': 'candidate:3534132064 1 udp 2113937151 16bcd6df-5dac-4eea-89e5-17af6f3032d8.local 54000 typ host generation 0 ufrag dHsY network-cost 999', 'sdpMid': '0', 'sdpMLineIndex': 0}, 'transaction': 'c8XRo1zYK557', 'session_id': 1200244565700485, 'handle_id': 6523215966388871}) is incoming to handle
2022-04-20 18:10:28,445 [3858813] [januscloud.proxy.core.frontend_session] [ERROR] - Couldn't find any session 1200244565700485
2022-04-20 18:10:28,445 [3858813] [januscloud.proxy.core.request] [WARNING] - Request ({'janus': 'trickle', 'candidate': {'candidate': 'candidate:3534132064 1 udp 2113937151 16bcd6df-5dac-4eea-89e5-17af6f3032d8.local 54000 typ host generation 0 ufrag dHsY network-cost 999', 'sdpMid': '0', 'sdpMLineIndex': 0}, 'transaction': 'c8XRo1zYK557', 'session_id': 1200244565700485, 'handle_id': 6523215966388871}) processing failed
Traceback (most recent call last):
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 295, in incoming_request
response = handler(request)
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 267, in _handle_trickle
handle = self._get_plugin_handle(request)
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 127, in _get_plugin_handle
session = self._get_session(request)
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/request.py", line 120, in _get_session
session = self._frontend_session_mgr.find_session(request.session_id)
File "/opt/janus-cloud/lib/python3.8/site-packages/januscloud/proxy/core/frontend_session.py", line 124, in find_session
raise JanusCloudError('No such session {}'.format(session_id), JANUS_ERROR_SESSION_NOT_FOUND)
This happens even with 2 rooms with 20 person in each - at some point log fills with Already closed: websocket server connection with ('127.0.0.1', 46960)
and then participants in room start to disconnect. And then I assume happens next one, JANUS_ERROR_SESSION_NOT_FOUND
, when they try to log back to the room but can't find it for some reason.
I think this problem may be in your front http(websocket) proxy . the ws connections from the http proxy (like 127.0.0.1:46960) to the janus-proxy(maybe 127.0.0.1:8288) broken for some reason. Please try to bypass the http proxy and test again
maybe the file fd number limitation? please check ulimite -a also