Unity-Technologies/com.unity.webrtc

[BUG]: Erratic ICE with multiple parallel PeerConnections

dfischer23 opened this issue

Package version

3.0.0-pre.7

Environment

* OS: macOS Sonoma 14.2.1
* Unity version: 2022.3.13f1

Steps To Reproduce

  1. Clone this Reproduction Project.
  2. Set the number of parallel DataChannelTest instances to 10.
  3. Start the project in the Editor or as a built app.

Current Behavior

With ~10 or more parallel PeerConnection setups, ICE setup does not finish and/or the Editor or built app crashes.

Expected Behavior

Every parallel DataChannelSample instance should be able to set up a connection, just as it does in the example or when only one instance runs. Adding more instances should not affect operation.

Anything else?

In my real-world project, I successfully set up multiple peer-to-peer audio connections using the "perfect negotiation" pattern and a proprietary signalling server. It becomes very unreliable as I increase the number of connections.
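For context, the collision rule at the heart of the "perfect negotiation" pattern (as commonly described, e.g. in MDN's WebRTC guides) can be sketched as follows. This is a minimal, language-neutral Python sketch, not code from the real project or from com.unity.webrtc; `NegotiationState` and `should_ignore_description` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NegotiationState:
    """Hypothetical per-connection state used by the perfect negotiation pattern."""
    polite: bool                     # exactly one side of each peer pair is polite
    making_offer: bool = False       # True while this side is creating/sending an offer
    signaling_state: str = "stable"  # mirrors the PeerConnection's signaling state
    ignore_offer: bool = False       # remembered so errors for the dropped offer's candidates can be ignored


def should_ignore_description(state: NegotiationState, description_type: str) -> bool:
    """Decide whether an incoming remote description should be dropped (offer collision)."""
    offer_collision = description_type == "offer" and (
        state.making_offer or state.signaling_state != "stable"
    )
    # The impolite peer ignores a colliding offer; the polite peer rolls back and accepts it.
    state.ignore_offer = (not state.polite) and offer_collision
    return state.ignore_offer
```

In this pattern, a peer that has set `ignore_offer` typically also swallows errors from adding candidates that belonged to the ignored offer, so some "candidate can't be added" messages are expected there and are not necessarily a bug.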

In this Reproduction Project, I use the simple DataChannel example but set it up several times in parallel. ICE/session setup starts to fail as I increase the number of parallel setups: 1 to 3 mostly work, but above that it starts failing with seemingly random failure modes.

The number of parallel setups can be adjusted on the "DataChannelTest" object.

  • even with two parallel setups I start seeing:
    (peer_connection.cc:2705): 0 is not ready to use the remote candidate because the local or remote description is not set.
    which may point to an issue with the operations chain model. ICE setup still succeeds, likely because only a few candidates are affected.
  • with more setups (5-10), I see warnings and errors such as:
    (physical_socket_server.cc:852): Assuming benign blocking error : [0x00000039] Socket is not connected
    (sdp_offer_answer.cc:2638): AddIceCandidate: ICE candidates can't be added without any remote session description.
    Connections mostly still complete, though.
  • with 10 setups, the Editor (or built app) crashes most of the time; often the last message I see is:
    (physical_socket_server.cc:1452): select : [0x00000016] Invalid argument

To check whether the operations chain is the culprit, I modified the test to cache all IceCandidates until both the local and remote descriptions are set (in the buffer-candidates branch). While the "not ready to use the remote candidate..." and similar messages go away, the apps still crash with many connections.
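The candidate-caching workaround can be sketched roughly as below. This is a minimal Python sketch under the assumption that the branch queues remote candidates and flushes them once both descriptions are applied; `CandidateBuffer` and its methods are hypothetical names, and `add_candidate` stands in for the real AddIceCandidate call rather than reproducing the branch's actual code.

```python
from typing import Callable, List


class CandidateBuffer:
    """Hypothetical helper: hold remote ICE candidates until both descriptions are set."""

    def __init__(self, add_candidate: Callable[[str], None]) -> None:
        self._add = add_candidate   # stand-in for the PeerConnection's AddIceCandidate
        self.local_set = False
        self.remote_set = False
        self.pending: List[str] = []

    def _ready(self) -> bool:
        return self.local_set and self.remote_set

    def on_remote_candidate(self, candidate: str) -> None:
        # Apply immediately once ready, otherwise queue in arrival order.
        if self._ready():
            self._add(candidate)
        else:
            self.pending.append(candidate)

    def mark_local_description_set(self) -> None:
        self.local_set = True
        self._flush()

    def mark_remote_description_set(self) -> None:
        self.remote_set = True
        self._flush()

    def _flush(self) -> None:
        if not self._ready():
            return
        queued, self.pending = self.pending, []
        for candidate in queued:
            self._add(candidate)
```

Because the buffer only reorders when candidates are applied, it can remove the "description is not set" messages without touching socket handling, which matches the observation that crashes persist with many connections.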

Another small observation that might be relevant: as I increase the number of parallel setups, the number of ICE host candidates in individual setups also seems to increase. In one test run, the first setup had 4 candidates (normal), the second 5, then 6, and so on, until the 10th had 14! The extra candidates referenced different TCP ports, so it is not a simple doubling.
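To quantify this growth per setup, one option is to log the candidate strings each PeerConnection gathers and summarize them. The following hypothetical Python helper is not part of the reproduction project. It assumes candidates follow the standard `candidate:foundation component transport priority address port typ type ...` attribute layout and reports the host-candidate count and the distinct TCP ports they reference.

```python
from typing import Iterable, List, Optional, Tuple


def parse_candidate(line: str) -> Optional[Tuple[str, str, int, str]]:
    """Parse an ICE candidate attribute into (transport, address, port, type), or None."""
    text = line.strip()
    if text.startswith("a="):
        text = text[2:]  # tolerate the SDP attribute prefix
    if not text.startswith("candidate:"):
        return None
    fields = text[len("candidate:"):].split()
    # Layout: foundation component transport priority address port "typ" type [extensions...]
    if len(fields) < 8 or fields[6] != "typ" or not fields[5].isdigit():
        return None
    return fields[2].lower(), fields[4], int(fields[5]), fields[7]


def host_candidate_report(lines: Iterable[str]) -> Tuple[int, List[int]]:
    """Count host candidates and list the distinct TCP ports they reference."""
    parsed = [p for p in (parse_candidate(line) for line in lines) if p is not None]
    hosts = [p for p in parsed if p[3] == "host"]
    tcp_ports = sorted({port for transport, _, port, _ in hosts if transport == "tcp"})
    return len(hosts), tcp_ports
```

Running this on each setup's gathered candidates would show whether the extra host candidates are all TCP and whether their ports are new on each setup.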

memo: WRS-505