livekit/client-sdk-js

Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.

HermanBilous opened this issue · 18 comments

Describe the bug

I have this issue that I cannot reproduce at all, but get a lot of errors in sentry for. Could you please take a look and point in a direction to how to fix it. Here's the log I can extract from sentry.

Reproduction

N/A

Logs

[
  {
    "level": "warning",
    "data": {
      "arguments": [
        "websocket closed",
        {
          "code": 1006,
          "identity": "<user_id>",
          "reason": "",
          "room": "room_b",
          "roomSid": "room_sid_b",
          "state": 1,
          "wasClean": false
        }
      ]
    }
  },
  {
    "level": "warning",
    "data": {
      "arguments": [
        "signal disconnected",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "info",
    "data": {
      "arguments": [
        "resuming signal connection, attempt 0",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "info",
    "data": {
      "arguments": [
        "Resuming signal connection",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "error",
    "data": {
      "arguments": [
        "Failed to execute 'setConfiguration' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.",
        {
          "error": "[DOMException]",
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "warning",
    "data": {
      "arguments": [
        "reconnect disconnected",
        {
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  },
  {
    "level": "error",
    "data": {
      "arguments": [
        "unable to set offer",
        {
          "fields": "[Object]",
          "identity": "<user_id>",
          "room": "room_b",
          "roomSid": "room_sid_b"
        }
      ]
    }
  }
]


### System Info

```shell
Browser: any, but mostly Chrome
OS: Windows / MacOS

Severity

annoyance

Additional Information

No response

thanks for the report!
which SDK version is this? Did it start popping up after a specific upgrade?

It looks like the PeerConnection went into a closed state without that triggering a reconnect (which it should).
On a signal resume the SDK then tries to update the PeerConnection's config, but that fails as it is already closed.
Any logs that happen before the one's you shared might be helpful here as well

I'm using 2.0.3 on dev and 1.15.10 on prod. I can see this issue reported on both environments. We had our sentry disabled for some time, but I remember having this issue on older versions of sdk as well.

We also get the same amount of errors like: InvalidStateError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'. reported as well.

I found this code:

      if (e instanceof UnexpectedConnectionState) {
        this.log.debug('received unrecoverable error', { ...this.logContext, error: e });
        // unrecoverable
        recoverable = false;
      } else if (!(e instanceof SignalReconnectError)) {
        // cannot resume
        this.fullReconnectOnNext = true;
      }

It is executed when the Failed to execute 'setRemoteDescription'... error occurs. Judging by the reported error, the code above receives e of type SignalReconnectError, so the error is considered recoverable.

I'll dig for more logs.

Thanks for the insight, that means that the clients to perform a reconnect then after hitting that error?

If it's not full reconnect, it will only do a signal reconnect. Maybe a pc reconnect at this point is better?

Nevertheless, I could not find code that will do anything when onSignalingStatechange is triggered. I mean, it calls PCTransportManager.updateState, but then in mentioned function nothing gets done about changes in pc.signalingState. Maybe that's why it never triggers a reconnect?

I looked into logs from other events, but they all have the same error pattern from original post in common.

Yeah, you're totally right. This code path failing should not be treated as a SignalReconnectError and instead force a full reconnect. Thanks for digging into this!

Thanks for the PR. I will update my project once it's released, and will report back if it fixed my problem.

So, the changes did not help, I think they made the issue even worse, since now I get 3 unhandled exceptions instead of 1. I created #1063 to at least add a catch clause on those errors. This should at least make the errors handled and not be reported by sentry, which helps my case. This won't help with the issue itself, however.

Hey, I think we should close this one for now.. I thought that the fix went into the latest release somehow, I even read the release notes, and I swear I saw that PR in there.

were you able to test the changes in the latest release?

yes, the error is still present. From error trace I can see that it's coming from callbacks added in setupSignalClientCallbacks()

could you provide the exact error and stack trace of an error that was raised in the version released yesterday (v2.0.5) ?

The exact stack trace looks like this:

Error: Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.
  at ss.<anonymous>(/assets/lk-Bsa7hxT1.js:1:214013)
  at Generator.throw(<anonymous>)
  at c(/assets/lk-Bsa7hxT1.js:1:61326)

As you can see, not much of a stacktrace on this error, because ts wrapped it in a generator. If I look into code, I see this at where the error was thrown:

ject.assign(Object.assign({},this.logContext),{fields:o})),new Hn(s)}})}}function os(i,e,t){let n=0;i.rtp.some(r=>r.codec==="opus"?(n=r.payl

Which leads me to:
PCTransport.setMungedSDP throw new NegotiationError(msg);.

Looking into livekit's console logs:

websocket closed {
code: 1006,
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
reason: ,
room: rid,
roomID: RM_3jcnGDm7UKM7,
state: 1,
wasClean: false
}
signal disconnected {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
resuming signal connection, attempt 0 {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
Resuming signal connection {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
reconnect disconnected {
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}
unable to set offer {
fields: [Object],
participant: pid_1,
pID: PA_QgjpAGj2QoVn,
room: rid,
roomID: RM_3jcnGDm7UKM7
}

Then

Error: Failed to execute 'setRemoteDescription' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.

Right after error on setRemoteDescription I get

InvalidStateError: Failed to execute 'addIceCandidate' on 'RTCPeerConnection': The RTCPeerConnection's signalingState is 'closed'.

It originates in client.onTrickle

Thanks! How do the client logs continue after the setRemoteDescriptionError? Just trying to make sure that the reconnect logic behaves as expected now.

Hm, that's a good question. Unfortunately, we don't have tracking of what's happening after the error, I'm not even sure if this is possible with sentry. But, a good thing is that there are only 1 such error reported, not 3 of them consecutively, like I saw before.

It would be great if we were to replicate this issue in an isolated manner. Any hints on what kind of client (OS + browser combination) runs into this most frequently or what might be causing this reconnect pattern in the first place?

I think this is somehow connected to the server. E.g. room migrates, or lk server becomes unavailable for some reason and shutdowns abruptly. As for the browser, I see that it happens on chrome / edge, on windows, mac and even linux. The livekit-server we use is 1.5.0

@HermanBilous I'll close this issue for now. We haven't heard any similar reports. If you're still seeing this issue with the latest client+server versions feel free to let me know and I'll reopen it.