polyphony-chat/chorus

Voice Channel support

Opened this issue · 13 comments

Add support for sending and receiving live streams of WebRTC Video and Audio.

At least this is well documented

Some relevant info:

  • Encryption uses XSalsa20-Poly1305 (specifically the xsalsa20_poly1305, xsalsa20_poly1305_suffix or xsalsa20_poly1305_lite modes). I've found the easiest way to implement this is probably the RustCrypto secretbox crate, which already combines XSalsa20 with Poly1305
    (Also, Discord says they themselves use libsodium, and secretbox is part of nacl-compat, RustCrypto's pure-Rust implementation of the NaCl / libsodium APIs)

  • Audio is encoded with the Opus open audio codec (stereo, 48 kHz). For Opus in Rust we can use opus-rs, which provides high-level bindings for libopus, so we can consult the official Opus documentation directly
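To make the "stereo, 48 kHz" part concrete, here is a small sketch of the buffer-size arithmetic a decoder would need. It assumes interleaved 16-bit PCM output and uses the fact that a single Opus packet carries at most 120 ms of audio. The function and constant names are made up for illustration, and the 20 ms frame in the note below is only an example, not something this thread confirms Discord uses.

```rust
/// Sample rate Discord voice uses, per the note above.
const SAMPLE_RATE_HZ: u64 = 48_000;
/// Stereo.
const CHANNELS: usize = 2;
/// Longest duration a single Opus packet can carry (120 ms), in microseconds.
const MAX_PACKET_US: u64 = 120_000;

/// Number of samples per channel in a frame of the given duration.
/// Microseconds are used so that 2.5 ms frames stay integral.
const fn samples_per_channel(frame_us: u64) -> usize {
    (SAMPLE_RATE_HZ * frame_us / 1_000_000) as usize
}

/// Length of an interleaved PCM buffer (L R L R ...) for that frame.
const fn interleaved_len(frame_us: u64) -> usize {
    samples_per_channel(frame_us) * CHANNELS
}

/// A buffer of this length is large enough for any single Opus packet.
const MAX_DECODE_BUFFER_LEN: usize = interleaved_len(MAX_PACKET_US);
```

For example, a 20 ms frame works out to 960 samples per channel, so 1920 interleaved samples, while the worst case is 5760 per channel.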

(for encryption, see this part of discord's docs)
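As far as I understand Discord's voice docs, the three modes differ only in where the 24-byte nonce comes from: the RTP header itself, 24 bytes appended to the packet, or a 4-byte counter appended to the packet, each zero-padded to 24 bytes. The sketch below covers just that nonce handling, using only the standard library. The function names are hypothetical, `body` is assumed to be the packet with the 12-byte RTP header already removed, and the actual secretbox call is left out.

```rust
/// Nonce size XSalsa20-Poly1305 expects.
const NONCE_LEN: usize = 24;
/// Size of the fixed RTP header.
const RTP_HEADER_LEN: usize = 12;
/// Size of the counter appended in the `_lite` mode.
const LITE_NONCE_LEN: usize = 4;

/// `xsalsa20_poly1305`: the nonce is the RTP header, zero-padded to 24 bytes.
fn nonce_from_header(rtp_header: &[u8; RTP_HEADER_LEN]) -> [u8; NONCE_LEN] {
    let mut nonce = [0u8; NONCE_LEN];
    nonce[..RTP_HEADER_LEN].copy_from_slice(rtp_header);
    nonce
}

/// `xsalsa20_poly1305_suffix`: the last 24 bytes of the packet are the nonce.
/// Returns (ciphertext, nonce), or None if the body is too short.
fn split_suffix_nonce(body: &[u8]) -> Option<(&[u8], [u8; NONCE_LEN])> {
    if body.len() < NONCE_LEN {
        return None;
    }
    let (ciphertext, tail) = body.split_at(body.len() - NONCE_LEN);
    let mut nonce = [0u8; NONCE_LEN];
    nonce.copy_from_slice(tail);
    Some((ciphertext, nonce))
}

/// `xsalsa20_poly1305_lite`: the last 4 bytes are an incrementing counter,
/// zero-padded to 24 bytes to form the nonce.
fn split_lite_nonce(body: &[u8]) -> Option<(&[u8], [u8; NONCE_LEN])> {
    if body.len() < LITE_NONCE_LEN {
        return None;
    }
    let (ciphertext, tail) = body.split_at(body.len() - LITE_NONCE_LEN);
    let mut nonce = [0u8; NONCE_LEN];
    nonce[..LITE_NONCE_LEN].copy_from_slice(tail);
    Some((ciphertext, nonce))
}
```

The returned ciphertext and nonce, together with the session's secret key, would then go to whatever secretbox implementation we end up using.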

The encryption and encoding seem to be the hardest parts of implementing this; the actual communication is very similar to the gateway

Oh and we also have no info about video, so we should probably implement voice first

After some research: video uses VP8 and VP9. Maybe env-libvpx-sys in Rust?

{
	"codecs": [
		{
			"name": "opus",
			"type": "audio",
			"priority": 1000,
			"payload_type": 109,
			"rtx_payload_type": null
		},
		{
			"name": "VP8",
			"type": "video",
			"priority": 2000,
			"payload_type": 120,
			"rtx_payload_type": 124
		},
		{
			"name": "VP9",
			"type": "video",
			"priority": 3000,
			"payload_type": 121,
			"rtx_payload_type": 125
		}
	]
}
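One way to use a codec list like this is to route incoming RTP packets by their payload type. Below is a minimal sketch with the table hard-coded from the JSON above. It assumes that `rtx_payload_type` is the payload type of the codec's retransmission stream, which the thread does not confirm, and all type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MediaKind {
    Audio,
    Video,
}

/// One entry of the "codecs" array shown above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Codec {
    name: &'static str,
    kind: MediaKind,
    priority: u32,
    payload_type: u8,
    rtx_payload_type: Option<u8>,
}

/// Values copied from the JSON payload above.
const CODECS: [Codec; 3] = [
    Codec { name: "opus", kind: MediaKind::Audio, priority: 1000, payload_type: 109, rtx_payload_type: None },
    Codec { name: "VP8", kind: MediaKind::Video, priority: 2000, payload_type: 120, rtx_payload_type: Some(124) },
    Codec { name: "VP9", kind: MediaKind::Video, priority: 3000, payload_type: 121, rtx_payload_type: Some(125) },
];

/// What an RTP payload type refers to.
#[derive(Debug, PartialEq, Eq)]
enum Resolved {
    /// A regular media packet for this codec.
    Primary(Codec),
    /// Assumed: a retransmission packet for this codec.
    Retransmission(Codec),
}

/// Looks up which codec an RTP payload type belongs to, if any.
fn resolve_payload_type(pt: u8) -> Option<Resolved> {
    CODECS.iter().find_map(|codec| {
        if codec.payload_type == pt {
            Some(Resolved::Primary(*codec))
        } else if codec.rtx_payload_type == Some(pt) {
            Some(Resolved::Retransmission(*codec))
        } else {
            None
        }
    })
}
```

In a real client the table would be built from whatever the server sends rather than hard-coded.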

Webrtc-rs may also be worth looking into, since it seems to support all the needed codecs

Potentially maybe perhaps looking into this on feature/webrtc
I probably won't manage to implement it though lol

Good luck :)

Also looking at how Serenity implemented this, they seem to have developed their own rtp parser and even their own opus bindings

I wonder if we could... yoink this :D

yup :D

Slight change of plans: raw UDP first, then WebRTC,
because WebRTC is really complex and raw UDP seems to be way easier to implement for now

Note that WASM (in the browser) cannot use raw UDP sockets.

A few notes:

  • First, it seems that UDP is not strictly the older version and WebRTC the newer one; Discord seems to use UDP in their native clients and WebRTC on web, so we could very likely do the same

  • Second, the official UDP voice docs have been expanded a bit, notably on the encryption modes. Once I pick this back up again we should take a closer look at that

  • Third, I feel that #457 was about 90% of the way there. I just need to find a code snippet for how to decode Opus from RTP packets in Rust (or, more generally, how to use any of the Opus bindings / libraries to decode Opus packets)
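For the RTP half of that last point, here is a minimal sketch of splitting a packet into its fixed header (as laid out in RFC 3550) and the remaining payload, using only the standard library. It is not taken from #457; the names are hypothetical, and it deliberately does not handle header extensions or padding, only reporting whether the extension bit is set.

```rust
/// Fields of the fixed 12-byte RTP header (RFC 3550, section 5.1).
#[derive(Debug, PartialEq, Eq)]
struct RtpHeader {
    version: u8,
    has_extension: bool,
    csrc_count: u8,
    payload_type: u8,
    sequence: u16,
    timestamp: u32,
    ssrc: u32,
}

/// Splits a packet into its parsed header and everything after the header
/// (including the CSRC list). Returns None for truncated or non-v2 packets.
fn parse_rtp(packet: &[u8]) -> Option<(RtpHeader, &[u8])> {
    if packet.len() < 12 {
        return None;
    }
    // Byte 0: V(2 bits) P(1) X(1) CC(4). Byte 1: M(1) PT(7).
    let version = packet[0] >> 6;
    if version != 2 {
        return None;
    }
    let header = RtpHeader {
        version,
        has_extension: packet[0] & 0x10 != 0,
        csrc_count: packet[0] & 0x0F,
        payload_type: packet[1] & 0x7F,
        sequence: u16::from_be_bytes([packet[2], packet[3]]),
        timestamp: u32::from_be_bytes([packet[4], packet[5], packet[6], packet[7]]),
        ssrc: u32::from_be_bytes([packet[8], packet[9], packet[10], packet[11]]),
    };
    // Each CSRC entry is 4 bytes and follows the fixed header.
    let header_len = 12 + 4 * header.csrc_count as usize;
    if packet.len() < header_len {
        return None;
    }
    Some((header, &packet[header_len..]))
}
```

In Discord's case the returned payload is still encrypted, so it would have to be decrypted before it is a usable Opus frame. From there, the `opus` crate (opus-rs) should be able to take over: as far as I know it offers a `Decoder` created with a sample rate and channel count, with a `decode` method that writes PCM samples into a caller-provided buffer, but the exact signatures should be checked against the crate's docs.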