Welcome to the GodotEngine GDExtension twovoip that applies the xiph/opus compression library (and optionally the xiph/rnnoise de-noiser and OVRLipSync viseme detector) to an audio stream of speech from the microphone. The starting point for this project was one-voip-godot-4.
Thanks to @ajlennon and @DmitriySalnikov for indefatiguable work on the github actions that are successfully building this plugin across all six GodotEngine supported platforms.
An HTML5 demo is hosted at https://goatchurch.itch.io/twovoip-mqtt
The purpose of this demo is to test all the features
so you can hear what the opus compression and noise cancelling settings do to the
sound of a voice.
- Clone/Download this repository and open the project in the
example/
directory in Godot 4.3. - Go to assetlib, search for twovoip, and install it. (Now on version 3.1)
- Run the app.
- If the microphone is working, then you should see a waveform in the app like this:
If there is no response on MacOS, it could be this issue
Go to Project Settings (with Advanced Settings selected) -> Audio -> Driver -> Mix rate and set to 48000
The top section of the user interface has the PTT (Press to Talk) button and Vox button (Voice Activity Detection) where the activation threshold is given in the slider below it (on top of the waveform). Click on [De-noise] to hear how recordings sound with and without this feature.
Use the [Play] button in the Recording Playback section to hear up to 10 seconds of the last recording you made by holding down the PTT (either manually or automatically by the Vox). The Bytes per second for the audio compression is shown here.
The LibOpus compression and resampling section has options for altering the frame size and compression rate on the recorded sample before you play it back so you can experiment with how it sounds with different settings. Since the internal Godot audio sample rate of 44.1kHz doesn't match the 48kHz that the Opus and RNNoise libraries are designed for, a function copied from the Speex codebase resamples it to the higher sample rate. If you are curious about what how much difference this 8% change in the sample rate makes, you can experiment with the "44.1kHz as 48kH" option.
Finally, there is an MQTT transmission section for you to publish audio packets over the network via a a broker on a topic. Click the [Connect] button to go online while a friend does the same on another computer and you should be able to talk to one another over the internet (don't forget to use the PTT button). Several presets are given for convenience, and it will automatically use websockets if you are operating from HTML5.
MQTT is a lightweight protocol
implemented in another GodotEngine GDExtension https://godotengine.org/asset-library/asset/1993
and described here.
Its publish and subscribe, and retained and last will messages system provides a simple basis for
each player track who is joining or leaving the network. There is a line of text beginning with mosquitto_sub
command that you can copy into your .
If you are familiar with the Godot Audio system, the following minimal use case of this plugin should make sense:
As outlined in the docs,
create an AudioStreamPlayer
with stream=AudioStreamMicrophone
, set it to Autoplay, and
ensure your ProjectSettings have audio/driver/enable_input
set to true.
Set its bus to a new bus called "MicrophoneBus" which should be Muted to
stop it creating a feedback loop to the output. Add an effect
OpusChunked
to the MicrophoneBus. This will only be an option if the twovoip
addon is installed.
Assuming that AudioEffectOpusChunked
is the first one on the bus, you can get a reference to it with
var microphoneidx = AudioServer.get_bus_index("MicrophoneBus")
var opuschunked : AudioEffectOpusChunked = AudioServer.get_bus_effect(microphoneidx, 0)
Now you can consume and transmit the byte chunks with the following code:
func _process(delta):
var prepend = PackedByteArray()
while opuschunked.chunk_available():
var opusdata : PackedByteArray = opuschunked.opus_chunk(prepend)
opuschunked.drop_chunk()
transmit(opusdata)
At the other end you can decode the opus packets into an AudioStreamPlayer
whose
stream is set to an AudioStreamOpusChunked
.
var audiostreamopuschunked : AudioStreamOpusChunked = $AudioStreamPlayer.stream
var opuspacketsbuffer = [ ] # append incoming packets to this list
var prefixbyteslength = 0
func _process(delta):
while audiostreamopuschunked.chunk_space_available():
var fec = 0
audiostreamopuschunked.push_opus_packet(opuspacketsbuffer.pop_front(), prefixbyteslength, fec)
Opus packets don't have any context, so if you want to number them so they can be shuffled
if they get out of order in the particular network data channel you are using, you can use the prepend
array to splice an index value into a header.
Then prefixbyteslength
needs to be the same length as this header so it can be split off
on its way to the decoder.
The forward error correction flag, fec
, can be set to 1 if the previous packet is missing.
If you want to attach only native Godot classes to the audio busses and audio streams
you can do the same thing as above
using the corresponding AudioEffectCapture
and AudioStreamGeneratorPlayback
object to
handle the audio chunks in the form of
PackerVector2Array
s while running these two external classes in isolation, like
audioopuschunkedeffect.chunk_to_opus_packet(prefixbytes, audiosamples, denoise)
and:
var audiostreamgeneratorplayback = $AudioStreamPlayer.get_stream_playback()
while audiostreamgeneratorplayback.get_frames_available() > audiostreamopuschunked.audiosamplesize:
var audiochunk = audiostreamopuschunked.opus_packet_to_chunk(opuspacketsbuffer.pop_front(), prefixbyteslength, fec)
audiostreamgeneratorplayback.push_buffer(audiochunk)
audiostreamgeneratorplayback.push_buffer(audiopacketsbuffer.pop_front())
The chunk_max()
function is for implementing a Vox (Voice Activity Detection) feature
so that you can save processor cycles by dropping chunks before you opus encoding them.
Or you can use denoise_resampled_chunk()
(which requires resampling to 48kHz) to read a
speech probability, or optionally measure chunk_max()
post de-noising.
The opus compression and denoiser features need the chunks to be sent to them in order
because they use the state recorded from earlier audio samples to provide context and improve the performance
of the current chunk. Use flush_opus_encoder()
if you anticipate a gap from the previous chunk
(eg the PTT was off for a period and there was no processing).
The undrop_chunk()
function can roll back the chunk buffer and by some milliseconds
so you can avoid clipping at the start of a speech sequence.
There are three submodules in this repository.
godot-cpp is contains the header files and class definitions required to build a compiled GDExtension object that can dynamically link to the GodotEngine at runtime.
opus is the opus voice compression and decompression library from xiph.org that generally takes an array of 960 pairs of floats representing 20ms of stereo audio samples at 48kHz and returns 20 to 30 bytes of compressed data for that chunk.
noise-suppression-for-voice contains a copy of the xiph/rnnoise
code in its external/rnnoise directory with the all important CMakeLists.txt
file that makes it possible
to compile it on all the diffeerent platforms
The sequence of commands to build the system locally
nix-shell -p scons cmake ninja autoreconfHook # if you are on nix
scons apply_patches # optional
scons build_opus # build opus using cmake
scons build_rnnoise # build opus using cmake
scons # build this library
To compile for another platform like web, the commands are
scons apply_patches
scons platform=web target=template_release build_opus
scons platform=web target=template_release build_rnnoise
scons platform=web target=template_release
This is a highly speculative component that takes advantage of the chunking feature in the OpusChunked effect,
but which is currently closed source and distributed as a library only for Windows, Android and Mac.
There is no Linux version.
The github actions compiles a version for the available platforms
with scons lipsync=yes
and creates an addons/twovoip_lipsync
that can be copied into a project
Download the OVRLipSync libraries from https://developer.oculus.com/documentation/native/audio-ovrlipsync-native/ and unzip into top level as OVRLipSyncNative directory in this project. There is a stub include file for Linux that allows this GDExtension to compile without this library.
On Windows you may need to copy the OVRLipSyncNative/Lib/Win64/OVRLipSync.dll
file to the same directory
as your GodotEngine.exe
so that it finds and links it.
For the addon to work correctly, twovoip_lipsync
and twovoip
cannot be used in the same project.
The build system is defined by the flake.nix file
- makes a result directory that needs to be copied into addons
nix build
cp result/addons/twovoip/*so addons/twovoip
- android version:
nix build .#android
cp result/addons/twovoip/*so addons/twovoip
On Windows:
Use Visual Studio 2022 Community Edition with CMake option to open opus directory and convert cmake script to sln and then compile.
cd ../..
python -m SCons