terminal-discord/discord-voice-cli

can't compile

frytkisasmaczne opened this issue · 6 comments

I see that it's not maintained anymore, but I need to do exactly what this program does. Can you help me fix it up just enough that it compiles?
I'm still learning Rust, so I don't understand most of the concepts yet and couldn't manage to patch it up myself.
Even if it needs some old version of cargo, that would be fine for me. I just need to modify it a bit, compile it, and put it on a Raspberry Pi.

Ok, so, there are a number of issues with using this (but it should be possible!).

First, serenity. serenity is the library used to connect to Discord. This crate requires a fork of that library that supports user tokens (unless you are OK with voice through a bot, in which case there are far better options than using this code). I have an (older) fork that supports user tokens here. You need to update serenity and update this project to the new API (see the end of this comment).

With that, the crate should compile. However, I have not been keeping up with the state of Rust audio libraries (at the time I wrote this code, it wasn't fantastic). It would probably be beneficial (but not strictly required) to update cpal or migrate to whatever is best now.

Now for the deficiencies in this project. There is no Voice Activity Detection (I could never find that in the Opus API), but it sounds like you don't need that.
The much larger issue is that audio is not resampled, so the mic input and Discord may be running at different sample rates. When the rates differ, the audio will be very distorted. A quick search shows there are a few Rust crates that help with resampling.
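
To illustrate what resampling involves, here is a minimal sketch of linear-interpolation resampling for a single channel. The function name `resample_linear` is made up for this example and is not part of this project, cpal, or serenity. Linear interpolation is the simplest possible approach and does no low-pass filtering, so it can alias when downsampling; a dedicated resampling crate would likely sound better.

```rust
/// Resample one channel of audio from `from_hz` to `to_hz` using linear
/// interpolation. Hypothetical helper, not taken from this project.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    if from_hz == to_hz {
        return input.to_vec();
    }

    // Number of output samples covering the same duration as the input.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // How far we advance in the input for each output sample.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;

    let mut out = Vec::with_capacity(out_len);
    for n in 0..out_len {
        let pos = n as f64 * step;
        let i = (pos as usize).min(last);
        let frac = (pos - i as f64) as f32;
        let a = input[i];
        // Past the final sample there is nothing to interpolate toward,
        // so the last input value is held.
        let b = input[(i + 1).min(last)];
        out.push(a + (b - a) * frac);
    }
    out
}
```

With this sketch, 441 input samples at 44100 Hz become 480 samples at 48000 Hz. For stereo input, the channels would need to be split and resampled separately (or the loop adapted to interleaved frames).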

As for the actual tweaks to get the code "working": all I needed was to run cargo update to get the latest version of my serenity fork, and to make the changes below. If you have any other questions, feel free to ask.

diff --git a/src/receiver.rs b/src/receiver.rs
index e68fe29..952d764 100644
--- a/src/receiver.rs
+++ b/src/receiver.rs
@@ -74,6 +74,7 @@ impl AudioReceiver for Receiver {
         _timestamp: u32,
         _stereo: bool,
         data: &[i16],
+        _: usize,
     ) {
         self.0.lock().extend(data);
     }
diff --git a/src/sender.rs b/src/sender.rs
index 388173b..2f084d8 100644
--- a/src/sender.rs
+++ b/src/sender.rs
@@ -87,4 +87,12 @@ impl AudioSource for Sender {
     fn read_opus_frame(&mut self) -> Option<Vec<u8>> {
         unimplemented!("Opus is not configured");
     }
+
+    fn decode_and_add_opus_frame(
+        &mut self,
+        float_buffer: &mut [f32; 1920],
+        volume: f32
+    ) -> Option<usize> {
+        unimplemented!("Opus is not configured");
+    }
 }

So far so good: it joins the channel, but it doesn't transmit sound (no green circle). The console output is "<user> is connected", followed by the format struct: Format { channels: 2, sample_rate: SampleRate(48000), data_type: F32 }
It behaves the same way on Mac and PC.
I'm wondering whether it's about the sample rate or something else.
cpal works on both devices; I tested it with some of the examples.
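
One thing that could be worth ruling out is the sample format rather than the sample rate. The format struct above reports F32 samples, while the receiver in the diff handles `&[i16]`. If the sending side also expects 16-bit PCM (an assumption, not something confirmed in this thread), the samples would need converting along these lines. The function name `f32_to_i16` is hypothetical.

```rust
/// Convert floating-point samples in the range [-1.0, 1.0] to signed
/// 16-bit PCM. Hypothetical helper; whether the sender actually needs
/// this conversion is an assumption.
fn f32_to_i16(samples: &[f32]) -> Vec<i16> {
    samples
        .iter()
        .map(|&s| {
            // Clamp first so out-of-range input saturates instead of wrapping.
            let clamped = s.clamp(-1.0, 1.0);
            (clamped * i16::MAX as f32).round() as i16
        })
        .collect()
}
```

Scaling by `i16::MAX` keeps the mapping symmetric, so -1.0 maps to -32767 rather than -32768.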

I could use a bot token if it allows me to use a different, easier framework.
I bodged something together in Node.js, but it had a solid 3 seconds of latency between speaking and hearing, so I guess Node won't work.

That is interesting. I'm not sure what could be causing that.

And yes, using a bot token would let you use any unmodified Discord library, which could make things much easier.

It occurs to me that serenity has an API to stream ffmpeg output to Discord. It might be possible to use ffmpeg to do the hard work (ffmpeg docs). (Never mind, I didn't read closely: this is Windows-only, although there seems to be ALSA support.)

I feel as though there should be a much simpler way to do what you want...

It's possible to capture microphone input with ffmpeg. I already tried replacing the ffmpeg arguments in fn _ffmpeg_optioned in serenity's streamer.rs with tested macOS arguments for microphone input, but I think the result was about the same as it is right now with your fixes.

That is very interesting. I just tested, and I cannot get ffmpeg to stream the mic to Discord (on macOS), nor can I even stream a file. I'm not really sure what the cause could be. It might be my outdated serenity fork.

So I figured it out (mostly).

First off, serenity needs to be tweaked so that you can provide the -f argument before the -i one, which is what enables desktop capture devices.
Then, it is important to add the correct arguments so that ffmpeg pipes its output: -f s16le -.
It is also important to get the stereo configuration correct, or the audio will be distorted (serenity tries to detect this itself, but the detection may not be correct with custom ffmpeg arguments).
Also, the other serenity ffmpeg commands use these arguments: -re -ac 2 -ar 48000 -acodec pcm_s16le. I don't know for sure how important they are.
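
As a rough sketch of the argument order described above, the following builds an ffmpeg command line with the capture format before -i and raw s16le written to stdout. The helper names (`ffmpeg_mic_args`, `spawn_ffmpeg`, `bytes_to_i16`) are made up for this example and are not serenity APIs. The input format and device strings in the usage note are assumptions that depend on the platform. The -re flag is left out here on the assumption that a live capture device already delivers audio in real time.

```rust
use std::process::{Child, Command, Stdio};

/// Build ffmpeg arguments for capturing a microphone and writing raw
/// 16-bit little-endian PCM to stdout. Hypothetical helper.
fn ffmpeg_mic_args(input_format: &str, device: &str, stereo: bool) -> Vec<String> {
    let channels = if stereo { "2" } else { "1" };
    [
        "-f", input_format, // capture backend; must come before -i
        "-i", device,
        "-ac", channels,    // must match what the consumer is told
        "-ar", "48000",
        "-acodec", "pcm_s16le",
        "-f", "s16le",      // raw output format
        "-",                // write to stdout so it can be piped
    ]
    .iter()
    .map(|s| s.to_string())
    .collect()
}

/// Start ffmpeg with its stdout piped back to this process.
/// Assumes an `ffmpeg` binary is on PATH.
fn spawn_ffmpeg(args: &[String]) -> std::io::Result<Child> {
    Command::new("ffmpeg")
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::null())
        .spawn()
}

/// Decode raw s16le bytes read from ffmpeg's stdout into samples.
/// A trailing odd byte, if any, is ignored.
fn bytes_to_i16(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect()
}
```

On macOS this might be called as `ffmpeg_mic_args("avfoundation", ":0", true)`, and on Linux with ALSA as `ffmpeg_mic_args("alsa", "default", true)`; both device strings are guesses that would need checking against the actual hardware. Whatever value is passed for `stereo` has to agree with the stereo setting given to serenity, per the note above about distortion.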

Hope that helps.