Passing `Opus`-encoded audio to `play_source` for passthrough to Discord only works if the audio is under 1432 bytes
Pranav-Badrinathan opened this issue · 5 comments
Songbird version: 0.3.2
Rust version (`rustc -V`): 1.70.0
Serenity/Twilight version: Serenity 0.11.5
Output of `ffmpeg -version`, `yt-dlp --version` (if relevant): Not relevant
Description:
This could be a bug, or intended.

When passing `Opus` frames with `Container::Dca` to create an `Input` (relevant discussion) and handing it over to `play_source` to be played, the playback thread always panics if the frame size is over exactly 1432 bytes. This seems to be related to `VOICE_PACKET_MAX`, which is used in `Mixer::new` to initialize an array of zeroes.

The confusing part is that, when debugging, I saw the code panicked here, where `buffer` was a slice containing only zeroes with length 1432 (NOT the data used to initialize the `Reader`), but `frame.frame_len` was the length of the frame sent into `play_source`. This looks like the array initialized with `VOICE_PACKET_MAX` that I mentioned above, but with a length value equal to that of the data passed into `play_source`.
Backtrace, if it is helpful.
```
thread '<unnamed>' panicked at 'range end index 3269 out of range for slice of length 1432', C:\Users\pranav\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\input\mod.rs:320:34
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\std\src\panicking.rs:578
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\panicking.rs:67
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\slice\index.rs:78
   3: core::slice::index::slice_end_index_len_fail
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\slice\index.rs:70
   4: core::slice::index::impl$4::index_mut<u8>
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:422
   5: core::slice::index::impl$5::index_mut<u8>
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:463
   6: core::slice::index::impl$1::index_mut<u8,core::ops::range::RangeTo<usize> >
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:31
   7: songbird::input::Input::read_opus_frame
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\input\mod.rs:320
   8: songbird::driver::tasks::mixer::mix_tracks
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:573
   9: songbird::driver::tasks::mixer::Mixer::cycle
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:420
  10: songbird::driver::tasks::mixer::Mixer::run
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:132
  11: songbird::driver::tasks::mixer::runner
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:625
  12: songbird::driver::tasks::start_internals::closure$1
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mod.rs:58
```
Steps to reproduce:
- Encode some audio in Opus.
- Add the `Dca`-spec required frame lengths at the start of each frame. Make sure that each frame is bigger than 1432 bytes in size.
- Create an `Input` using:

  ```rust
  Input::new(
      true, // stereo
      Reader::from_memory(buf), // where buf is greater than 1432 bytes in size
      Codec::Opus(OpusDecoderState::new().unwrap()),
      Container::Dca { first_frame: 0 },
      None, // metadata
  )
  ```

- Pass this input into `play_source`.
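The length-prefixing step above can be sketched as follows (illustrative only, not songbird's API; `dca_frame` is a hypothetical helper, and the sketch assumes the DCA convention of prefixing each Opus frame with a little-endian `i16` byte length):

```rust
// Prepend a DCA-style little-endian i16 length header to a single Opus frame.
fn dca_frame(opus_frame: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + opus_frame.len());
    out.extend_from_slice(&(opus_frame.len() as i16).to_le_bytes());
    out.extend_from_slice(opus_frame);
    out
}

fn main() {
    // A dummy "frame" of 3269 bytes, matching the length in the backtrace;
    // a real input would be Opus-encoded audio rather than zeroes.
    let frame = vec![0u8; 3269];
    let framed = dca_frame(&frame);
    assert_eq!(framed.len(), 3271);
    assert_eq!(i16::from_le_bytes([framed[0], framed[1]]), 3269);
    println!("framed length: {} bytes", framed.len());
}
```

Any frame over 1432 bytes built this way reproduces the panic when fed through `Reader::from_memory` as above.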
Sorry for the trouble if this is not a bug; I just can't completely follow the route the code takes to end up this way.
To my knowledge this is intended; maybe it shouldn't panic, but equally your audio frames need to be small enough to fit in a single UDP packet without fragmenting. The current limit takes some headroom under the typical MTU since there's variation among ISPs -- it could stand to go up a bit, possibly.
My back-of-the-napkin math suggests your frames are encoded at... 572 kbps, assuming 20 ms each? I don't think Discord are smart enough to stop you regardless, but that does seem a tad extreme.
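For the curious, the arithmetic behind that estimate is just the packet limit divided by the frame duration (plain arithmetic, nothing songbird-specific):

```rust
fn main() {
    // A frame at the 1432-byte packet limit, sent every 20 ms, works out to
    // roughly 573 kbps -- far above typical Opus music bitrates (~96-128 kbps).
    let bytes_per_frame = 1432.0_f64;
    let frame_duration_s = 0.020;
    let kbps = bytes_per_frame * 8.0 / frame_duration_s / 1000.0;
    println!("{kbps:.1} kbps");
}
```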
So in that case `play_source` should be called per audio frame? In this case I was combining multiple frames and calling the function once for each combined collection, hence the large size.
Closing this then, as this is not a bug. Thanks for taking the time to reply, and sorry for the inconvenience!
Not necessarily per audio frame; if you're stuffing ~5 frames into a packet, and you know their lengths, then you can just write out the length of each individually, i.e. `(header) | len(frame1) | frame1 | len(frame2) | frame2 | len(frame3) | ...`. If this needs to be longer-lived, you could presumably write a custom `Read` implementation which blocks on receiving frames, reads into an internal buffer, and then handles this transformation for each packet.
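A minimal sketch of that packing scheme (function names are illustrative, not part of songbird; it assumes the DCA-style little-endian `i16` length prefix per frame):

```rust
// Concatenate several frames into one buffer, each preceded by its own
// little-endian i16 byte length.
fn pack_frames(frames: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for f in frames {
        out.extend_from_slice(&(f.len() as i16).to_le_bytes());
        out.extend_from_slice(f);
    }
    out
}

// Walk the buffer, reading each length header and splitting frames back out.
// (No error handling for malformed input; this is only a sketch.)
fn unpack_frames(mut buf: &[u8]) -> Vec<Vec<u8>> {
    let mut frames = Vec::new();
    while buf.len() >= 2 {
        let len = i16::from_le_bytes([buf[0], buf[1]]) as usize;
        frames.push(buf[2..2 + len].to_vec());
        buf = &buf[2 + len..];
    }
    frames
}

fn main() {
    let f1 = vec![1u8; 200];
    let f2 = vec![2u8; 300];
    let packed = pack_frames(&[f1.as_slice(), f2.as_slice()]);
    let frames = unpack_frames(&packed);
    assert_eq!(frames, vec![f1, f2]);
    // Each individual frame (plus its header) stays under the packet limit.
    assert!(frames.iter().all(|f| f.len() + 2 <= 1432));
}
```

The key point is that each frame carries its own length header, so the mixer can pull one frame at a time instead of hitting the 1432-byte cap on a single oversized frame.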
I was doing the same without the `header`, and it was playing the first frame and skipping the rest. If it is required to qualify each packet as a file made up of multiple frames, I'll attempt it with the `header`.
If not, I can simply play it frame by frame. Thanks!