Passing `Opus`-encoded audio to `play_source` for passthrough to Discord only works if the audio is under 1432 bytes
Pranav-Badrinathan opened this issue · 5 comments
Songbird version: 0.3.2
Rust version (`rustc -V`): 1.70.0
Serenity/Twilight version: Serenity 0.11.5
Output of `ffmpeg -version`, `yt-dlp --version` (if relevant): Not relevant
Description:
This could be a bug, or intended.

When passing `Opus` frames with `Container::Dca` to create an `Input` (relevant discussion) and handing it over to `play_source` to be played, the playback thread always panics if the frame size is over exactly 1432 bytes. This seems to be related to `VOICE_PACKET_MAX`, which is used in `Mixer::new` to initialize an array of zeroes.

The confusing part is that, when debugging, I saw the code panicked here, where `buffer` was a slice containing only zeroes with length 1432 (NOT the data used to initialize the `Reader`), but `frame.frame_len` was the length of the frame sent into `play_source`. This looks like the array initialized with `VOICE_PACKET_MAX` that I mentioned above, but with a length value equal to that of the data passed into `play_source`.
Backtrace, if it is helpful.
```
thread '<unnamed>' panicked at 'range end index 3269 out of range for slice of length 1432', C:\Users\pranav\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\input\mod.rs:320:34
stack backtrace:
   0: std::panicking::begin_panic_handler
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\std\src\panicking.rs:578
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\panicking.rs:67
   2: core::slice::index::slice_end_index_len_fail_rt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\slice\index.rs:78
   3: core::slice::index::slice_end_index_len_fail
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library\core\src\slice\index.rs:70
   4: core::slice::index::impl$4::index_mut<u8>
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:422
   5: core::slice::index::impl$5::index_mut<u8>
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:463
   6: core::slice::index::impl$1::index_mut<u8,core::ops::range::RangeTo<usize> >
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca\library\core\src\slice\index.rs:31
   7: songbird::input::Input::read_opus_frame
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\input\mod.rs:320
   8: songbird::driver::tasks::mixer::mix_tracks
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:573
   9: songbird::driver::tasks::mixer::Mixer::cycle
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:420
  10: songbird::driver::tasks::mixer::Mixer::run
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:132
  11: songbird::driver::tasks::mixer::runner
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mixer.rs:625
  12: songbird::driver::tasks::start_internals::closure$1
             at C:\Users\prana.LAPTOP-MODAKBGP\.cargo\registry\src\index.crates.io-6f17d22bba15001f\songbird-0.3.2\src\driver\tasks\mod.rs:58
```
Steps to reproduce:
- Encode some audio in Opus.
- Add the `Dca`-spec required frame lengths at the start of each frame. Make sure that each frame is bigger than 1432 bytes in size.
- Create an `Input` using:

  ```rust
  Input::new(
      true, // stereo
      Reader::from_memory(buf), // where buf is greater than 1432 bytes in size
      Codec::Opus(OpusDecoderState::new().unwrap()),
      Container::Dca { first_frame: 0 },
      None, // metadata
  )
  ```

- Pass this input into `play_source`.
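The length-prefixing step above can be sketched as follows (illustrative only, not songbird's API; `dca_frame` is a hypothetical helper, and the sketch assumes the DCA convention of prefixing each Opus frame with a little-endian `i16` byte length):

```rust
// Prepend a DCA-style little-endian i16 length header to a single Opus frame.
fn dca_frame(opus_frame: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(2 + opus_frame.len());
    out.extend_from_slice(&(opus_frame.len() as i16).to_le_bytes());
    out.extend_from_slice(opus_frame);
    out
}

fn main() {
    // A dummy "frame" of 3269 bytes, matching the length in the backtrace;
    // a real input would be Opus-encoded audio rather than zeroes.
    let frame = vec![0u8; 3269];
    let framed = dca_frame(&frame);
    assert_eq!(framed.len(), 3271);
    assert_eq!(i16::from_le_bytes([framed[0], framed[1]]), 3269);
    println!("framed length: {} bytes", framed.len());
}
```

Any frame over 1432 bytes built this way reproduces the panic when fed through `Reader::from_memory` as above.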
Sorry for the trouble if this is not a bug; I just can't completely follow the route the code takes to end up this way.
To my knowledge this is intended; maybe it shouldn't panic, but equally your audio frames need to be small enough to fit in a single UDP packet without fragmenting. The current limit takes some headroom under the typical MTU since there's variation among ISPs -- it could stand to go up a bit, possibly.
My back-of-the-napkin math suggests your frames are encoded at... 572 kbps, assuming 20 ms each? I don't think Discord are smart enough to stop you regardless, but that does seem a tad extreme.
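For the curious, the arithmetic behind that estimate is just the packet limit divided by the frame duration (plain arithmetic, nothing songbird-specific):

```rust
fn main() {
    // A frame at the 1432-byte packet limit, sent every 20 ms, works out to
    // roughly 573 kbps -- far above typical Opus music bitrates (~96-128 kbps).
    let bytes_per_frame = 1432.0_f64;
    let frame_duration_s = 0.020;
    let kbps = bytes_per_frame * 8.0 / frame_duration_s / 1000.0;
    println!("{kbps:.1} kbps");
}
```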
So in that case `play_source` should be called per audio frame? In this case I was combining multiple frames and calling the function once for each combined collection, hence the large size.
Closing this then, as this is not a bug. Thanks for taking the time to reply, and sorry for the inconvenience!
Not necessarily per audio frame; if you're stuffing ~5 frames into a packet, and you know their lengths, then you can just write out the length of each individually, i.e. `(header) | len(frame1) | frame1 | len(frame2) | frame2 | len(frame3) | ...`. If this needs to be longer-lived, you could presumably write a custom `Read` implementation which blocks on receiving frames, reads into an internal buffer, and then handles this transformation for each packet.
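A minimal sketch of that packing scheme (function names are illustrative, not part of songbird; it assumes the DCA-style little-endian `i16` length prefix per frame):

```rust
// Concatenate several frames into one buffer, each preceded by its own
// little-endian i16 byte length.
fn pack_frames(frames: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for f in frames {
        out.extend_from_slice(&(f.len() as i16).to_le_bytes());
        out.extend_from_slice(f);
    }
    out
}

// Walk the buffer, reading each length header and splitting frames back out.
// (No error handling for malformed input; this is only a sketch.)
fn unpack_frames(mut buf: &[u8]) -> Vec<Vec<u8>> {
    let mut frames = Vec::new();
    while buf.len() >= 2 {
        let len = i16::from_le_bytes([buf[0], buf[1]]) as usize;
        frames.push(buf[2..2 + len].to_vec());
        buf = &buf[2 + len..];
    }
    frames
}

fn main() {
    let f1 = vec![1u8; 200];
    let f2 = vec![2u8; 300];
    let packed = pack_frames(&[f1.as_slice(), f2.as_slice()]);
    let frames = unpack_frames(&packed);
    assert_eq!(frames, vec![f1, f2]);
    // Each individual frame (plus its header) stays under the packet limit.
    assert!(frames.iter().all(|f| f.len() + 2 <= 1432));
}
```

The key point is that each frame carries its own length header, so the mixer can pull one frame at a time instead of hitting the 1432-byte cap on a single oversized frame.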
I was doing the same without the `header`, and it was playing the first frame and skipping the rest. If it is required to qualify each packet as a file made up of multiple frames, I'll attempt it with the `header`.
If not, I can simply play it frame by frame. Thanks!