Audio playback delay increasing
Closed this issue · 9 comments
os: windows 10 (19045.4651)
rustup toolchain: stable-x86_64-pc-windows-msvc
This is pseudo code:
let (stream, stream_handle) = OutputStream::try_default()?;
let sink = Sink::try_new(&stream_handle)?;
I created a Sink. Every time I decode PCM data from the decoder, I create an AudioChunk. An AudioChunk just represents one segment of data (for example, 100 milliseconds) and is submitted to the queue in the Sink. Please note that AudioChunk implements Source.
sink.append(AudioChunk { /* ... */ });

struct AudioChunk {
    buffer: Vec<i16>,  // decoded PCM samples for this segment
    index: usize,      // current read position within `buffer`
    channels: u16,
    sample_rate: u32,
    frames: usize,     // number of sample points in this segment
}

impl Source for AudioChunk {
    fn current_frame_len(&self) -> Option<usize> {
        Some(self.frames)
    }

    fn channels(&self) -> u16 {
        self.channels
    }

    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }

    fn total_duration(&self) -> Option<Duration> {
        // Segment length in milliseconds, derived from frames and sample rate.
        Some(Duration::from_millis(
            (self.frames as f64 / (self.sample_rate as f64 / 1000.0)) as u64,
        ))
    }
}

impl Iterator for AudioChunk {
    type Item = i16;

    fn next(&mut self) -> Option<Self::Item> {
        // Yield samples until the buffer is exhausted, then the chunk ends.
        self.index += 1;
        self.buffer.get(self.index - 1).copied()
    }
}
The audio data in the decoder comes from a real-time audio stream transmitted over the network and is encoded using opus. I submit the PCM fragments directly to the Sink queue simply because the rodio interface design obviously allows it.
But I've observed increasing latency in the audio (compared to the sender on the network), and this latency grows over time.
I suspected it was caused by rodio's playback delay, so I changed the implementation and preliminarily verified the problem: I stopped calling append for each PCM segment and instead just updated the buffer in the Source every time new PCM data arrived. The delay was significantly reduced and no longer kept growing; only the sound from the speaker was "broken".
struct AudioChunk {
    buffer: Arc<RwLock<Vec<i16>>>,  // replaced from the outside when new PCM arrives
    index: Arc<AtomicUsize>,        // reset to 0 from the outside on each update
    channels: u16,
    sample_rate: u32,
}

impl Source for AudioChunk {
    fn current_frame_len(&self) -> Option<usize> {
        None
    }

    fn channels(&self) -> u16 {
        self.channels
    }

    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }

    fn total_duration(&self) -> Option<Duration> {
        None // endless source
    }
}

impl Iterator for AudioChunk {
    type Item = i16;

    fn next(&mut self) -> Option<Self::Item> {
        // Read the sample at the current index; past the end of the buffer
        // we output silence so the source never ends.
        let index = self.index.load(Ordering::Relaxed);
        let value = self.buffer.read().unwrap().get(index).copied().unwrap_or(0);
        self.index.fetch_add(1, Ordering::Relaxed);
        Some(value)
    }
}
That's about it. When PCM data arrives, the external buffer and index are updated, so the Iterator starts reading from the beginning again.
I observed that in most cases, when the next PCM data arrives, the index in the Source has not yet reached the end of the buffer.
Sorry, English is not my native language. If there is something wrong with my description, I am willing to provide more information.
The audio data in the decoder comes from a real-time audio stream transmitted over the network and is encoded using opus I submit the PCM fragments directly to the Sink queue
It might be better to use the decoder and let it do the decoding for you. The only thing you then need to do is to wrap your source of encoded audio data so that it implements Read + Seek + Send + Sync. Then you can just pass that object to Decoder::new.
Let me know if this is helpful or not.
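Roughly something like this, assuming you can buffer the encoded bytes first (receive_encoded_bytes is just a stand-in for your network code; a truly live stream would need its own Read + Seek wrapper):

```rust
use std::io::Cursor;
use rodio::{Decoder, OutputStream, Sink};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (_stream, stream_handle) = OutputStream::try_default()?;
    let sink = Sink::try_new(&stream_handle)?;

    // Hypothetical helper: however you collect the encoded bytes from the network.
    let encoded: Vec<u8> = receive_encoded_bytes();

    // Cursor<Vec<u8>> implements Read + Seek + Send + Sync, so it can be
    // handed straight to the decoder, which then feeds the sink.
    let source = Decoder::new(Cursor::new(encoded))?;
    sink.append(source);
    sink.sleep_until_end();
    Ok(())
}

fn receive_encoded_bytes() -> Vec<u8> {
    unimplemented!("network receive goes here")
}
```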
As far as I know, the decoder does not provide OPUS support. If it is a raw PCM data stream, wrapping a decoder does not seem to make much sense.
ahh you are right, I thought it did since I saw an opus folder as part of the symphonia project, but that is a placeholder. That is too bad. Your second approach is the right one.
I observed that in most cases, when the next PCM data arrives, the index in Source has not reached the end.
You will always need some buffer in between to account for network latency and switching out the data, so try increasing that buffer size. The switching out needs to happen in less than the time of a single sample point. You could try measuring the time a call to next needs; it needs to be quite a bit lower than 1s/sample rate.
Finally, as a sanity check, decode a few seconds (20?) of audio at once and put it in the buffer. Is the sound broken then? If so, something is wrong with the decoding; if not, the issue is probably the swapping out.
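A rough way to do that measurement (the 10× margin is an arbitrary safety factor, not something rodio requires):

```rust
use std::time::{Duration, Instant};

// Time one call to `next()` on any i16 source and compare it to the
// per-sample budget of 1 second / sample_rate.
fn check_next_speed<S: Iterator<Item = i16>>(source: &mut S, sample_rate: u32) {
    let budget = Duration::from_secs(1) / sample_rate;
    let start = Instant::now();
    let _sample = source.next();
    let elapsed = start.elapsed();
    println!("next() took {elapsed:?}, per-sample budget is {budget:?}");
    assert!(elapsed < budget / 10, "next() is too slow to keep up");
}
```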
In fact, I am beginning to realize that there are some performance issues in my implementation. For example, with the buffer: Arc<RwLock<Vec<i16>>> structure, a read-write lock is acquired every time next reads a sample. Reading the entire buffer under one read lock would be fine, but acquiring a read lock for every single sample is a huge overhead. I will try replacing RwLock with ArcSwap.
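Roughly what I have in mind (assuming the arc_swap crate; the whole buffer is still replaced from the outside, so the swapping concern remains):

```rust
use std::sync::Arc;
use arc_swap::ArcSwap;

struct SharedBuffer {
    // Readers load the current Arc without taking a lock; the writer
    // atomically swaps in a completely new buffer.
    samples: Arc<ArcSwap<Vec<i16>>>,
}

impl SharedBuffer {
    fn read_sample(&self, index: usize) -> i16 {
        // `load` is lock-free, so doing it once per sample is much cheaper
        // than acquiring an RwLock read guard for every sample.
        self.samples.load().get(index).copied().unwrap_or(0)
    }

    fn replace(&self, new_samples: Vec<i16>) {
        self.samples.store(Arc::new(new_samples));
    }
}
```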
I will report the results later to see if my problem has been solved.
I will try replacing RwLock with ArcSwap.
Take care not to swap data out from under the Source. You need to ensure the source has played all the samples before swapping. You are probably better off with some form of double buffering: have one buffer to read from while you swap out the other, and once the Source is done with the first it switches over. This way the source decides when to switch, instead of having to arrange that from the outside.
But before going through all this trouble, try a simple std::sync::mpsc::channel. It should be easily fast enough (when compiling in release mode). You can push in samples one by one from the outside and take them out from the source. This comment complains of a max throughput of 900k items/s; given most audio is 44.1k samples/s, the channel should have ample performance.
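Something like this is what I mean (ChannelSource and make_pair are just placeholder names; the Source methods match the ones used earlier in this thread):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::time::Duration;
use rodio::Source;

// A Source that drains samples from an mpsc channel. The decoding side keeps
// the Sender and pushes i16 samples in; rodio pulls them out via `next()`.
struct ChannelSource {
    receiver: Receiver<i16>,
    channels: u16,
    sample_rate: u32,
}

impl Iterator for ChannelSource {
    type Item = i16;

    fn next(&mut self) -> Option<Self::Item> {
        // Emit silence instead of ending the source when the decoder is
        // momentarily behind (e.g. waiting on the network).
        Some(self.receiver.try_recv().unwrap_or(0))
    }
}

impl Source for ChannelSource {
    fn current_frame_len(&self) -> Option<usize> {
        None
    }

    fn channels(&self) -> u16 {
        self.channels
    }

    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }

    fn total_duration(&self) -> Option<Duration> {
        None
    }
}

// The decoder thread keeps `tx`; the returned source goes to sink.append.
fn make_pair(channels: u16, sample_rate: u32) -> (Sender<i16>, ChannelSource) {
    let (tx, rx) = channel();
    (tx, ChannelSource { receiver: rx, channels, sample_rate })
}
```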
Thanks, the double buffer queue solved the problem for me. I swap the two buffers internally and update the buffer externally.
I have a question about cpal. cpal is different from rodio: I need to match the sample rate of the device myself. For example, the device only supports 44.1k samples/s, but the PCM to be played is 48k. How do I deal with this? If it is just the number of channels or the sample format, I can handle it, but when the sample rates differ I can't; I don't know much about digital signal processing.
Because of this problem I used rodio, since rodio seems to handle these issues. Can you point me to which part of rodio handles these conversions?
thanks! @dvdsk
Most of it lives here: https://github.com/RustAudio/rodio/tree/master/src/conversions
Though I am planning to look at replacing that with https://github.com/HEnquist/rubato in the future. That might be an option for you.
On the other hand, have you considered adding opus as a decoder to rodio? That might be less work than porting rodio features to your codebase.
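Just to illustrate what such a conversion does, a naive single-channel linear-interpolation resampler looks roughly like this; the conversions module (and rubato) do the same job with much more care for quality:

```rust
// Naive linear-interpolation resampling of one channel from `from_rate` to
// `to_rate`. Good enough to see the idea, not good enough for production.
fn resample_linear(input: &[f32], from_rate: u32, to_rate: u32) -> Vec<f32> {
    if input.is_empty() || from_rate == to_rate {
        return input.to_vec();
    }
    // How far we step through the input for every output sample.
    let step = from_rate as f64 / to_rate as f64;
    let out_len = (input.len() as f64 / step).floor() as usize;
    let mut out = Vec::with_capacity(out_len);
    for i in 0..out_len {
        let pos = i as f64 * step;
        let idx = pos as usize;
        let frac = (pos - idx as f64) as f32;
        let a = input[idx];
        let b = *input.get(idx + 1).unwrap_or(&a);
        // Blend the two neighbouring input samples.
        out.push(a + (b - a) * frac);
    }
    out
}
```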
Opus is widely used in the industry now, so it makes sense for rodio to add Opus support. I will try to implement an Opus decoder for rodio; please let me know if there is anything I can do to help.
Of course, I will also share any problems I run into.
You can take a look at the files here: https://github.com/RustAudio/rodio/tree/master/src/decoder as an example. The code will probably have to go through a few reviews, so it will take a while. But after that it should also be bug free 🎉
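As a very rough sketch of the shape such a source could take (this guesses at the opus crate's Decoder::new/decode API and uses a hypothetical iterator of raw Opus packets; a real rodio decoder would also need container/Ogg handling):

```rust
use std::time::Duration;
use rodio::Source;

struct OpusSource<P: Iterator<Item = Vec<u8>>> {
    packets: P,            // hypothetical supply of raw Opus packets
    decoder: opus::Decoder,
    pending: Vec<i16>,     // decoded samples not yet handed to rodio
    pos: usize,
    channels: u16,
    sample_rate: u32,
}

impl<P: Iterator<Item = Vec<u8>>> Iterator for OpusSource<P> {
    type Item = i16;

    fn next(&mut self) -> Option<Self::Item> {
        while self.pos >= self.pending.len() {
            // Refill: decode the next packet into the pending buffer.
            let packet = self.packets.next()?;
            // 5760 samples per channel is the largest possible Opus frame at 48 kHz.
            let mut buf = vec![0i16; 5760 * self.channels as usize];
            let samples_per_channel = self.decoder.decode(&packet, &mut buf, false).ok()?;
            buf.truncate(samples_per_channel * self.channels as usize);
            self.pending = buf;
            self.pos = 0;
        }
        let sample = self.pending[self.pos];
        self.pos += 1;
        Some(sample)
    }
}

impl<P: Iterator<Item = Vec<u8>>> Source for OpusSource<P> {
    fn current_frame_len(&self) -> Option<usize> {
        None
    }

    fn channels(&self) -> u16 {
        self.channels
    }

    fn sample_rate(&self) -> u32 {
        self.sample_rate
    }

    fn total_duration(&self) -> Option<Duration> {
        None
    }
}
```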