Last ~0.5s always truncated when resampling audio
xd009642 opened this issue · 11 comments
This may well be user error, so I've extracted the relevant code into a minimal example and pushed it here: https://github.com/xd009642/resampling_example. The behaviour is very consistent, though, and roughly the same amount of data goes missing from the end regardless of file length (I've tried this with three WAV files going from 44100Hz->8000Hz, some stereo, some mono).
This example loads the audio file at 44100Hz and resamples it to 8000Hz. Whenever I do this, the end of the audio gets chopped off. Below is a screenshot of Audacity showing the source file as the first track and the output as the second track.
I wonder if this could be related to how I'm flushing the resampler at the end?
audio_decoder.flush();
while let Ok(Some(_)) = resampler.flush(&mut resampled_audio) {
data.append(&mut get_samples(&resampled_audio));
}
EDIT: ffmpeg version info below, though I've also observed this behaviour on ffmpeg 4.3.2:
ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
libavutil 55. 78.100 / 55. 78.100
libavcodec 57.107.100 / 57.107.100
libavformat 57. 83.100 / 57. 83.100
libavdevice 57. 10.100 / 57. 10.100
libavfilter 6.107.100 / 6.107.100
libavresample 3. 7. 0 / 3. 7. 0
libswscale 4. 8.100 / 4. 8.100
libswresample 2. 9.100 / 2. 9.100
libpostproc 54. 7.100 / 54. 7.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Use -h to get full help or, even better, run 'man ffmpeg'
Could you check if any of the flush calls are returning errors?
I just changed it to unwrap the one unchecked flush call, like so:
audio_decoder.flush();
while let Some(_) = resampler.flush(&mut resampled_audio).unwrap() {
println!("Getting some bytes");
data.append(&mut get_samples(&resampled_audio));
}
No error. I also put a print in that loop, and the flush returns None straight away, so it must think there are no samples remaining in the resampling context.
Oh, if the resampler returns the last of the audio from that first flush call, would it then return None, so that I never add that audio to the output buffer? Let me try that.
EDIT: Nope, that didn't change anything. The resampler code is now:
println!("Flush decoder and read last bits");
audio_decoder.flush();
while resampler.delay().is_some() {
println!("Flushing");
resampler.flush(&mut resampled_audio).unwrap();
data.append(&mut get_samples(&resampled_audio));
}
And "Flushing" is never printed.
Another observation: this time I used an 8kHz WAV and an 8kHz MP3 as the source audio, so the resampler shouldn't change the audio in the slightest.
This is the input WAV file and the output WAV file. They match exactly.
This is the input MP3 file and the output WAV file; again, ~0.5s is removed from the end (11.23s vs 10.52s).
Not sure if this suggests misuse of the audio decoder, since MP3 decoding is more involved than WAV... I believe the MP3 decoder produces planar rather than packed data, so the resampler may be converting between the two layouts but doing nothing else?
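For reference, "planar" audio stores each channel in its own buffer, while "packed" (interleaved) audio alternates the channels' samples in a single buffer. A minimal sketch of the packing step, assuming i16 samples for brevity; planar_to_packed is a hypothetical helper, not a rust-ffmpeg function:

```rust
/// Hypothetical helper: interleave planar audio (one Vec per channel)
/// into a single packed buffer. Assumes i16 samples for simplicity.
fn planar_to_packed(planes: &[Vec<i16>]) -> Vec<i16> {
    let channels = planes.len();
    if channels == 0 {
        return Vec::new();
    }
    // Every plane must hold the same number of samples (frames).
    let frames = planes[0].len();
    assert!(planes.iter().all(|p| p.len() == frames), "planes differ in length");

    let mut packed = Vec::with_capacity(frames * channels);
    // For each frame, emit one sample from every channel in order.
    for i in 0..frames {
        for plane in planes {
            packed.push(plane[i]);
        }
    }
    packed
}

fn main() {
    let left = vec![1, 2, 3];
    let right = vec![-1, -2, -3];
    let packed = planar_to_packed(&[left, right]);
    println!("{:?}", packed); // [1, -1, 2, -2, 3, -3]
}
```

A pure layout conversion like this never drops samples, so on its own it would not explain a shortened output.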
Yeah I feel like something is not being fully flushed somewhere.
I'll dig in as soon as I have time, but do keep digging; it might be inside the library itself, too.
Yeah, I started going over the library code and the ffmpeg C examples this afternoon to see if anything stands out. I'll also enable trace logging, though I tried that before I had the minimal example and nothing stood out.
Output for the 44100Hz WAV with ffmpeg trace logging enabled. Most of it is probing; there's no real logging from the resampler:
Probing wav score:99 size:2048
[wav @ 0x55693aeb0c60] Format wav probed with size=2048 and score=99
[wav @ 0x55693aeb0c60] Before avformat_find_stream_info() pos: 44 bytes read:67584 seeks:3 nb_streams:1
[wav @ 0x55693aeb0c60] probing stream 0 pp:32
Probing mp3 score:1 size:4096
[wav @ 0x55693aeb0c60] Probe with size=4096, packets=2469 detected mp3 with score=1
[wav @ 0x55693aeb0c60] probing stream 0 pp:31
Probing mp3 score:1 size:8192
[wav @ 0x55693aeb0c60] Probe with size=8192, packets=2470 detected mp3 with score=1
[wav @ 0x55693aeb0c60] probing stream 0 pp:30
[wav @ 0x55693aeb0c60] probing stream 0 pp:29
Probing mp3 score:1 size:16384
[wav @ 0x55693aeb0c60] Probe with size=16384, packets=2472 detected mp3 with score=1
[wav @ 0x55693aeb0c60] probing stream 0 pp:28
[wav @ 0x55693aeb0c60] probing stream 0 pp:27
[wav @ 0x55693aeb0c60] probing stream 0 pp:26
[wav @ 0x55693aeb0c60] probing stream 0 pp:25
[wav @ 0x55693aeb0c60] probing stream 0 pp:24
[wav @ 0x55693aeb0c60] probing stream 0 pp:23
[wav @ 0x55693aeb0c60] probing stream 0 pp:22
[wav @ 0x55693aeb0c60] probing stream 0 pp:21
[wav @ 0x55693aeb0c60] probing stream 0 pp:20
[wav @ 0x55693aeb0c60] probing stream 0 pp:19
[wav @ 0x55693aeb0c60] probing stream 0 pp:18
[wav @ 0x55693aeb0c60] probing stream 0 pp:17
[wav @ 0x55693aeb0c60] probing stream 0 pp:16
[wav @ 0x55693aeb0c60] probing stream 0 pp:15
[wav @ 0x55693aeb0c60] probing stream 0 pp:14
[wav @ 0x55693aeb0c60] probing stream 0 pp:13
[wav @ 0x55693aeb0c60] probing stream 0 pp:12
[wav @ 0x55693aeb0c60] probing stream 0 pp:11
[wav @ 0x55693aeb0c60] probing stream 0 pp:10
[wav @ 0x55693aeb0c60] probing stream 0 pp:9
[wav @ 0x55693aeb0c60] probing stream 0 pp:8
[wav @ 0x55693aeb0c60] probing stream 0 pp:7
[wav @ 0x55693aeb0c60] probing stream 0 pp:6
[wav @ 0x55693aeb0c60] probing stream 0 pp:5
[wav @ 0x55693aeb0c60] probing stream 0 pp:4
[wav @ 0x55693aeb0c60] probing stream 0 pp:3
[wav @ 0x55693aeb0c60] probing stream 0 pp:2
[wav @ 0x55693aeb0c60] probing stream 0 pp:1
[wav @ 0x55693aeb0c60] probed stream 0
[wav @ 0x55693aeb0c60] parser not found for codec pcm_s16le, packets or times may be invalid.
[wav @ 0x55693aeb0c60] All info found
[wav @ 0x55693aeb0c60] stream 0: start_time: -209146758205323.719 duration: -209146758205323.719
[wav @ 0x55693aeb0c60] format: start_time: -9223372036854.775 duration: -9223372036854.775 bitrate=705 kb/s
[wav @ 0x55693aeb0c60] After avformat_find_stream_info() pos: 204844 bytes read:272384 seeks:3 frames:50
Input stats
* Sample rate: 44100Hz
* Channels: 1
* Format: "s16"
Creating resampler
[SWR @ 0x55693aef0ba0] Using s16p internally between filters
Start sample reading
Flush decoder and read last bits
Write output.wav
* Sample rate: 8000Hz
* Channels: 1
* Format: s16
I think I figured out the issue. resampler.delay().is_some() is being used to detect when all of the resampler's buffers have been processed. However, resampler.delay() returns None prematurely, because the delay is requested in whole seconds and a short delay rounds to zero.
I tried forking rust-ffmpeg to prevent the rounding, but this didn't completely fix the problem, because swr_get_delay will report 16 samples even when all of the output has been flushed. I think this is because the default filter looks at +/- 16 samples.
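swr_get_delay(ctx, base) reports the delay in units of 1/base seconds, so base = 1 gives seconds and base = the output rate gives output samples. The simplified model below shows why asking for seconds hides the tail. It assumes the conversion rounds to the nearest unit (as ffmpeg's av_rescale does); delay_in_base is a hypothetical name, and the real swr_get_delay also accounts for the filter's phase and length, which this model ignores. Under this model, any pending delay shorter than half a second reports as 0 when base is 1.

```rust
/// Simplified model: convert input samples still buffered in a resampler
/// into a delay measured in units of 1/`base` seconds.
/// Assumption: rounds to the nearest unit, like ffmpeg's av_rescale;
/// filter phase/length handling from the real swr_get_delay is ignored.
fn delay_in_base(buffered_in_samples: u64, in_rate: u64, base: u64) -> u64 {
    // Adding in_rate / 2 before dividing rounds to nearest instead of truncating.
    (buffered_in_samples * base + in_rate / 2) / in_rate
}

fn main() {
    let in_rate = 44_100;
    let buffered = 8_000; // roughly 0.18 s of input still inside the resampler

    // base = 1 asks for whole seconds: the pending audio reports as zero.
    println!("seconds: {}", delay_in_base(buffered, in_rate, 1)); // seconds: 0

    // base = 8000 asks for output-rate samples: the pending audio is visible.
    println!("8 kHz samples: {}", delay_in_base(buffered, in_rate, 8_000)); // 8 kHz samples: 1451
}
```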
I found two workarounds:
- Compute the expected number of output samples for the audio decoded so far using av_rescale_rnd(num_decoded_samples, output_rate, input_rate, AV_ROUND_DOWN) (rounding up seemed to work in this case as well, but I wanted to be conservative), then run the flush loop until you have all the expected samples.
- Run the resampler.flush loop until data.len() doesn't change. (This seems simpler.)
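The first workaround can be sketched without ffmpeg by standing a mock in for the resampler. MockResampler, its flush signature (appending samples and returning how many were produced), and the helper names are hypothetical; only the rounding mirrors av_rescale_rnd(..., AV_ROUND_DOWN). The no-progress guard is an addition of this sketch, so that an expected count that is slightly too high cannot hang the loop.

```rust
/// Hypothetical stand-in for a resampler that still holds `pending` output
/// samples and releases at most `chunk` of them per flush call.
struct MockResampler {
    pending: usize,
    chunk: usize,
}

impl MockResampler {
    /// Appends flushed samples to `out` and returns how many were produced.
    fn flush(&mut self, out: &mut Vec<i16>) -> usize {
        let n = self.pending.min(self.chunk);
        self.pending -= n;
        out.resize(out.len() + n, 0);
        n
    }
}

/// Expected output length for `decoded` input samples, rounded down
/// like av_rescale_rnd(decoded, out_rate, in_rate, AV_ROUND_DOWN).
fn expected_output_samples(decoded: u64, out_rate: u64, in_rate: u64) -> u64 {
    decoded * out_rate / in_rate
}

/// Flush until `data` holds at least `expected` samples.
fn flush_until_expected(r: &mut MockResampler, data: &mut Vec<i16>, expected: usize) {
    while data.len() < expected {
        // Guard: stop if a flush makes no progress.
        if r.flush(data) == 0 {
            break;
        }
    }
}

fn main() {
    // 10 s of 44.1 kHz input should give 80_000 samples at 8 kHz.
    let expected = expected_output_samples(441_000, 8_000, 44_100) as usize;

    // Pretend the main loop produced 76_000 samples and 4_000 are still buffered.
    let mut resampler = MockResampler { pending: 4_000, chunk: 1_024 };
    let mut data = vec![0i16; 76_000];
    flush_until_expected(&mut resampler, &mut data, expected);

    println!("expected {}, got {}", expected, data.len()); // expected 80000, got 80000
}
```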
Tangentially, it might make sense to fix resampler.delay() to return Some whenever the delay is non-zero (happy to make a PR for that if desired, @meh) instead of rounding to the nearest second. However, that change would make this example loop forever, and I'm not sure whether changing this behavior would be considered a breaking change for users of rust-ffmpeg.
I like workaround number 2, and I think I like delay returning a more meaningful value.
Re: "until data.len() doesn't change": so there's no chance of the resampler requiring more than one flush call, with two adjacent calls returning the same number of samples?
By 'data' I'm referring to the final Vec of samples. If flush returns no samples, its length won't change.
As long as the audio frame being passed to flush is non-empty, flush returning no samples should indicate there is no more buffered output.
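A sketch of the stopping rule under discussion: the loop ends only when a flush call adds nothing to data, so two adjacent calls that each return the same non-zero count still both grow the buffer and keep the loop going. ScriptedResampler is a hypothetical stand-in that replays a fixed sequence of flush results, and the sketch assumes, as stated above, that every flush call has room to write samples.

```rust
/// Hypothetical resampler whose flush calls yield a scripted sequence of
/// sample counts (0 once its internal buffers are empty).
struct ScriptedResampler {
    counts: Vec<usize>,
    next: usize,
}

impl ScriptedResampler {
    /// Appends the next scripted number of samples to `out`; 0 after the script ends.
    fn flush(&mut self, out: &mut Vec<i16>) -> usize {
        let n = self.counts.get(self.next).copied().unwrap_or(0);
        self.next += 1;
        out.resize(out.len() + n, 0);
        n
    }
}

/// Flush until the accumulated output stops growing, i.e. until one call
/// produces no samples. Equal non-zero counts on adjacent calls still grow
/// `data`, so they do not end the loop.
fn drain(r: &mut ScriptedResampler, data: &mut Vec<i16>) {
    loop {
        let before = data.len();
        r.flush(data);
        if data.len() == before {
            break;
        }
    }
}

fn main() {
    // Two adjacent flushes return the same count (256) before the buffer empties.
    let mut r = ScriptedResampler { counts: vec![256, 256, 100, 0], next: 0 };
    let mut data = Vec::new();
    drain(&mut r, &mut data);
    println!("{}", data.len()); // 612
}
```

The comparison is against the total length before and after each call, not between the counts of consecutive calls, which is why repeated equal counts are harmless.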