Audio drift when using the decoder live to scaler/hls output
scottf-tvw opened this issue · 2 comments
I have been struggling with the use of the mpsoc_vcu_h264 decoder and audio drift using a live input and sending it to the scaler/transcode for HLS output using ffmpeg. When removing the mpsoc_vcu_h264 decoder on the live rtmp input, the AV sync is fine.
The drift is noticeable immediately in some cases and develops over time in others, but it never takes long to appear.
Here is a simple example with a live source you can test with.
ffmpeg -c:v mpsoc_vcu_h264 -f flv -i rtmp://ingress.w1.invintus.com/srcEncoders/247airtestfeed \
-filter_complex "multiscale_xma=outputs=5: \
out_1_width=1920: out_1_height=1080: out_1_rate=full: \
out_2_width=1280: out_2_height=720: out_2_rate=full: \
out_3_width=848: out_3_height=480: out_3_rate=full: \
out_4_width=640: out_4_height=360: out_4_rate=full: \
out_5_width=288: out_5_height=160: out_5_rate=full \
[vid1][vid2][vid3][vid4][vid5]; [0:2]aformat=channel_layouts=stereo,aresample=async=1:first_pts=0,asplit=outputs=5[aud1][aud2][aud3][aud4][aud5]" \
-map "[vid1]" -b:v:0 2M -minrate:v:0 2M -maxrate:v:0 2M -bufsize:v:0 4M -c:v:0 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
-map "[vid2]" -b:v:1 1M -minrate:v:1 1M -maxrate:v:1 1M -bufsize:v:1 1M -c:v:1 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
-map "[vid3]" -b:v:2 750K -minrate:v:2 750K -maxrate:v:2 750K -bufsize:v:2 750K -c:v:2 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
-map "[vid4]" -b:v:3 375K -minrate:v:3 375K -maxrate:v:3 375K -bufsize:v:3 375K -c:v:3 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
-map "[vid5]" -b:v:4 250k -minrate:v:4 250k -maxrate:v:4 250k -bufsize:v:4 250k -c:v:4 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
-map "[aud1]" -c:a:0 aac -map "[aud2]" -c:a:1 aac \
-map "[aud3]" -c:a:2 aac -map "[aud4]" -c:a:3 aac \
-map "[aud5]" -c:a:4 aac \
-var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2 v:3,a:3 v:4,a:4" \
-f hls -hls_list_size 3 -hls_wrap 5 -hls_time 6 -hls_flags delete_segments -master_pl_name "testSync1.m3u8" \
-hls_segment_filename "/var/www/hls/testSync1_%v-%d.ts" "/var/www/hls/testSync1_%v.m3u8"
Sync drift occurs regardless of whether aresample is included in the output command. The sync seems to be lost before the stream even reaches multiscale_xma. No output encoder settings in the ABR ladder streams seem to make any difference. I have also tested several other RTMP inputs with varying encoder settings and get the same results (not a surprise there).
I was hoping to use the decoder to maintain a standard decoded framerate and gain some quality control of the input stream.
I will also ask this question here: will the decoder pass any other metadata in the stream, such as captions, through to the scaler pipeline? I haven't gotten to that point in my testing, but it would be a showstopper for using the decoder if that were an issue. I did not see this information called out in the documentation.
device info:
System Configuration
OS Name : Linux
Release : 5.11.0-1021-aws
Version : #22~20.04.2-Ubuntu SMP Wed Oct 27 21:27:13 UTC 2021
Machine : x86_64
CPU Cores : 12
Memory : 22530 MB
Distribution : Ubuntu 20.04.3 LTS
GLIBC : 2.31
Model : vt1.3xlarge
XRT
Version : 2.11.691
Branch : 2021.1
Hash : 3e695ed86d15164e36267fb83def6ff2aaecd758
Hash Date : 2021-11-18 18:16:26
XOCL : 2.11.691, 3e695ed86d15164e36267fb83def6ff2aaecd758
XCLMGMT : unknown, unknown
Devices present
[0000:00:1f.0] : xilinx_u30_gen3x4_base_2
[0000:00:1e.0] : xilinx_u30_gen3x4_base_2
I did figure out that if a frame rate is set on the decoder and the input source does not match it, the result will of course be audio sync issues. And if the incoming video has a variable frame rate, there can be issues regardless of whether the rate is set on the decoder. This is something that can be handled before the decode/encode step.
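For anyone who wants to check an input before handing it to the decoder, here is a rough sketch of a pre-decode check. It compares the nominal (r_frame_rate) and average (avg_frame_rate) frame rates that ffprobe reports for the first video stream. The helper names rates_match and check_input are made up for illustration and are not part of ffmpeg or the Xilinx SDK. Treat it as a heuristic only: on a live stream the average is estimated from a short probe window, so a mismatch suggests a variable rate but does not prove it.

```shell
#!/bin/sh
# Hypothetical helper: succeed when two ffprobe-style rationals
# (e.g. "30000/1001") describe the same frame rate.
rates_match() {
    a=${1%/*}; b=${1#*/}
    c=${2%/*}; d=${2#*/}
    # ffprobe prints 0/0 when a rate is unknown; treat that as a mismatch.
    [ "$b" -ne 0 ] && [ "$d" -ne 0 ] || return 1
    # Cross-multiply so the comparison needs no floating point.
    [ $((a * d)) -eq $((c * b)) ]
}

# Hypothetical helper: probe the first video stream of an input and report
# whether its nominal and average frame rates agree.
check_input() {
    rates=$(ffprobe -v error -select_streams v:0 \
        -show_entries stream=r_frame_rate,avg_frame_rate \
        -of csv=p=0 "$1")
    nominal=${rates%%,*}
    average=${rates#*,}
    if rates_match "$nominal" "$average"; then
        echo "frame rates agree: $nominal"
    else
        echo "frame rates differ: nominal $nominal, average $average"
    fi
}

# Example (replace the placeholder with your own input URL):
# check_input "rtmp://example.invalid/live/stream"
```

If the two rates differ, that is a hint to normalize the frame rate ahead of the decoder rather than relying on a rate set on the decoder itself.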
You can close this ticket since it's really a non-issue. However, closed captions are a serious issue to look at: how can all metadata in a stream be passed through the decoder and the scaler pipeline to the transcoded files? I can't seem to find the magic formula to retain EIA-608/EIA-708 or ID3 data using mpsoc_vcu_h264. Forgive my ignorance if it's something simple.
Hi,
Thank you for bringing these two issues to our attention. Regarding the first one, another workaround that you may try is to add -vsync cfr to your command line, e.g., ffmpeg -c:v mpsoc_vcu_h264 -vsync cfr -f flv -i ... .
Regarding the 2nd issue, I have asked our engineering team for feedback.
Cheers,