Xilinx/video-sdk

Audio drift when using the decoder with a live input to scaler/HLS output

scottf-tvw opened this issue · 2 comments

I have been struggling with audio drift when using the mpsoc_vcu_h264 decoder on a live input and sending it to the scaler/transcoder for HLS output with ffmpeg. When I remove the mpsoc_vcu_h264 decoder from the live RTMP input, A/V sync is fine.
The drift is sometimes noticeable immediately and sometimes only appears over time, but it does not take long.

Here is a simple example with a live source you can test with.

ffmpeg -c:v mpsoc_vcu_h264 -f flv -i rtmp://ingress.w1.invintus.com/srcEncoders/247airtestfeed \
-filter_complex "multiscale_xma=outputs=5: \
 out_1_width=1920: out_1_height=1080: out_1_rate=full: \
 out_2_width=1280: out_2_height=720:  out_2_rate=full: \
 out_3_width=848:  out_3_height=480:  out_3_rate=full: \
 out_4_width=640:  out_4_height=360:  out_4_rate=full: \
 out_5_width=288:  out_5_height=160:  out_5_rate=full  \
 [vid1][vid2][vid3][vid4][vid5]; [0:2]aformat=channel_layouts=stereo,aresample=async=1:first_pts=0,asplit=outputs=5[aud1][aud2][aud3][aud4][aud5]" \
 -map "[vid1]" -b:v:0 2M   -minrate:v:0 2M   -maxrate:v:0 2M   -bufsize:v:0 4M   -c:v:0 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
 -map "[vid2]" -b:v:1 1M   -minrate:v:1 1M   -maxrate:v:1 1M   -bufsize:v:1 1M   -c:v:1 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
 -map "[vid3]" -b:v:2 750K -minrate:v:2 750K -maxrate:v:2 750K -bufsize:v:2 750K -c:v:2 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
 -map "[vid4]" -b:v:3 375K -minrate:v:3 375K -maxrate:v:3 375K -bufsize:v:3 375K -c:v:3 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
 -map "[vid5]" -b:v:4 250k -minrate:v:4 250k -maxrate:v:4 250k -bufsize:v:4 250k -c:v:4 mpsoc_vcu_h264 -keyint_min 30 -g 90 -qmin 3 -qmax 51 -qdiff 4 -bf 2 -vsync 1 \
 -map "[aud1]" -c:a:0 aac -map "[aud2]" -c:a:1 aac \
 -map "[aud3]" -c:a:2 aac -map "[aud4]" -c:a:3 aac \
 -map "[aud5]" -c:a:4 aac \
 -var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2 v:3,a:3 v:4,a:4" \
 -f hls -hls_list_size 3 -hls_wrap 5 -hls_time 6 -hls_flags delete_segments -master_pl_name "testSync1.m3u8" \
 -hls_segment_filename  "/var/www/hls/testSync1_%v-%d.ts" "/var/www/hls/testSync1_%v.m3u8"

Regardless of whether aresample is included on the outputs in the command, sync drift occurs. The sync seems to be lost before the stream even reaches multiscale_xma. None of the output encoder settings in the ABR ladder streams seem to make any difference. I have also tested several other RTMP inputs with varying encoder settings and get the same results (not a surprise there).

I was hoping to use the decoder to maintain a standard decoded frame rate and gain some quality control over the input stream.
I will also ask this question here: will the decoder pass other metadata in the stream, such as captions, to the scaler pipeline? I haven't gotten to that point in my testing, but it would be a showstopper for using the decoder if that were an issue. I did not see this information called out in the documentation.

Device info:

System Configuration
  OS Name              : Linux
  Release              : 5.11.0-1021-aws
  Version              : #22~20.04.2-Ubuntu SMP Wed Oct 27 21:27:13 UTC 2021
  Machine              : x86_64
  CPU Cores            : 12
  Memory               : 22530 MB
  Distribution         : Ubuntu 20.04.3 LTS
  GLIBC                : 2.31
  Model                : vt1.3xlarge

XRT
  Version              : 2.11.691
  Branch               : 2021.1
  Hash                 : 3e695ed86d15164e36267fb83def6ff2aaecd758
  Hash Date            : 2021-11-18 18:16:26
  XOCL                 : 2.11.691, 3e695ed86d15164e36267fb83def6ff2aaecd758
  XCLMGMT              : unknown, unknown

Devices present
  [0000:00:1f.0] : xilinx_u30_gen3x4_base_2
  [0000:00:1e.0] : xilinx_u30_gen3x4_base_2

I did figure out that if the frame rate is set on the decoder and it does not match the input source, the result will of course be audio sync issues. So if a video with a variable frame rate is passed in, there can be issues regardless of whether the rate is set on the decoder or not. This is something that can be handled before decode/encode.
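To put rough numbers on a decoder/source frame-rate mismatch, here is a minimal shell sketch. It assumes the decoder restamps every frame at the configured rate instead of carrying the source timestamps, while the audio keeps its original timing; that is an assumption about the failure mode, not confirmed behavior of mpsoc_vcu_h264. `drift_ms_per_hour` is a hypothetical helper, not part of ffmpeg or the Video SDK.

```shell
#!/usr/bin/env bash
# drift_ms_per_hour SET_FPS SOURCE_FPS
# Hypothetical helper. Assumes frames arrive at SOURCE_FPS but are restamped
# at SET_FPS, while audio keeps real-time timestamps. Prints how far (in ms)
# the video timeline ends up from the audio timeline after one hour.
drift_ms_per_hour() {
  awk -v set_fps="$1" -v src_fps="$2" 'BEGIN {
    frames  = src_fps * 3600      # frames the source actually delivers in one hour
    stamped = frames / set_fps    # seconds those frames span when restamped at SET_FPS
    printf "%.0f\n", (3600 - stamped) * 1000
  }'
}
```

With a 30 fps setting and a 29.97 fps source, `drift_ms_per_hour 30 29.97` prints 3600, roughly 3.6 s of offset per hour. A 30 fps setting on a 25 fps source, `drift_ms_per_hour 30 25`, prints 600000 (10 minutes per hour), which would be noticeable within seconds.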
You can close this ticket, since it's really a non-issue. However, closed captions are a serious issue to look at: how can all metadata in a stream be passed through the decoder and the scaler pipeline to the transcoded files? I can't seem to find the magic formula to retain EIA-608/EIA-708 or ID3 data using mpsoc_vcu_h264. Forgive my ignorance if it's something simple.

Hi,
Thank you for bringing these two issues to our attention. Regarding the first one, another workaround you may try is to add -vsync cfr to your command line, e.g., ffmpeg -c:v mpsoc_vcu_h264 -vsync cfr -f flv -i ... .
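ffmpeg documents -vsync cfr as duplicating and dropping frames to achieve exactly the requested constant frame rate. The following shell/awk sketch illustrates only that idea in simplified form; it is not ffmpeg's actual vsync implementation, and the `cfr_map` function and its millisecond input format are hypothetical.

```shell
#!/usr/bin/env bash
# cfr_map INTERVAL_MS < timestamps_ms
# Simplified illustration of constant-frame-rate output (not ffmpeg's code):
# on every output tick, emit the newest input frame whose timestamp is <= the
# tick. Frames picked on several ticks are duplicated; frames never picked are
# dropped.
#   stdin : one input-frame timestamp per line, in ms, ascending
#   stdout: the input-frame index chosen for each output tick, space-separated
cfr_map() {
  awk -v step="$1" '
    { pts[NR - 1] = $1; n = NR }
    END {
      if (n == 0 || step <= 0) exit   # nothing to map, or invalid tick interval
      i = 0
      out = ""
      for (t = pts[0]; t <= pts[n - 1]; t += step) {
        # advance to the newest input frame that is not later than this tick
        while (i + 1 < n && pts[i + 1] <= t) i++
        out = out (out == "" ? "" : " ") i
      }
      print out
    }'
}
```

For a variable-rate input with timestamps 0, 40, 120, 130 and 160 ms mapped onto a 40 ms (25 fps) grid, `printf '0\n40\n120\n130\n160\n' | cfr_map 40` prints `0 1 1 2 4`: frame 1 is repeated to fill the gap at 80 ms and frame 3 is dropped, so the output keeps a fixed cadence that audio can stay aligned with.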
Regarding the 2nd issue, I have asked our engineering team for feedback.
Cheers,