Try Inference with SALMONN's Q-Former

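Activate the conda environment, then launch the audio demo:

conda activate videollama_llx
python demo_audio.py --gpu-id "your gpu id"

Replace "your gpu id" with the index of the GPU to load the model on (for example, 0).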