google/visqol

MOS-LQO results are low in speech mode

hoantv93 opened this issue · 1 comment

We are trying to use ViSQOL to evaluate the audio quality of a security camera device.
Here is our recording process:
Human voice -> Recorded by high-quality microphone (48kHz, 16bit, mono) -> Resample (16kHz, 16bit, mono) -> reference audio (REF.MONO.16KHZ.VOICE.01.wav)
Human voice -> Recorded by camera's microphone -> Resample (16kHz, 16bit, mono) -> degraded audio (DEG.MONO.16KHZ.VOICE.01.wav)
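The resample step above could be done along these lines. This is only a minimal sketch, assuming SoX is installed; the issue does not say which tool was actually used, and the helper name `resample_16k` and the input file name in the usage line are hypothetical.

```shell
# Hypothetical helper: resample a recording to 16 kHz, 16-bit, mono with SoX.
# Assumes `sox` is on PATH; this is not necessarily the tool used for the files in this issue.
resample_16k() {
  # -r sets the sample rate, -b the bits per sample, -c the channel count
  # (placed before the output file, so they apply to the output).
  sox "$1" -r 16000 -b 16 -c 1 "$2"
}

# Usage (the input name is a placeholder):
# resample_16k mic_48k.wav REF.MONO.16KHZ.VOICE.01.wav
```

Running the same helper on both the microphone and the camera recording keeps the two files in the same format before they are passed to ViSQOL.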
ViSQOL command:
visqol --reference_file REF.MONO.16KHZ.VOICE.01.wav --degraded_file DEG.MONO.16KHZ.VOICE.01.wav --verbose --use_speech_mode
The returned MOS-LQO is 1.64007, which is lower than we expected.
However, the MOS-LQO is 3.41819 when we run it in audio mode.
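To reproduce both runs, the two command lines can be generated with a small wrapper. This is a sketch only: the helper name `visqol_cmd` is hypothetical, it assumes `visqol` is on PATH and that file names contain no spaces, and it uses only the flags from the command above. As far as the ViSQOL README describes it, speech mode expects 16 kHz input and audio mode expects 48 kHz input, so the two scores may not be directly comparable for these 16 kHz files.

```shell
# Hypothetical helper: print the visqol command line for one mode.
# Assumes file names without spaces; uses only flags shown in this issue.
visqol_cmd() {
  mode="$1"; ref="$2"; deg="$3"
  cmd="visqol --reference_file $ref --degraded_file $deg --verbose"
  # Speech mode is enabled with --use_speech_mode; without it visqol runs in audio mode.
  if [ "$mode" = "speech" ]; then
    cmd="$cmd --use_speech_mode"
  fi
  printf '%s\n' "$cmd"
}

# Print both command lines so they can be reviewed before running them.
visqol_cmd speech REF.MONO.16KHZ.VOICE.01.wav DEG.MONO.16KHZ.VOICE.01.wav
visqol_cmd audio REF.MONO.16KHZ.VOICE.01.wav DEG.MONO.16KHZ.VOICE.01.wav
```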

Is our test method sound? What do we need to do to improve the MOS results in speech mode?
Audio files