Whisper failed to recognize Chinese content
Closed this issue · 1 comment
AspadaX commented
Here is the code that I used to extract subtitles from a video I recorded. It uses ffmpeg to extract the audio track first (a sketch of that helper is below), then sends the audio to whisper-rs for inference. However, the extracted subtitles are irrelevant to the video. I tried tweaking BeamSearch, best_of, and the initial prompt, but with no success.
Am I doing something wrong in my code?
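The audio_conversion module is not pasted in full; it is only a thin wrapper around the ffmpeg CLI. A minimal sketch of what convert_video_to_audio does (the exact flags and the error type are assumptions; the 16 kHz mono PCM output is what whisper.cpp expects):

// audio_conversion.rs: a minimal sketch, assuming the helper shells out to the ffmpeg CLI.
// The struct and function names match the ones used in main(); everything else is illustrative.
use std::process::Command;

pub struct AudioVideoConverter;

impl AudioVideoConverter {
    pub fn convert_video_to_audio(input: &str, output: &str) -> std::io::Result<()> {
        let status = Command::new("ffmpeg")
            .args([
                "-y",                // overwrite the output file if it exists
                "-i", input,         // input video
                "-vn",               // drop the video stream
                "-ar", "16000",      // resample to 16 kHz, which whisper.cpp expects
                "-ac", "1",          // downmix to mono
                "-c:a", "pcm_s16le", // 16-bit signed PCM wav
                output,
            ])
            .status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                "ffmpeg exited with a non-zero status",
            ));
        }
        Ok(())
    }
}

And the main program: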
use std::fs::File;
use std::io::BufReader;
use std::path::Path;

use rodio::{Decoder, Source};
use whisper_rs::{FullParams, SamplingStrategy, WhisperContext, WhisperContextParameters};

mod audio_conversion;

fn main() {
    // Extract an audio track from the video with ffmpeg (see the audio_conversion sketch above).
    let video_path = Path::new("./video.mov");
    if !video_path.exists() {
        println!("video file not found.");
    } else {
        println!("{}", video_path.file_name().unwrap().to_str().unwrap());
    }
    match audio_conversion::AudioVideoConverter::convert_video_to_audio(
        video_path.to_str().unwrap(),
        "./extracted_audio.wav",
    ) {
        Ok(_) => println!("audio extraction finished."),
        Err(error) => println!("{:?}", error),
    };

    // Set up the decoding parameters. Greedy sampling here; I also tried BeamSearch
    // and an initial prompt (see the variant below) with the same result.
    let mut whisper_parameters = FullParams::new(SamplingStrategy::Greedy { best_of: 0 });
    whisper_parameters.set_language(Some("zh"));
    // Initial prompt ("The following is a Mandarin sentence:"); tried as a decoding hint, no effect.
    // whisper_parameters.set_initial_prompt("以下是普通话句子:");
    whisper_parameters.set_print_realtime(true);

    // Load the model.
    let whisper_context = match WhisperContext::new_with_params(
        "ggml-large-v2-q5_0.bin",
        WhisperContextParameters::default(),
    ) {
        Ok(result) => result,
        Err(error) => panic!("{}", error),
    };

    // Decode the extracted wav with rodio. Note: rodio does not resample, so the wav
    // must already be 16 kHz mono, which is the format whisper.cpp expects.
    let audio_track = match File::open("./extracted_audio.wav") {
        Ok(result) => BufReader::new(result),
        Err(error) => panic!("{}", error),
    };
    let decoded_audio_track: Vec<i16> = match Decoder::new(audio_track) {
        Ok(result) => result.convert_samples::<i16>().collect(),
        Err(error) => panic!("{}", error),
    };

    // Convert the i16 samples to the f32 samples whisper-rs works with.
    let mut samples: Vec<f32> = vec![0.0f32; decoded_audio_track.len()];
    whisper_rs::convert_integer_to_float_audio(&decoded_audio_track, &mut samples)
        .expect("sample conversion failed.");

    // Now we can run the model.
    let mut state = whisper_context.create_state().expect("failed to create state");
    state
        .full(whisper_parameters, &samples[..])
        .expect("failed to run model");

    // Fetch the results segment by segment.
    let num_segments = state
        .full_n_segments()
        .expect("failed to get number of segments");
    for i in 0..num_segments {
        let segment = state
            .full_get_segment_text(i)
            .expect("failed to get segment");
        let start_timestamp = state
            .full_get_segment_t0(i)
            .expect("failed to get segment start timestamp");
        let end_timestamp = state
            .full_get_segment_t1(i)
            .expect("failed to get segment end timestamp");
        println!("[{} - {}]: {}", start_timestamp, end_timestamp, segment);
    }
}
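For reference, the BeamSearch and initial-prompt tweaks mentioned above looked roughly like this (the beam_size and patience values are only illustrative); none of them changed the output:

// Illustrative variant of the parameter setup above.
let mut whisper_parameters = FullParams::new(SamplingStrategy::BeamSearch {
    beam_size: 5,   // example value
    patience: -1.0, // whisper.cpp default (disabled)
});
whisper_parameters.set_language(Some("zh"));
// "The following is a Mandarin sentence:", used as a decoding hint
whisper_parameters.set_initial_prompt("以下是普通话句子:");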
AspadaX commented
Closing the issue. As I figured out, the problem is with the model, since English inference works fine. After I switched to a fine-tuned model, it works as expected.
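For anyone who hits the same problem: the only code change was the model file passed to WhisperContext. The filename below is just a placeholder for whichever fine-tuned Chinese model you use:

// Drop-in replacement for the model-loading block in main() above.
// "ggml-finetuned-zh.bin" is a placeholder, not a real model name.
let whisper_context = WhisperContext::new_with_params(
    "ggml-finetuned-zh.bin",
    WhisperContextParameters::default(),
)
.expect("failed to load model");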