A class-based Python script that transcribes audio files using OpenAI Whisper and writes subtitles in a variety of file formats.
Requirements:

- Python 3.8-3.11
- FFmpeg

Install the Python dependencies:

```bash
pip install openai-whisper==20231117 pydub==0.25.1
```
For GPU acceleration, install CUDA and the matching CUDA-enabled PyTorch build:

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```
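Before running the script on the GPU, it can be worth confirming that PyTorch actually sees a CUDA device. A minimal check using PyTorch's standard API:

```python
import torch

# True only if the cu118 build was installed and a CUDA-capable GPU is visible.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    # Name of the first GPU, e.g. "NVIDIA GeForce RTX 3060".
    print(torch.cuda.get_device_name(0))
```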
Usage:

```python
from whisper_script import WhisperTranscriber

# Decoding options forwarded to Whisper through the `prompt` parameter.
my_prompts = {
    "verbose": False,
    "temperature": 0.2,
    "word_timestamps": True
}

transcriber = WhisperTranscriber(audio_file="D:\\Dev\\Python\\WhisperScript\\1min.mp3",
                                 model_size="tiny",
                                 prompt=my_prompts,
                                 device="cuda")

result = transcriber.transcribe()
print(result["text"])
```
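Because `word_timestamps` is enabled above, per-word timing should be present in the result. A short sketch, assuming `.transcribe()` returns Whisper's standard result dictionary with a `segments` list:

```python
# Each segment carries a "words" list when word_timestamps=True.
for segment in result["segments"]:
    for word in segment.get("words", []):
        print(f'{word["start"]:7.2f} {word["end"]:7.2f}  {word["word"]}')
```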
Write subtitles; available formats are "txt", "srt", "vtt", "tsv", and "json":

```python
my_options = {
    'max_line_width': 50,
    'max_line_count': 3,
    'highlight_words': True
}

transcriber.subtitles_writer("D:\\Dev\\Python\\WhisperScript", "srt", my_options)
```
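To export every supported format in one pass, the same call can simply be looped over the format list (a sketch reusing `my_options` from above):

```python
# One subtitle file per supported format, written to the same directory.
for fmt in ["txt", "srt", "vtt", "tsv", "json"]:
    transcriber.subtitles_writer("D:\\Dev\\Python\\WhisperScript", fmt, my_options)
```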
Parameters:

| Parameter | Description |
| --- | --- |
| `audio_file` | (str): Path to the audio file to be transcribed. Defaults to `None`. |
| `model_size` | (str, optional): Size of the Whisper model to use, e.g. `'tiny'`, `'base'`, `'small'`, `'medium'`, or `'large'`. Defaults to `'base'`. |
| `download_root` | (str, optional): Root directory where model files are downloaded. Defaults to `None`. |
| `language` | (str, optional): Language of the audio; `'auto'` for automatic detection, or a specific language code. Defaults to `'auto'`. |
| `task` | (str, optional): Task to perform; `'transcribe'` transcribes the audio in its original language (Whisper also supports `'translate'`). Defaults to `'transcribe'`. |
| `prompt` | (dict, optional): Dictionary of additional decoding settings for the transcription task. Defaults to `None`. |
| `device` | (str, optional): Device to load the model on: `'cpu'` or `'cuda'`. Defaults to `None`. |
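Since `device` accepts `'cpu'` or `'cuda'`, a common pattern is to choose at runtime based on what is available. A sketch; the fallback logic here is an assumption, not behavior built into the script:

```python
import torch
from whisper_script import WhisperTranscriber

# Use the GPU when present, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
transcriber = WhisperTranscriber(audio_file="D:\\Dev\\Python\\WhisperScript\\1min.mp3",
                                 model_size="tiny",
                                 device=device)
```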
Methods:

- `.transcribe()`: starts transcription and returns the result as a dictionary; the transcribed text is under the `"text"` key.
- `.subtitles_writer(output_dir, output_format, options)`: writes subtitles to a file in the chosen format.
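Putting it together, a batch run over a folder of recordings might look like the following. This is a sketch built only on the interface documented above; the folder path is hypothetical:

```python
from pathlib import Path
from whisper_script import WhisperTranscriber

# Transcribe every MP3 in a folder and write SRT subtitles alongside it.
audio_dir = Path("D:/Dev/Python/WhisperScript/audio")  # hypothetical folder
options = {'max_line_width': 50, 'max_line_count': 3, 'highlight_words': False}

for mp3 in audio_dir.glob("*.mp3"):
    transcriber = WhisperTranscriber(audio_file=str(mp3),
                                     model_size="tiny",
                                     device="cuda")
    result = transcriber.transcribe()
    print(f"{mp3.name}: {result['text'][:60]}")
    transcriber.subtitles_writer(str(audio_dir), "srt", options)
```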