WhisperWithGoogleTrans

Automatic translation of Whisper AI transcription results.

This project adds automatic translation of the SRT subtitle files that Whisper AI produces when converting speech to text.

Requirements

  • Python 3.8-3.10
  • ffmpeg
  • pysrt
  • googletrans

You can download and install (or update to) the latest release of Whisper with the following command:

pip install -U openai-whisper

Alternatively, the following command will pull and install the latest commit from the Whisper repository, along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git 

To update the package to the latest version of the Whisper repository, run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
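
After installing, it can be useful to confirm that ffmpeg is actually reachable on the PATH before running a transcription. The following is a minimal sketch using only the Python standard library; the helper name ffmpeg_version is hypothetical and is not part of this project.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `<executable> -version`, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of this repository.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: a non-zero exit code should not raise, we only want the banner.
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version if version is not None else "ffmpeg not found on PATH")
```

If the function returns None, install ffmpeg with one of the commands above or add its install directory to PATH.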

To install pysrt, run:

pip install pysrt

To install googletrans, run:

pip install googletrans==4.0.0rc1

You may also need Rust installed, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors while running the Whisper pip install command, follow the Rust Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

pip install setuptools-rust

Command-line usage

Change to the project directory:

cd whispertranslate

The following command will transcribe speech in audio files, using the medium model:

python cli.py audio.flac audio.mp3 audio.wav --model medium

The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

python cli.py japanese.wav --language Japanese

See tokenizer.py for the list of all available languages.

Adding --translang followed by a language code translates the transcribed text into that language (for example, ko for Korean):

python cli.py japanese.wav --language Japanese --translang ko

See googletrans for the list of supported translation languages.
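
To illustrate the translation step, here is a minimal sketch of translating subtitle lines one at a time. It assumes the synchronous Translator API of googletrans 4.0.0rc1 (the version pinned above); the function name translate_lines and its translator parameter are hypothetical, and this is not necessarily how cli.py does it.

```python
from typing import Iterable, List


def translate_lines(
    lines: Iterable[str], dest: str, src: str = "auto", translator=None
) -> List[str]:
    """Translate each non-empty line to `dest`; blank lines pass through unchanged.

    `translator` is any object with a googletrans-style
    translate(text, src=..., dest=...) method returning an object with a
    `.text` attribute. Hypothetical helper, for illustration only.
    """
    if translator is None:
        # Imported lazily so a stand-in translator can be used offline.
        from googletrans import Translator

        translator = Translator()

    translated = []
    for line in lines:
        if not line.strip():
            translated.append(line)
            continue
        translated.append(translator.translate(line, src=src, dest=dest).text)
    return translated
```

With googletrans installed and network access available, translate_lines(["こんにちは"], dest="ko", src="ja") would send one request per line. The language codes that googletrans accepts are listed in its googletrans.LANGUAGES dictionary.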

Use --dualsrt [Y/N] to control how the subtitle files are written. With Y, a single SRT file containing both the original and the translated text is created. With N, two separate SRT files are created: one with the original text and one with the translation.

python cli.py japanese.wav --language Japanese --translang ko --dualsrt Y
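
As a rough picture of what a dual subtitle file looks like, the sketch below merges an original SRT and its translation cue by cue, using only the standard library. The names parse_srt and merge_dual are hypothetical, and placing the original text above the translation is an assumption; the project itself relies on pysrt and may lay out the cues differently.

```python
import re
from typing import List, Tuple

# One subtitle cue: (index line, timing line, text). Illustrative type only.
Cue = Tuple[str, str, str]


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into cues; blocks with fewer than three lines are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3:
            cues.append((lines[0].strip(), lines[1].strip(), "\n".join(lines[2:])))
    return cues


def merge_dual(original: List[Cue], translated: List[Cue]) -> str:
    """Build one SRT where each cue shows the original text, then the translation.

    Indexes and timings are taken from the original cues.
    """
    if len(original) != len(translated):
        raise ValueError("original and translated cue counts differ")
    blocks = []
    for (index, timing, text), (_, _, translated_text) in zip(original, translated):
        blocks.append(f"{index}\n{timing}\n{text}\n{translated_text}")
    return "\n\n".join(blocks) + "\n"
```

Calling merge_dual(parse_srt(original_text), parse_srt(translated_text)) returns the combined subtitle text, which can then be written to a single .srt file.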

Run the following to view all available options:

python cli.py --help

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB          ~32x
base    74 M        base.en             base                ~1 GB          ~16x
small   244 M       small.en            small               ~2 GB          ~6x
medium  769 M       medium.en           medium              ~5 GB          ~2x
large   1550 M      N/A                 large               ~10 GB         1x

The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
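
The table can also be turned into a small lookup for choosing a model. The sketch below picks the largest model whose approximate VRAM requirement fits a given budget and prefers the .en variant for English-only use where one exists. The function name pick_model is hypothetical, and the VRAM figures are the approximate values from the table, not measurements.

```python
from typing import Optional

# (name, approximate required VRAM in GB, has an English-only variant),
# ordered from smallest to largest, following the table above.
MODELS = [
    ("tiny", 1, True),
    ("base", 1, True),
    ("small", 2, True),
    ("medium", 5, True),
    ("large", 10, False),
]


def pick_model(vram_gb: float, english_only: bool = False) -> Optional[str]:
    """Return the largest model that fits in `vram_gb`, or None if none fits.

    Hypothetical helper for illustration; not part of this repository.
    """
    best = None
    for name, required_gb, has_en in MODELS:
        if required_gb <= vram_gb:
            # Later entries are larger, so the last fitting model wins.
            best = f"{name}.en" if english_only and has_en else name
    return best
```

The returned name can be passed to the --model option, for example python cli.py audio.wav --model medium.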