/openlrc

Transcribe (whisper) and translate (gpt) voice into LRC file. 使用whisper和gpt来转录、翻译你的音频为字幕文件。

Primary LanguagePythonMIT LicenseMIT

Open-Lyrics

PyPI PyPI - License Downloads GitHub Workflow Status (with event)

Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using OpenAI-GPT.

Installation

  1. Please install CUDA and cuDNN first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

  2. Add your OpenAI API key to environment variable OPENAI_API_KEY.

  3. Install PyTorch:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  4. Install latest fast-whisper

    pip install git+https://github.com/guillaumekln/faster-whisper
  5. (Optional) If you want to process videos, install ffmpeg and add bin directory to your PATH.

  6. This project can be installed from PyPI:

    pip install openlrc

    or install directly from GitHub:

    pip install git+https://github.com/zh-plus/Open-Lyrics

Usage

from openlrc import LRCer

lrcer = LRCer()

# Single file
lrcer.run('./data/test.mp3', target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

# Multiple files
lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
# Note we run the transcription sequentially, but run the translation concurrently for each file.

# Path can contain video
lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
# Generate translated ./data/test_audio.lrc and ./data/test_video.srt

# Use context.yaml to improve translation
lrcer.run('./data/test.mp3', target_lang='zh-cn', context_path='./data/context.yaml')

# To skip translation process
lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

# Change asr_options or vad_options, check openlrc.defaults for details
vad_options = {"threshold": 0.1}
lrcer = LRCer(vad_options=vad_options)
lrcer.run('./data/test.mp3', target_lang='zh-cn')

# Enhance the audio using noise suppression (consume more time).
lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

Check more details in Documentation.

Context

Utilize the available context to enhance the quality of your translation. Save them as context.yaml in the same directory as your audio file.

background: "This is a multi-line background.
This is a basic example."
audio_type: Movie
description_map: {
  movie_name1 (without extension): "This
  is a multi-line description for movie1.",
  movie_name2 (without extension): "This
  is a multi-line description for movie2.",
  movie_name3 (without extension): "This is a single-line description for movie 3.",
}

Todo

  • [Efficiency] Batched translate/polish for GPT request (enable contextual ability).
  • [Efficiency] Concurrent support for GPT request.
  • [Translation Quality] Make translate prompt more robust according to https://github.com/openai/openai-cookbook.
  • [Feature] Automatically fix json encoder error using GPT.
  • [Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.
  • [Quality] Improve batched translation/polish prompt according to gpt-subtrans.
  • [Feature] Input video support.
  • [Feature] Multiple output format support.
  • [Quality] Speech enhancement for input audio.
  • [Feature] Align ground-truth transcription with audio.
  • [Quality] Use multilingual language model to assess translation quality.
  • [Efficiency] Add Azure OpenAI Service support.
  • [Quality] Use claude for translation.
  • [Feature] Add local LLM support.
  • [Feature] Multiple translate engine (Microsoft, DeepL, Google, etc.) support.
  • [Feature] Build a electron + fastapi GUI for cross-platform application.
  • Add fine-tuned whisper-large-v2 models for common languages.
  • [Others] Add transcribed examples.
    • Song
    • Podcast
    • Audiobook

Credits

Star History

Star History Chart