
Speech-to-Text using ggerganov/whisper.cpp for GitHub Action


Speech-to-Text for GitHub Actions using ggerganov/whisper.cpp, a high-performance inference implementation of OpenAI's Whisper automatic speech recognition (ASR) model.

Input variables

See action.yml for more detailed information.

| Variable | Description | Default |
| --- | --- | --- |
| model | public whisper model (available: small, medium and large) | small |
| audio_path | audio path | |
| output_folder | output folder | |
| output_format | output format; supports txt, srt, csv | txt |
| output_filename | output filename | |
| debug | enable debug mode | |
| print_progress | print progress | true |
| print_segment | print segment | |
| youtube_url | YouTube URL | |
| translate | translate from source language to English | false |
| cut_silences | cut silences | false |
| prompt | initial prompt text | |
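
As a minimal sketch of how these inputs fit together, the step below transcribes an audio file from the repository rather than a YouTube video, and translates the result to English. The input names come from the table above; the file path and folder name are hypothetical placeholders, and the step assumes the repository has already been checked out earlier in the job.

    - name: speech to text
      uses: appleboy/whisper-action@v0.1.1
      with:
        model: small
        # Hypothetical path to an audio file in the checked-out repository.
        audio_path: audio/interview.wav
        # Hypothetical folder name for the generated transcript.
        output_folder: transcripts
        output_format: txt
        translate: true

Inputs left out of `with:` fall back to the defaults listed in the table, so `print_progress` stays enabled and `cut_silences` stays off here.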


Download a YouTube video and transcribe it.

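    # The workflow name, trigger, and job id below are illustrative assumptions;
    # adjust them to fit your repository.
    name: transcript
    on: workflow_dispatch

    jobs:
      transcript:
        name: transcript english video
        runs-on: ubuntu-latest
        steps:
          - name: checkout
            uses: actions/checkout@v3

          # Download the video, transcribe it, and write an .srt file to youtube/.
          - name: speech to text
            uses: appleboy/whisper-action@v0.1.1
            with:
              model: small
              youtube_url: https://www.youtube.com/watch?v=pTCxXZh6VyE
              output_format: srt
              output_folder: youtube
              print_segment: true
              debug: true

          # Commit the generated transcript and push it back to the repository.
          - name: git push changes
            uses: appleboy/git-push-action@v0.0.2
            with:
              branch: main
              commit: true
              commit_message: "[skip ci] Upload changes"
              remote: git@github.com:appleboy/whisper-action.git
              ssh_key: ${{ secrets.DEPLOY_KEY }}
              rebase: true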

The output file is written to the youtube folder.