anki-asr

Anki add-on for speech recognition

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Supported speech-to-text services

The only supported service at the moment is Deepgram. I plan to add support for Google Speech-to-Text and maybe Whisper in the future.

Usage

Most speech recognition services require you to register for an API key. After you sign up and get your key, paste it into the add-on's config: go to Tools > Add-ons, select this add-on from the list, and click Config. The key goes in the api_key option, which sits under the relevant service name inside the provider_options option.
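
As a rough illustration of that nesting, here is a minimal Python sketch of the config and a key lookup. Only the option names provider_options and api_key come from the description above. The "deepgram" key name, the placeholder value, and the get_api_key helper are assumptions made for the example, not the add-on's actual code.

```python
# Hypothetical shape of the add-on config. Only the option names
# "provider_options" and "api_key" are taken from the README; the
# "deepgram" key and the placeholder value are assumptions.
config = {
    "provider_options": {
        "deepgram": {
            "api_key": "YOUR_DEEPGRAM_API_KEY",
        },
    },
}


def get_api_key(config: dict, provider: str) -> str:
    """Return the API key stored for the given provider, or raise if it is missing."""
    options = config.get("provider_options", {}).get(provider, {})
    key = options.get("api_key", "")
    if not key:
        raise ValueError(f"No api_key configured for provider {provider!r}")
    return key
```

Inside Anki, the dictionary would normally come from the add-on manager rather than a literal; the plain dict keeps the sketch runnable outside Anki.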

As a template filter

Currently, the add-on works as a template filter named asr (short for "automatic speech recognition" or "Anki speech recognition"), which you put in your card template. For example:

{{asr:Front}}

The add-on processes any [sound:foo.mp3] tags in the specified field and replaces them with the transcriptions of the audio.
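
The replacement step can be sketched in a few lines of Python. This is an illustration only, assuming a simple regular expression is enough to find the tags; replace_sound_tags and fake_transcribe are hypothetical names, and the stand-in transcriber takes the place of a real request to the speech-to-text service.

```python
import re
from typing import Callable

# Matches Anki sound tags such as [sound:foo.mp3] and captures the filename.
SOUND_TAG_RE = re.compile(r"\[sound:(.*?)\]")


def replace_sound_tags(field_text: str, transcribe: Callable[[str], str]) -> str:
    """Replace every [sound:...] tag with the transcription of its audio file."""
    return SOUND_TAG_RE.sub(lambda m: transcribe(m.group(1)), field_text)


# Hypothetical stand-in for a real speech-to-text call.
def fake_transcribe(filename: str) -> str:
    return f"(transcription of {filename})"


print(replace_sound_tags("Listen: [sound:foo.mp3]", fake_transcribe))
# prints "Listen: (transcription of foo.mp3)"
```

Text outside the sound tags is left untouched, so a field that mixes text and audio keeps its text.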

You can specify the language using the lang option. For example:

{{asr lang=tr:Front}}

The default language is English (en). Supported languages depend on the service used. For Deepgram, see https://deepgram.com/product/languages/ for a list of supported languages.

You can choose the speech-to-text service with the provider option. For example:

{{asr provider=deepgram:Front}}

If you set auto=false, the add-on shows a button that reveals the transcription when clicked:

{{asr auto=false:Front}}

This is useful to avoid making a request to the ASR service when not needed, or to simply use the transcription as an optional hint.
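
To make the option syntax concrete, here is a hedged Python sketch of how the part of the filter before the colon (for example "asr lang=tr") could be split into a filter name and key=value options. How the add-on really parses options is not documented here. The parse_filter helper is hypothetical, and apart from lang=en, which is stated above, the defaults shown (provider=deepgram, auto=true) are assumptions.

```python
def parse_filter(filter_name: str) -> tuple[str, dict[str, str]]:
    """Split a filter string like 'asr lang=tr auto=false' into its name and options."""
    name, *pairs = filter_name.split()
    # lang=en is the documented default; the other two defaults are assumptions.
    options = {"lang": "en", "provider": "deepgram", "auto": "true"}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if sep:  # skip tokens that are not in key=value form
            options[key] = value
    return name, options


print(parse_filter("asr lang=tr auto=false"))
# prints "('asr', {'lang': 'tr', 'provider': 'deepgram', 'auto': 'false'})"
```

Options can then be combined in a single filter, as in the "asr lang=tr auto=false" string passed above, assuming the add-on accepts several options at once.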

You can see a list of each provider's supported languages by placing something like the following on your template:

{{asr-langs provider=deepgram:}}

This will list each supported language's code and name. The language code is what you have to provide to the lang option.

Download

You can download the add-on from AnkiWeb: https://ankiweb.net/shared/info/411601849

Planned features

Besides adding support for more services, I plan to add an option to fill a chosen note field with the transcription, and possibly support for bulk processing.