mkiol/dsnote

Speech Note 4.7.0 Beta 2


mkiol commented

If you want to test the upcoming release, Speech Note 4.7.0 Beta 2 is available in the "flathub-beta" repository.

This version is perfectly usable, but it may contain more bugs than a stable release.

To enable "flathub-beta" on your system, follow these instructions or simply run the following command:

flatpak remote-add --if-not-exists flathub-beta https://flathub.org/beta-repo/flathub-beta.flatpakrepo
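
Once the remote is added, installing the beta might look like the sketch below. The application ID `net.mkiol.SpeechNote` is an assumption (it is not stated in this post), so check it against the Flathub page first. The helper names `step` and `beta_install` are only illustrative. By default the script just prints the commands; set `RUN=1` to actually execute them.

```shell
#!/bin/sh
# Sketch: enable flathub-beta and install the Speech Note beta.
# APP_ID is an assumption, not taken from the post above.
APP_ID="${APP_ID:-net.mkiol.SpeechNote}"
REPO_URL="https://flathub.org/beta-repo/flathub-beta.flatpakrepo"

# Print each command; execute it only when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

beta_install() {
    step flatpak remote-add --if-not-exists flathub-beta "$REPO_URL"
    step flatpak install flathub-beta "$APP_ID"
}

beta_install
```

Running it as `RUN=1 sh install-beta.sh` (the file name is arbitrary) performs both steps; without `RUN=1` it is a dry run.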

Release Highlights

  • Vulkan GPU acceleration for WhisperCpp

Changes between 4.6.1 and 4.7.0 Beta 2

  • User Interface:
    • Speech Note has been translated into Slovenian.
    • Inserting text at the cursor position or replacing the current note. To insert text at the cursor position rather than at the end of the note, change the Text appending mode option to Add at the cursor position in the settings. When the Replace an existing note option is set, any newly added text replaces the existing note.
    • Status indication in the system tray icon. When using the system tray icon, statuses such as processing, listening, etc. are presented with an animated tray icon.
    • Models grouped by type in the model browser. To improve usability, models are now grouped by type in separate tabs instead of being shown in a single list containing all types.
  • Speech to Text:
    • Support for Vulkan GPU acceleration in WhisperCpp. Vulkan acceleration enables much faster STT decoding with Intel, AMD or NVIDIA graphics cards. With Vulkan, decoding is quicker than with OpenVINO, OpenCL and ROCm, but may still be slightly slower than with CUDA. The biggest advantage of Vulkan is that you can use it without installing any GPU acceleration add-ons. Vulkan is not enabled by default for integrated GPUs. To test it on an iGPU, enable Use Vulkan iGPU in the settings (Other->Hardware acceleration options).
    • New Whisper Large Turbo model for both WhisperCpp and FasterWhisper. Turbo is a fine-tuned version of a pruned Whisper Large-v3. It's the exact same model, except that the number of decoding layers has been reduced. As a result, the model is much faster, at the expense of a minor quality degradation. Unlike the regular Large model, the Turbo model cannot translate into English.
    • Simplified engine configuration options. Instead of multiple options, you can now select a Profile, which allows you to change the engine's processing parameters. There are three profiles to choose from: Best Performance, Best Quality and Custom.
  • Text to Speech:
    • New Piper voice for Latvian
  • Translator:
    • New models: English to Finnish, English to Turkish, English to Swedish, Swedish to English, English to Slovak, English to Indonesian, English to Romanian, English to Greek, Chinese to English
    • Updated models: English to Catalan, English to Russian, English to Ukrainian, English to Czech
  • Accessibility:
    • Option to scan special keystrokes when setting keyboard shortcuts (X11 only). If you want to use special keys (so-called "multimedia keys") as shortcuts, you can set a key automatically by pressing it instead of typing its name.
    • Keyboard shortcuts enabled for several user interface elements. Elements such as menu items or buttons can be controlled using keyboard shortcuts. Examples: Switch to Notepad (Ctrl+N), Switch to Translator (Ctrl+T), Open Languages (Ctrl+L), Read (Ctrl+Alt+Shift+R), Listen (Ctrl+Alt+Shift+L), Stop (Ctrl+Alt+Shift+S), Cancel (Ctrl+Alt+Shift+C), Pause (Ctrl+Alt+Shift+P) and more...
    • New actions and global keyboard shortcuts to force translation of text in STT: start-listening-translate, start-listening-translate-active-window, start-listening-translate-clipboard. The decoded text is always translated into English when a "translate" action is triggered. This only works when using Whisper models.
  • Flatpak:
    • whisper.cpp update to version 1.7.1
    • PyTorch update to version 2.5.1
    • ROCm update to version 6.2.2 (AMD add-on)
    • cuDNN update to version 9.5.1 (NVIDIA add-on)

If you are using a GPU acceleration add-on (AMD or NVIDIA), make sure to update it to version 1.3.0.
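
The new "translate" actions listed under Accessibility could also be wired to a desktop-level shortcut by invoking the app from a script. The sketch below only builds the command line. It assumes the Flatpak application ID `net.mkiol.SpeechNote` and an `--action` command-line option, neither of which is stated in this post, so verify both with `--help` before relying on them. The function name `translate_action_cmd` is illustrative.

```shell
#!/bin/sh
# Sketch: build the command for one of the new "translate" actions.
# Assumptions: the app ID and the --action option are not confirmed by this post.
APP_ID="${APP_ID:-net.mkiol.SpeechNote}"

# Accept only the three action names listed in the release notes.
translate_action_cmd() {
    case "$1" in
        start-listening-translate|start-listening-translate-active-window|start-listening-translate-clipboard)
            echo "flatpak run $APP_ID --action $1"
            ;;
        *)
            echo "unknown action: $1" >&2
            return 1
            ;;
    esac
}

translate_action_cmd start-listening-translate-clipboard
```

The printed command can be pasted into the custom-shortcut dialog of a desktop environment. As noted above, the translation itself only happens when a Whisper model is selected.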