using this stuff for a newbie

Question

elemich opened this issue 3 months ago · 1 comments

Hello, I'm new to speech recognition, vosk and Python, but I want to translate the speech in a simple video I downloaded from the internet (and later maybe run TTS in my language, or even do speech-to-speech).
I have tried the listen_in_background function from the example with the Google engine and it works, although I can't reach my goal with it (word-by-word translation).

With your software, recognize_vosk in the callback keeps returning the same result, "Please download the model etc...". I have downloaded the model and unzipped it in vosk/model/it (I'm Italian), but I can't get it to work.

So I have Python 3.12, pyaudio and speech_recognition installed, and my IDE for now is plain IDLE. Can you please give me a simple source to begin with?

Answer 1 · 2024-03-21T15:14:33.000Z

I am a newbie, too. Have you built libvosk.so, or downloaded a prebuilt one, and put it inside vosk-api/src? You can check from Python IDLE by importing the module. For example, after the ">>>" prompt, enter the command from vosk import Model, KaldiRecognizer, SetLogLevel . If you see no exceptions or errors, the library works.

Then you can look at https://github.com/alphacep/vosk-api/blob/master/python/example/test_simple.py and edit the file to reach your goal. In addition, vosk can use ffmpeg to convert a video to the audio format PCM 16 kHz 16-bit mono (you can use test_ffmpeg.py in the same example folder).

Although I have tried to learn Python syntax, I would just use the command line because it is easier:

vosk-transcriber -m path-to-the-directory-of-model/vosk-model-it-0.22 -t srt -i input.mp3 -o output.text
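
If you would rather stay in Python, here is a minimal sketch along the lines of test_simple.py. It is not an official example. The helper names is_pcm16_mono and transcribe_wav are made up for illustration, and it assumes the vosk package is installed and that the audio has already been converted to a 16-bit mono PCM WAV file. It checks the model folder and the audio format first, so a wrong path shows up as a clear error.

```python
import json
import os
import wave


def is_pcm16_mono(wav_path):
    """Return True if the WAV file is uncompressed 16-bit mono PCM."""
    with wave.open(wav_path, "rb") as wf:
        return (
            wf.getnchannels() == 1
            and wf.getsampwidth() == 2
            and wf.getcomptype() == "NONE"
        )


def transcribe_wav(model_dir, wav_path):
    """Transcribe a 16-bit mono PCM WAV file with an unzipped Vosk model.

    model_dir is the folder that contains the unzipped model,
    for example the vosk-model-it-0.22 folder.
    """
    # Fail early with a readable message if the paths or format are wrong.
    if not os.path.isdir(model_dir):
        raise FileNotFoundError(f"Model folder not found: {model_dir}")
    if not is_pcm16_mono(wav_path):
        raise ValueError("Audio must be a 16-bit mono PCM WAV file")

    # Imported late so the checks above run even if vosk is missing.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            # AcceptWaveform returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result()).get("text", ""))
        # Collect whatever is left after the last chunk.
        pieces.append(json.loads(rec.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

To try it from IDLE, something like print(transcribe_wav("vosk-model-it-0.22", "audio.wav")) should work, with both paths replaced by your own.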