dynamiccreator/whisper-typer-tool

It is not detecting F2 on macOS


❯ python3 whisper-typer-tool.py
loading model...
tiny model loaded
ready - start transcribing with F2 ...

This process is not trusted! Input event monitoring will not be possible until it is added to accessibility clients.
^[OQException in thread Thread-3:
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "whisper-typer-tool.py", line 76, in record_speech
    stream = p.open(format=sample_format,
  File "/Users/sp/Desktop/my_project/AI_Research/whisper-typer-tool/whisvenv/lib/python3.8/site-packages/pyaudio/__init__.py", line 639, in open
    stream = PyAudio.Stream(self, *args, **kwargs)
  File "/Users/sp/Desktop/my_project/AI_Research/whisper-typer-tool/whisvenv/lib/python3.8/site-packages/pyaudio/__init__.py", line 441, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno -9998] Invalid number of channels

I'm having the same issue here. @dynamiccreator, can you please help with it? Thank you very much.

This error - "This process is not trusted! Input event monitoring will not be possible until it is added to accessibility clients." - can be fixed by going to System Settings -> Privacy & Security -> Accessibility and enabling Terminal to control your computer.

The other error needs a code change; instead of hard-coding the number of channels to 2, the code should fetch the number of channels for the active device and pass that to the open() function:

channels = p.get_default_input_device_info()["maxInputChannels"]
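A minimal sketch of that change, assuming the rest of the script stays as it is. The helper names `pick_channel_count` and `open_input_stream` and the cap of 2 channels are assumptions for illustration, not part of the tool; the cap only guards against multi-channel interfaces that report a large `maxInputChannels`.

```python
def pick_channel_count(device_info, wanted=2):
    """Return a channel count the device can actually open.

    `device_info` is the dict returned by PyAudio's
    get_default_input_device_info(); only "maxInputChannels" is read.
    """
    max_channels = int(device_info["maxInputChannels"])
    if max_channels < 1:
        raise ValueError("default input device has no input channels")
    # Never ask for more channels than the device offers.
    return min(wanted, max_channels)


def open_input_stream(rate=44100, frames_per_buffer=1024 * 16):
    """Open a recording stream on the default input device (hypothetical helper)."""
    import pyaudio  # imported lazily so pick_channel_count works without PortAudio

    p = pyaudio.PyAudio()
    channels = pick_channel_count(p.get_default_input_device_info())
    stream = p.open(format=pyaudio.paInt16,
                    channels=channels,
                    rate=rate,
                    frames_per_buffer=frames_per_buffer,
                    input=True)
    return p, stream, channels
```

If the script later writes the recording with the `wave` module, the same `channels` value would need to be passed to `setnchannels()` so the file header matches the recorded data.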

I have it working on my Mac.

First I installed the dependencies with Homebrew:
brew install ffmpeg portaudio

Then I changed the start/stop key from F2 to Ctrl+R:

COMBINATIONS = [
    {
        "keys": [
            #{keyboard.Key.ctrl ,keyboard.Key.shift, keyboard.KeyCode(char="r")},
            #{keyboard.Key.ctrl ,keyboard.Key.shift, keyboard.KeyCode(char="R")},
            #{keyboard.Key.f2},
            {keyboard.Key.ctrl, keyboard.KeyCode(char="r")},
            {keyboard.Key.ctrl, keyboard.KeyCode(char="R")},
        ],
        "command": "start record",
    },
]
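For context, a list like this is typically checked against the set of keys currently held down. The sketch below shows that subset check with plain strings standing in for pynput key objects; `matching_command` and the string key names are hypothetical and are not taken from the tool's source.

```python
# Plain strings stand in for pynput key objects (hypothetical stand-ins).
COMBINATIONS = [
    {
        "keys": [
            {"ctrl", "r"},
            {"ctrl", "R"},
        ],
        "command": "start record",
    },
]


def matching_command(pressed, combinations=COMBINATIONS):
    """Return the command of the first combination fully contained in `pressed`."""
    for entry in combinations:
        for combo in entry["keys"]:
            if combo <= pressed:  # every key of the combo is currently held
                return entry["command"]
    return None
```

Listing both `"r"` and `"R"` presumably covers the case where the reported character changes with Shift or Caps Lock.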

And changed these lines of code:

# Record audio
def record_speech():
    global file_ready_counter
    global stop_recording
    global is_recording

    is_recording = True
    chunk = 1024  # Record in chunks of 1024 samples
    chunk = chunk * 16  # Use a larger buffer to avoid input overflow
    sample_format = pyaudio.paInt16  # 16 bits per sample
    # channels = 2  # Hard-coded value fails if the device has fewer input channels
    fs = 44100  # Record at 44100 samples per second
    p = pyaudio.PyAudio()  # Create an interface to PortAudio
    # Ask the default input device how many channels it supports
    channels = p.get_default_input_device_info()["maxInputChannels"]
    stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

    frames = []  # Initialize array to store frames

    print("Start recording...\n")
    playsound("on.wav")