
Okra

Some love it, some hate it, others don't even know that it exists, just like the real-life okra!

Okra is your all-in-one personal AI assistant. This is my effort at recreating something similar to ChatGPT's desktop application. Even though it has a LOT of room for improvement, it's still pretty fun to play with.

Features

  • Speech recognition: Okra listens to you in the background and recognizes your speech, using the power of the well-known SpeechRecognition library (see the sketch after this list).
  • Speech-to-text conversion: Okra uses external speech-to-text APIs to transcribe your speech. The currently supported speech-to-text providers are Deepgram, OpenAI, and Groq.
  • Vision capabilities: You can share your webcam feed or your computer screen with okra, and it will use the image to chat with you and answer your questions!
  • Multiple LLM support: Okra supports multiple LLM and VLM API providers. The currently available providers are Google (Gemini), OpenAI (GPT), and Groq.
  • Text-to-speech capability: Okra can speak to you, using various text-to-speech models. Currently, it supports Deepgram and OpenAI.
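
Background listening with the SpeechRecognition library follows the usual pattern shown below. This is a minimal, illustrative sketch, not okra's exact code; the callback body and the provider hand-off are placeholders.

# Minimal background-listening sketch using the SpeechRecognition library.
# Illustrative only; okra's actual wiring and callback differ.
import speech_recognition as sr

recognizer = sr.Recognizer()
microphone = sr.Microphone()

with microphone as source:
    # Calibrate the recognizer against ambient noise before listening
    recognizer.adjust_for_ambient_noise(source)

def on_speech(recognizer: sr.Recognizer, audio: sr.AudioData) -> None:
    # Hand the captured audio off to a speech-to-text provider
    # (Deepgram, OpenAI, or Groq in okra's case).
    wav_bytes = audio.get_wav_data()

# listen_in_background runs in a separate thread and returns a function
# that stops the listener when called.
stop_listening = recognizer.listen_in_background(microphone, on_speech)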

Installation

To install, do the following:

  1. Clone the repository:
    git clone https://github.com/S4mpl3r/okra.git
  2. Create a Python virtual environment and activate it (optional, but highly recommended):
    # Windows
    python -m venv .venv
    .venv/Scripts/activate
    # Linux
    python3 -m venv .venv
    source .venv/bin/activate
  3. Create a .env file in the project root and populate it with your API keys according to the .env.example file provided (see the loading sketch after these steps):
    DEEPGRAM_API_KEY=
    GROQ_API_KEY=
    GOOGLE_API_KEY=
    OPENAI_API_KEY=
  4. Install the required packages:
    # Windows
    python -m pip install -r requirements.txt
    # Linux
    python3 -m pip install -r requirements.txt
  5. Edit the okra/config.py file to your liking. The default configuration uses gemini-1.5-flash as the LLM, Groq as the speech-to-text provider, and Deepgram as the text-to-speech provider:
    # okra/config.py
    config: GlobalConfig = {
        # Make this False if you don't want to use vision,
        # or the model that you use does not support it
        "use_vision": True,
        # Make this False if you don't want okra to generate speech
        "talk": True,
        # The source of vision, can be either 'screen' (your computer screen) or 'webcam'
        "image_source": "screen",
        # The llm to use
        "llm": Gemini(
            model_name="models/gemini-1.5-flash-latest",
            system_prompt=system_prompt,
            max_history_length=10,
        ),
        # The speech-to-text model to use
        "speech_to_text": GroqSpeechToText(),
        # The text-to-speech model to use
        "text_to_speech": DeepgramTextToSpeech(),
    }
  6. Run the tool:
    # Windows
    python okra.py
    # Linux
    python3 okra.py
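
The API keys from step 3 are read from the environment at runtime. Here is a minimal sketch of how a .env file is typically loaded with python-dotenv (an assumption for illustration; check okra's source for the actual mechanism):

# Hypothetical sketch of loading the .env keys with python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
deepgram_key = os.getenv("DEEPGRAM_API_KEY")
groq_key = os.getenv("GROQ_API_KEY")
google_key = os.getenv("GOOGLE_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")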

Options

You can edit the okra/config.py file to change the behavior of okra to your liking. You have access to:

  • 3 LLM classes, found in the okra.llm subpackage:
    • Gemini
    • GPT
    • GroqLLM
  • 3 speech-to-text classes, found in the okra.speech subpackage:
    • DeepgramSpeechToText
    • OpenAISpeechToText
    • GroqSpeechToText
  • 2 text-to-speech classes, found in the okra.speech subpackage:
    • DeepgramTextToSpeech
    • OpenAITextToSpeech

Example config 1

  • LLM: OpenAI
  • Speech-to-text: Deepgram
  • Text-to-speech: Deepgram
  • Vision source: screen
# okra/config.py
config: GlobalConfig = {
    "use_vision": True,
    "talk": True,
    "image_source": "screen",
    "llm": GPT(
        model_name="gpt-4o",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": DeepgramSpeechToText(),
    "text_to_speech": DeepgramTextToSpeech(),
}

Example config 2

  • LLM: Groq
  • Speech-to-text: Deepgram
  • Text-to-speech: OpenAI
  • No vision
# okra/config.py
config: GlobalConfig = {
    "use_vision": False, # Groq does not support vision models (yet)
    "talk": True,
    "image_source": "screen",
    "llm": GroqLLM(
        model_name="llama3-70b-8192",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": DeepgramSpeechToText(),
    "text_to_speech": OpenAITextToSpeech(),
}
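
OpenAISpeechToText is the only speech class not shown above. A third mix, as a sketch, assuming its constructor takes no arguments like the other speech classes:

# okra/config.py
config: GlobalConfig = {
    "use_vision": True,
    "talk": True,
    "image_source": "webcam",  # use the webcam feed instead of the screen
    "llm": Gemini(
        model_name="models/gemini-1.5-flash-latest",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": OpenAISpeechToText(),
    "text_to_speech": OpenAITextToSpeech(),
}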

Usage

If you run python okra.py -h, you'll get:

usage: python okra.py [options]

Okra is your all in one desktop AI voice assistant.

options:
  -h, --help    show this help message and exit
  --skip-intro  skip intro
  --no-music    do not play intro music

By default, okra will play an intro cutscene and music¹ (just for fun, lol). If you want to skip this intro, run python okra.py --skip-intro. If you just want to mute the music, run python okra.py --no-music.
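
For reference, the interface above corresponds to a standard argparse setup roughly like the following (a sketch; the actual code in okra.py may differ):

# Hypothetical argparse sketch matching the help text above.
import argparse

parser = argparse.ArgumentParser(
    prog="python okra.py",
    description="Okra is your all in one desktop AI voice assistant.",
)
parser.add_argument("--skip-intro", action="store_true", help="skip intro")
parser.add_argument("--no-music", action="store_true", help="do not play intro music")
args = parser.parse_args()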

To exit the assistant, type 'q' and press Enter in the terminal.

Have fun!

License

MIT

Footnotes

  1. The intro music was created with Suno