The aihub project aims to change how we interact with LLMs. Today, many tools offer integrations with different models, and various chat applications are available both online and locally. This leaves end users with a scattered landscape, and applications without built-in AI integration require extra effort to get help with. aihub offers a more natural, app-agnostic way to interface with generative AI models: the user shares a portion of the screen with the model where they need help.
A small Python application with a minimal GUI runs in the background. It integrates with an LLM of your choice over the OpenAI API (in our tests we used LMStudio for local inference, but any tool that implements the OpenAI API works) and runs a keyboard listener. The user initiates capture mode with the [SHIFT][F1] keyboard shortcut. By marking two diagonal corners of an imaginary rectangle with two mouse clicks, the user captures an image from anywhere on the screen. The image is then processed by a locally running text extraction engine, Tesseract, and the resulting text is sent to the LLM with a preconfigured prefix. We have found that LLMs cope well with Tesseract's imperfect text extraction.
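The pipeline above can be sketched in a few functions. This is an illustrative outline, not the actual aihub code: the names (`normalize_rect`, `build_prompt`, `ask_llm`, `PROMPT_PREFIX`) and the default endpoint URL are assumptions, and the OCR step is shown with `pytesseract` as one common Tesseract binding.

```python
import json
from urllib import request

# Illustrative prefix; in aihub this would come from the configuration.
PROMPT_PREFIX = "Explain the following text captured from my screen:\n\n"

def normalize_rect(click_a, click_b):
    """Turn two diagonal corner clicks into a (left, top, width, height) box,
    regardless of the order in which the corners were clicked."""
    (x1, y1), (x2, y2) = click_a, click_b
    return (min(x1, x2), min(y1, y2), abs(x1 - x2), abs(y1 - y2))

def extract_text(image):
    """Run Tesseract OCR on a captured PIL image."""
    import pytesseract  # requires the Tesseract binary installed locally
    return pytesseract.image_to_string(image)

def build_prompt(ocr_text):
    """Prepend the preconfigured prefix to the OCR result."""
    return PROMPT_PREFIX + ocr_text.strip()

def ask_llm(prompt, base_url="http://localhost:1234/v1"):
    """Send the prompt to an OpenAI-compatible chat endpoint (e.g. LMStudio)."""
    payload = json.dumps({
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    req = request.Request(
        base_url + "/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

In the real application the two click coordinates come from the mouse listener and the image from a screen-grab library; the sketch only shows how the pieces fit together.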
Requirements:

- macOS
- Python 3.11.7
- Tesseract
- OpenAI API-compatible LLM service access
- Install dependencies:
$ brew install pyenv tesseract
- Set up pyenv: please follow the instructions described at https://github.com/pyenv/pyenv?tab=readme-ov-file#set-up-your-shell-environment-for-pyenv
- Install the Python environment:
$ pyenv install 3.11.7
$ pyenv virtualenv 3.11.7 aihub
$ pyenv activate aihub
$ pip install -r requirements.txt
- Generate the Protocol Buffer Python stubs:
$ python -m grpc_tools.protoc -I. --python_out=./aihub --pyi_out=./aihub --grpc_python_out=./aihub aihub.proto
- Configure LLM service access:
$ vi aihub/config.json
[Perform necessary edit]
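The exact schema of `config.json` depends on the code; a plausible minimal example, assuming fields such as `base_url`, `model`, and `prompt_prefix` (the field names are illustrative, not the project's actual keys), might look like:

```json
{
  "base_url": "http://localhost:1234/v1",
  "model": "local-model",
  "prompt_prefix": "Explain the following text captured from my screen:\n\n"
}
```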
- Start the app:
$ cd aihub && python -m aihub_bootstrap
- Press Shift + F1
- Click the top left, then the bottom right corner of an error message
- Read the solution in the UI window
Roadmap:

- Chat context handling
- Streaming API support
- API key support
- Talk to the LLM by voice
- Convert bot answers to speech
- Set up multiple agents