/ai-interact

A new way to communicate with LLM by sharing a portion of your screen instead of typing.

Primary LanguagePythonApache License 2.0Apache-2.0

aihub project intends to change the way how do we interact with LLMs. Today many tools offer integration to all different models, and various chat applications are available online and locally. This provides a very scattered picture for end users and applications without AI integration require extra effort to get help with. aihub offers a more natural way to interface with generative AI models that are app agnostic, by sharing a screen portion with the model where the user seeks help.

How it works

A small Python application with a minimal GUI runs in the background. The application is API-integrated with an LLM of your choice (in our tests we've used LMStudio for local inference, but any other tool would work that implements the OpenAI API) and running a keyboard listener. With the [SHIFT][F1] keyboard shortcut the user initiates the capture mode. By defining an imaginary rectangle with 2 mouse clicks (define 2 diagonal corners of the rectangle) the code captures an image from anywhere on the screen. Then these images are processed by a locally running text extraction model: Tesseract, and the result text will be sent to the LLM with the preconfigured prefix. We've found that LLMs can handle the not-perfect text extraction of Tesseract.

Examples

Coding issue

demo_problem2 demo_result2

Summarization

demo_summary_problem demo_summary_result

Supported Platforms

  • MacOS

Requirements

  • Python 3.11.7
  • Tesseract
  • Open AI API compatible LLVM service access

Install

  • Install dependencies:
$ brew install pyenv tesseract
$ pyenv install 3.11.7
$ pyenv virtualenv 3.11.7 aihub
$ pyenv activate aihub
$ pip install -r requirements.txt
  • Generate the Protocol Buffer Python stubs:
python -m grpc_tools.protoc -I. --python_out=./aihub --pyi_out=./aihub --grpc_python_out=./aihub aihub.proto
  • Configure LLVM service access:
$ vi aihub/config.json
[Perform necessary edit]
  • Start the app:
$ cd aihub && python -m aihub_bootstrap

Usage

  • Press Shift + F1
  • Click the top left, then the bottom right corner of an error message
  • Read the solution in UI window

Config via GUI

aihub_config

Roadmap

  • Chat context handling
  • Support streaming API
  • Support API key
  • Talk to LLM
  • Bot answer to speech
  • Setup multiple Agents