LLM inference GUI for Jupyter notebook


This is an LLM-powered chat interface integrated with a voice synthesis model, web-search-based RAG, and a Python execution environment.
Everything runs locally on your computer, so no API keys are required.
For best results, I strongly recommend selecting a model that is sufficiently large or trained for tool use.

Features

1. Streaming output.
2. Context shifting that manipulates the KV cache to significantly reduce prompt evaluation time (a technique known as StreamingLLM; see the sketch after this list).
3. Automatic tool use (a.k.a. function calling), including "Web Search" and "Python Interpreter".
4. Integration with the Japanese text-to-speech model style-bert-vits2.
5. Image recognition using an image-to-text model.
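
Feature 2 in brief: instead of re-evaluating the whole prompt when the context fills up, the KV cache is shifted so already-evaluated tokens are reused. Below is a minimal conceptual Python sketch of the StreamingLLM eviction policy; the names (shift_context, n_sink) are made up for illustration and this is not this project's actual implementation:

def shift_context(tokens, n_ctx, n_sink=4):
    # Keep the first n_sink "attention sink" tokens plus the most recent
    # tokens, evicting the oldest middle of the sequence. KV-cache entries
    # for the kept tokens are reused, so only newly appended tokens need
    # to be evaluated.
    if len(tokens) <= n_ctx:
        return tokens  # still fits; nothing to evict
    n_recent = (n_ctx // 2) - n_sink  # retain roughly half the window
    return tokens[:n_sink] + tokens[-n_recent:]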

Installation/Prerequisites

1. Jupyter must be installed.
$ pip install jupyterlab

2. Activate ipywidgets with one of the following commands:
    2-1. JupyterLab
    $ jupyter labextension install @jupyter-widgets/jupyterlab-manager
    2-2. Jupyter Notebook / Google Colab
    $ jupyter nbextension enable --py widgetsnbextension
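
To confirm that widgets render, you can run a quick sanity check in a notebook cell (not part of this project):

import ipywidgets
ipywidgets.IntSlider()  # a slider should appear below the cell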

3. Install dependencies.
$ pip install -r requirements.txt

4. (Optional) You may want to use hardware acceleration (CUDA/MPS).
In that case, reinstall llama-cpp-python in the way that matches your device.
Installation instructions can be found in the llama-cpp-python repository.
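
For example, a CUDA or Metal build can be requested via CMAKE_ARGS (a sketch; the exact flag names vary between llama-cpp-python versions, so check the instructions above for yours):
$ CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python   # NVIDIA GPUs
$ CMAKE_ARGS="-DGGML_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python  # Apple Silicon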

5. (Optional) You may want to raise the websocket message size limit to send large files or play longer audio (the default limit is 10 MB).
$ jupyter notebook --generate-config
After running the command, edit the following line in the generated config file (typically ~/.jupyter/jupyter_notebook_config.py), then restart Jupyter:
c.ServerApp.tornado_settings = {"websocket_max_message_size": 100 * 1024 * 1024}
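
If you are unsure where the generated config file lives, Jupyter can print its config directory:
$ jupyter --config-dir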

Screenshots