Tabby is a self-hosted OSS AI coding assistant
I use it every day and you should too
Please support the authors https://github.com/TabbyML/tabby
The problem I have with it is the complexity of its backend
So I reverse-engineered it in Python (without reading the codebase, so some bugs are likely)
Clone this repo and install 2 dependencies:
git clone https://github.com/vsolina/tabby-backend-py.git
cd tabby-backend-py
# optionally create venv:
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
Download the model you want to use, e.g. 🤗:
wget https://huggingface.co/TabbyML/StarCoder-1B/resolve/main/ggml/q8_0.v2.gguf
Update the model path in app.py (literally called model_path in model_config)
Use a relative or absolute path to the model (no shell expansion, so paths like ~/ won't work)
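For orientation, the edit is just one key in that dict; this is a minimal sketch of what it might look like (only model_path is named in the repo, the surrounding shape is my assumption):

model_config = {
    "model_path": "./q8_0.v2.gguf",  # relative or absolute path, no ~ expansion
}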
Run the service:
python app.py
# OR, if you want to customize the port and host:
flask run --port 8080 --host 0.0.0.0
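To sanity-check the server before wiring up the editor, you can post a completion request by hand. This is a rough sketch of Tabby's /v1/completions request shape as I understand it (field names may differ), using port 8080 to match the flask run example above; adjust if yours differs:

import json, urllib.request

payload = {"language": "python",
           "segments": {"prefix": "def fib(n):\n    ", "suffix": "\n"}}
req = urllib.request.Request("http://localhost:8080/v1/completions",
                             data=json.dumps(payload).encode(),
                             headers={"Content-Type": "application/json"})
print(json.load(urllib.request.urlopen(req)))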
Install the Tabby extension and set the server URL in the extension configuration:
https://tabby.tabbyml.com/docs/extensions/
Customization:
- All configuration is defined at the beginning of app.py
- The model used must support FIM / fill-in-the-middle (StableCode, StarCoder, CodeLlama, etc.)
- You can provide more params to the Llama class by extending model_config (see the sketch after this list)
- Same for generator config
- Available options here https://llama-cpp-python.readthedocs.io/en/latest/api-reference/
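As an illustration, assuming app.py unpacks these dicts into the Llama constructor and the completion call (that's how I read the description; apart from model_config, the exact names are my guesses), extending them could look like this:

from llama_cpp import Llama

model_config = {
    "model_path": "./q5_K_M.gguf",
    "n_ctx": 2048,        # context window
    "n_gpu_layers": 20,   # offload layers to GPU if built with acceleration
    "n_threads": 8,
}
generator_config = {      # hypothetical name for the generation params
    "max_tokens": 128,
    "temperature": 0.2,
    "stop": ["<|endoftext|>"],
}

llm = Llama(**model_config)
completion = llm("def hello():", **generator_config)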
Notes:
- Some bugs are possible (ehm, likely); use at your own risk (or fix and open a PR)
- Missing language customizations (Stop Words, Repository Context), but completion still works well
- The /events endpoint is not really implemented (it's mostly not essential)
- q5_K_M quantized model versions are recommended for the best performance vs. quality-loss trade-off
- If you have specific HW acceleration considerations, install llama-cpp-python as described here: https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#installation-with-specific-hardware-acceleration-blas-cuda-metal-etc
- This guy is awesome: https://huggingface.co/TheBloke - take a look at his quantized model selection
Long live the king Georgi https://github.com/ggerganov
We must fight the complexity demon because it wins by default
I believe software should be anticomplex
The original TabbyML server implementation (v0.7.0):
- has 530 dependencies (https://github.com/TabbyML/tabby/blob/main/Cargo.lock)
- has 7178 lines of Rust code in 73 files across 9 crates
- uses 3 languages (Rust, Python, C++) according to GitHub
This implementation has:
- 2 (first-level) dependencies
- 20 lines of code in one file (app-min.py), 671 bytes (see the sketch below)
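To give a feel for what fits in that budget, here is a rough sketch of such a minimal backend (not the actual app-min.py; it assumes a StarCoder-style model, hence the FIM tokens used):

from flask import Flask, request, jsonify
from llama_cpp import Llama

app = Flask(__name__)
llm = Llama(model_path="./q8_0.v2.gguf")  # adjust path

@app.route("/v1/completions", methods=["POST"])
def completions():
    seg = request.get_json()["segments"]
    # StarCoder-style fill-in-the-middle prompt
    prompt = f"<fim_prefix>{seg['prefix']}<fim_suffix>{seg.get('suffix', '')}<fim_middle>"
    out = llm(prompt, max_tokens=128, temperature=0.2, stop=["<|endoftext|>"])
    return jsonify({"id": "cmpl-0", "choices": [{"index": 0, "text": out["choices"][0]["text"]}]})

if __name__ == "__main__":
    app.run()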
simple is better than complex is better than complicated
When I can read it
I can understand it
I can fix it
That makes it antifragile and easy to maintain
Rant over, have a great day