
Use self-hosted LLMs with an OpenAI-compatible API

Primary language: TypeScript. License: MIT.



LLMatic can be used as a drop-in replacement for OpenAI's API (see the supported endpoints below). It uses llama-node with the llama.cpp backend to run the models locally.

Supported endpoints:

  • /completions (stream and non-stream)
  • /chat/completions (stream and non-stream)
  • /embeddings
  • /models
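
To illustrate the difference between the stream and non-stream variants, here is a minimal TypeScript sketch. It assumes LLMatic follows OpenAI's wire format, in which a streamed response arrives as server-sent events: each `data:` line carries a JSON chunk, and a final `data: [DONE]` line closes the stream. The helper names and the model id `my-local-model` are hypothetical and not part of LLMatic.

```typescript
// Sketch only: assumes OpenAI-style request bodies and streaming chunks.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequestBody {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

// Build the JSON body for POST /chat/completions.
// "my-local-model" is a placeholder, not a real model id.
function buildChatRequest(
  messages: ChatMessage[],
  stream: boolean,
  model = "my-local-model",
): ChatRequestBody {
  return { model, messages, stream };
}

// Parse one line of a streamed response.
// Returns the text delta, "" for lines that carry no text,
// or null once the "[DONE]" marker is reached.
function parseStreamLine(line: string): string | null {
  const trimmed = line.trim();
  if (!trimmed.startsWith("data:")) return "";
  const payload = trimmed.slice("data:".length).trim();
  if (payload === "[DONE]") return null;
  const chunk = JSON.parse(payload);
  return chunk.choices?.[0]?.delta?.content ?? "";
}

// Concatenate the deltas of a streamed reply, stopping at "[DONE]".
function collectStream(lines: string[]): string {
  let text = "";
  for (const line of lines) {
    const delta = parseStreamLine(line);
    if (delta === null) break;
    text += delta;
  }
  return text;
}
```

With `stream: false` the server is expected to return a single JSON object instead, so no line-by-line parsing is needed.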

This project is currently a work in progress. At this point, it's recommended to use it only for ad-hoc development and testing.


The main motivation behind LLMatic was to experiment with OpenAI's API without worrying about the cost. I have seen other attempts at creating OpenAI-compatible APIs, such as:

  1. FastChat
  2. GPT4All Chat Server Mode
  3. simpleAI

However, I wanted a small, simple, and easy-to-extend implementation in TypeScript based on the official OpenAI API specification.

How to use

If you prefer a video tutorial, you can watch the following video for step-by-step instructions on how to use this project:



Requirements:

  • Node.js >=18.16
  • Unix-based OS (Linux, macOS, WSL, etc.)


Create an empty directory and run npm init:

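export LLMATIC_PROJECT_DIR=my-llmatic-project
# Create the project directory and switch into it
mkdir "$LLMATIC_PROJECT_DIR" && cd "$LLMATIC_PROJECT_DIR"
npm init -y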

Install and configure LLMatic:

npm add llmatic
# Download a model and generate a config file
npx llmatic config

Adjust the config file to your needs and start the server:

npx llmatic start

You can run npx llmatic --help to see all available commands.
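
Once the server is running, a quick way to check it is to list the models and send one non-streaming chat request. The sketch below is a minimal example under stated assumptions: the base URL `http://localhost:3000/v1` is a guess and should be replaced with the host and port from your own setup, the responses are assumed to follow OpenAI's JSON shapes, and the helper names are hypothetical. It relies on the global `fetch` that ships with Node.js 18.

```typescript
// Assumed base URL; adjust to the host and port your LLMatic server listens on.
const BASE_URL = "http://localhost:3000/v1";

interface ModelList {
  data: { id: string }[];
}

// Join a base URL and an endpoint path without doubling slashes.
function endpoint(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Extract the model ids from an OpenAI-style GET /models response.
function modelIds(list: ModelList): string[] {
  return list.data.map((m) => m.id);
}

// List the models, then send one non-streaming chat request to the first one.
async function smokeTest(baseUrl: string = BASE_URL): Promise<string> {
  const modelsRes = await fetch(endpoint(baseUrl, "/models"));
  if (!modelsRes.ok) throw new Error(`GET /models failed: ${modelsRes.status}`);
  const ids = modelIds((await modelsRes.json()) as ModelList);
  if (ids.length === 0) throw new Error("The server reported no models");

  const chatRes = await fetch(endpoint(baseUrl, "/chat/completions"), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: ids[0],
      messages: [{ role: "user", content: "Say hello in one sentence." }],
    }),
  });
  if (!chatRes.ok) throw new Error(`POST /chat/completions failed: ${chatRes.status}`);

  const completion = (await chatRes.json()) as {
    choices: { message: { content: string } }[];
  };
  return completion.choices[0]?.message.content ?? "";
}
```

Calling `smokeTest().then(console.log)` from a script should print the model's reply if the server is reachable at the assumed address.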

Usage with chatbot-ui

Clone the repo and install the dependencies:

git clone https://github.com/mckaywrigley/chatbot-ui.git
cd chatbot-ui
npm install

Create a .env.local file:

cat <<EOF > .env.local
# For now, this is ignored by LLMatic

NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT=A chat between a curious human (user) and an artificial intelligence assistant (assistant). The assistant gives helpful, detailed, and polite answers to the human's questions.

user: Hello!
assistant: Hello! How may I help you today?
user: Please tell me the largest city in Europe.
assistant: Sure. The largest city in Europe is Moscow, the capital of Russia.
EOF

Run the server:

npm run dev -- --port 3001



Usage with LangChain

There are two examples of using LLMatic with LangChain in the examples directory.

To run the Node.js example, first install the dependencies:

cd examples/node-langchain
npm install

Then run the main script:

npm start

Sample output:

[chain/start] [1:chain:llm_chain] Entering Chain run with input: {
  "humanInput": "Rememeber that this is a demo of LLMatic with LangChain.",
  "history": ""
}
[llm/start] [1:chain:llm_chain > 2:llm:openai] Entering LLM run with input: {
  "prompts": [
    "A chat between a curious user and an artificial intelligence assistant.\nThe assistant gives helpful, detailed, and polite answers to the user's questions.\n\n\nHuman: Rememeber that this is a demo of LLMatic with LangChain.\nAI:"
  ]
}
[llm/end] [1:chain:llm_chain > 2:llm:openai] [5.92s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": " Yes, I understand. I am ready to assist you with your queries.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {}
  }
}
[chain/end] [1:chain:llm_chain] [5.92s] Exiting Chain run with output: {
  "text": " Yes, I understand. I am ready to assist you with your queries."
}
[chain/start] [1:chain:llm_chain] Entering Chain run with input: {
  "humanInput": "What did I ask you to remember?",
  "history": "Human: Rememeber that this is a demo of LLMatic with LangChain.\nAI:  Yes, I understand. I am ready to assist you with your queries."
}
[llm/start] [1:chain:llm_chain > 2:llm:openai] Entering LLM run with input: {
  "prompts": [
    "A chat between a curious user and an artificial intelligence assistant.\nThe assistant gives helpful, detailed, and polite answers to the user's questions.\n\nHuman: Rememeber that this is a demo of LLMatic with LangChain.\nAI:  Yes, I understand. I am ready to assist you with your queries.\nHuman: What did I ask you to remember?\nAI:"
  ]
}
[llm/end] [1:chain:llm_chain > 2:llm:openai] [6.51s] Exiting LLM run with output: {
  "generations": [
    [
      {
        "text": " You asked me to remember that this is a demo of LLMatic with LangChain.",
        "generationInfo": {
          "finishReason": "stop",
          "logprobs": null
        }
      }
    ]
  ],
  "llmOutput": {
    "tokenUsage": {}
  }
}
[chain/end] [1:chain:llm_chain] [6.51s] Exiting Chain run with output: {
  "text": " You asked me to remember that this is a demo of LLMatic with LangChain."
}

To run the Python example, first install the dependencies:

cd examples/python-langchain
pip3 install -r requirements.txt

Then run the main script:

python3 main.py

Sample output:

> Entering new LLMChain chain...
Prompt after formatting:
A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.

Human: Rememeber that this is a demo of LLMatic with LangChain.

> Finished chain.
 Yes, I understand. I am ready to assist you with your queries.

> Entering new LLMChain chain...
Prompt after formatting:
A chat between a curious user and an artificial intelligence assistant.
The assistant gives helpful, detailed, and polite answers to the user's questions.

Human: Rememeber that this is a demo of LLMatic with LangChain.
AI:  Yes, I understand. I am ready to assist you with your queries.
Human: What did I ask you to remember?

> Finished chain.
 You asked me to remember that this is a demo of LLMatic with LangChain.

Custom Adapters

LLMatic is designed to be easily extensible. You can create your own adapters by extending the LlmAdapter class. See examples/custom-adapter for an example.

To start llmatic with a custom adapter, use the --llm-adapter flag:

npx llmatic start --llm-adapter ./custom-llm-adapter.ts