REST LLM Gateway. Retrieval Augmented Generation (RAG) REST API. Create embeddings associated with a project tenant and interact with them using an LLM. Built with LangChain and LlamaIndex.

restai

  • RESTAI is a generic REST API that allows you to create embeddings from multiple data types and then interact with them using an LLM. It is an API for the RAG process.
  • This LLM may be an OpenAI-based model, llama.cpp, a Transformers model, or any other LLM supported by LangChain or compatible with the OpenAI API.
  • If you want to be completely offline, you may use (for example) the llama13b_chat_gptq LLM and all-mpnet-base-v2 embeddings.
  • RESTAI features an abstraction layer supporting multiple vector stores; right now Chroma and Redis are supported.
  • It was built with low-VRAM environments in mind: it loads and unloads LLMs automatically, allowing you to use multiple LLMs even if they don't all fit in VRAM simultaneously.

Details

Embeddings

  • Create embeddings from your data. You can ingest data by uploading files or by parsing the content of a URL directly.
  • You may pick any embeddings model supported by LangChain, either cloud-based (e.g. OpenAI) or private (a HuggingFace model).
  • You can easily manage embeddings per project: view and delete existing embeddings, and ingest new data.

Loaders

LLMs

  • You may use any LLM supported by LangChain and/or Transformers pipelines.
  • You may declare prompt templates and then use them with the LLMs.

Default support

  • Embeddings: huggingface (HuggingFace), openai (OpenAI), ...
  • LLM: llamacpp (ggml-model-q4_0.bin), openai (OpenAI, text-generation-webui), ...
  • It's very easy to add support for more embeddings, loaders and LLMs.

Installation

  • RestAI uses Poetry to manage dependencies. Install it with pip install poetry.

Development

  • make install
  • make dev (starts restai in development mode)
  • make devfrontend (starts restai's frontend in development mode)

Production

  • make install
  • make prod

Example usage

POST /projects ({"name": "test_openai", "embeddings": "openai", "llm": "openai"})

{"project": "test_openai"}

POST /projects/test_openai/ingest/upload (upload a test.txt)

{"project": "test_openai", "embeddings": "openai", "documents": 2}

POST /projects/test_openai/question ({"question": "What is the secret?"})

{"question": "What is the secret?", "answer": "The secret is that ingenuity should be bigger than politics and corporate greed."}

POST /projects/test_openai/question ({"system": "You are a digital assistant, answer only in french.", "question": "What is the secret?"})

{"question": "What is the secret?", "answer": "Le secret est que l'ingéniosité doit être plus grande que la politique et la cupidité des entreprises."}
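
The flow above can be sketched as a small Python script. The base URL is an assumption (host and port depend on your deployment); the payloads are taken verbatim from the examples:

```python
import json

# Base URL is an assumption; host and port depend on your deployment.
BASE = "http://localhost:9000"

# 1. POST /projects -- create a project backed by OpenAI embeddings and LLM.
create_project = {"name": "test_openai", "embeddings": "openai", "llm": "openai"}

# 2. POST /projects/test_openai/ingest/upload -- upload test.txt
#    (a multipart file upload; the response reports the document count).

# 3. POST /projects/test_openai/question -- ask, optionally with a
#    custom system prompt.
question = {"question": "What is the secret?"}
question_fr = {
    "system": "You are a digital assistant, answer only in french.",
    "question": "What is the secret?",
}

for path, payload in [
    ("/projects", create_project),
    ("/projects/test_openai/question", question),
    ("/projects/test_openai/question", question_fr),
]:
    print("POST", BASE + path, json.dumps(payload))
```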

Endpoints (Swagger):

Users

  • A user represents an account in the system.
  • It is used for authentication and authorization (HTTP Basic auth).
  • Each user may have access to multiple projects.

GET /users

  • Lists all users. Users and projects have a many-to-many relationship.

POST /users

  • Create a user.

GET /users/{username}

  • Get a specific user details.

DELETE /users/{username}

  • Deletes a user.

PATCH /users/{username}

  • Edits a user.
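
A minimal client-side sketch for these endpoints. RESTAI uses HTTP Basic auth, but the request field names here (`username`, `password`, `projects`) are assumptions, not confirmed by this README; check the Swagger docs for the real schema:

```python
from base64 import b64encode

# Helper for the Basic auth header the endpoints expect.
def basic_auth_header(user: str, password: str) -> dict:
    token = b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# POST /users -- field names here are illustrative assumptions.
new_user = {"username": "alice", "password": "s3cret"}

# PATCH /users/alice -- e.g. grant access to a project
# (users and projects are many-to-many).
patch_user = {"projects": ["test_openai"]}

headers = basic_auth_header("admin", "admin-password")
print(headers["Authorization"])
```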

Projects


  • A project is an abstract entity, essentially a tenant. You may have multiple projects, and each project has its own embeddings, loaders and LLMs.
  • Each project may have multiple users with access to it.
  • Projects have a "sandboxed" mode, in which a fixed default answer is returned when there are no embeddings matching the provided question. This is useful for chatbots, where you want a safe default answer when the LLM doesn't know how to respond, reducing hallucination.
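
As an illustrative sketch, a sandboxed project might be created with a payload like the one below. The `sandboxed` and `censorship` field names are assumptions for illustration only; check the Swagger schema for the real ones:

```python
# Field names "sandboxed" and "censorship" are assumptions, not confirmed
# by this README.
sandboxed_project = {
    "name": "support_bot",
    "embeddings": "openai",
    "llm": "openai",
    "sandboxed": True,
    # Returned verbatim whenever no embeddings match the question,
    # instead of letting the LLM improvise an answer.
    "censorship": "Sorry, I can only answer questions about our product.",
}
print(sandboxed_project["name"])
```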

GET /projects

  • Lists all the projects. Users and projects have a many-to-many relationship.

GET /projects/{projectName}

  • Get the specific project details.

DELETE /projects/{projectName}

  • Deletes the specific project.

POST /projects

  • Creates a new project.

PATCH /projects/{projectName}

  • Edits a project.

Embeddings - main endpoints


POST /projects/{projectName}/embeddings/ingest/url

  • Ingests data into a specific project from a provided URL.

POST /projects/{projectName}/embeddings/ingest/upload

  • Ingests data into a specific project from an uploaded file.

GET /projects/{projectName}/embeddings/urls

  • Lists all the ingested URLs from a specific project.

GET /projects/{projectName}/embeddings/files

  • Lists all the ingested files from a specific project.

DELETE /projects/{projectName}/embeddings/{id}

  • Deletes a specific embedding from a specific project.

DELETE /projects/{projectName}/embeddings/url/{url}

  • Deletes a specific embedding from a specific project, identified by a previously ingested URL.

DELETE /projects/{projectName}/embeddings/files/{fileName}

  • Deletes a specific embedding from a specific project, identified by a previously ingested filename.
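
A sketch of building these request paths and payloads in Python. The `url` payload field is an assumption based on the endpoint name; identifiers embedded in the path (URLs, filenames) must be percent-encoded:

```python
from urllib.parse import quote

project = "test_openai"

# POST body for /embeddings/ingest/url; the "url" field is an assumption.
ingest_payload = {"url": "https://example.com/docs.html"}

# DELETE by URL or by filename embeds the identifier in the path,
# so it must be percent-encoded first.
delete_by_url = (f"/projects/{project}/embeddings/url/"
                 + quote("https://example.com/docs.html", safe=""))
delete_by_file = f"/projects/{project}/embeddings/files/" + quote("test.txt")

print(delete_by_url)
print(delete_by_file)
```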

LLMs


POST /projects/{projectName}/question

  • Asks a question to a specific project.

POST /projects/{projectName}/chat

  • Sends a chat message to a specific project. Chat differs from question in that it keeps conversation history. Each chat has a unique ID (the id field).
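
A minimal sketch of the difference from /question: follow-up messages reuse the same id to continue the conversation. The `message` field name is an assumption; only the `id` field is documented here:

```python
import uuid

# A chat keeps history keyed by its "id" field, so follow-up messages
# reuse the same id. The "message" field name is an assumption.
chat_id = str(uuid.uuid4())

first = {"id": chat_id, "message": "What is the secret?"}
follow_up = {"id": chat_id, "message": "Answer again, in French."}

print(first["id"] == follow_up["id"])
```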

Frontend

Tests

  • Tests are implemented using pytest. Run them with make test.
  • Running the HuggingFace tests on a Mac mini M1 (8 GB) takes around 5-10 minutes; they use a local LLM and a local embeddings model from HuggingFace.

License

Pedro Dias - @pedromdias

Licensed under the Apache license, version 2.0 (the "license"); You may not use this file except in compliance with the license. You may obtain a copy of the license at:

http://www.apache.org/licenses/LICENSE-2.0.html

Unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an "as is" basis, without warranties or conditions of any kind, either express or implied. See the license for the specific language governing permissions and limitations under the license.