ID-based RAG FastAPI

Overview

This project integrates Langchain with FastAPI in an Asynchronous, Scalable manner, providing a framework for document indexing and retrieval, using PostgreSQL/pgvector.

Files are organized into embeddings by file_id. The primary use case is for integration with LibreChat, but this simple API can be used for any ID-based use case.

The main reason to use the ID approach is to work with embeddings on a file-level. This makes for targeted queries when combined with file metadata stored in a database, such as is done by LibreChat.

The API will evolve over time to employ different querying/re-ranking methods, embedding models, and vector stores.

Features

Document Management: Methods for adding, retrieving, and deleting documents.
Vector Store: Utilizes Langchain's vector store for efficient document retrieval.
Asynchronous Support: Offers async operations for enhanced performance.

Setup

Getting Started

Configure .env file based on section below
Setup pgvector database:
- Run an existing PSQL/PGVector setup, or,
- Docker: docker compose up (also starts RAG API)
  - or, use docker just for DB: docker compose -f ./db-compose.yaml up
Run API:
- Docker: docker compose up (also starts PSQL/pgvector)
  - or, use docker just for RAG API: docker compose -f ./api-compose.yaml up
- Local:
  - Make sure to setup DB_HOST to the correct database hostname
  - Run the following commands (preferably in a virtual environment)

pip install -r requirements.txt
uvicorn main:app

Environment Variables

The following environment variables are required to run the application:

RAG_OPENAI_API_KEY: The API key for OpenAI API Embeddings (if using default settings).
- Note: OPENAI_API_KEY will work but RAG_OPENAI_API_KEY will override it in order to not conflict with LibreChat setting.
RAG_OPENAI_BASEURL: (Optional) The base URL for your OpenAI API Embeddings
RAG_OPENAI_PROXY: (Optional) Proxy for OpenAI API Embeddings
VECTOR_DB_TYPE: (Optional) select vector database type, default to pgvector.
POSTGRES_DB: (Optional) The name of the PostgreSQL database, used when VECTOR_DB_TYPE=pgvector.
POSTGRES_USER: (Optional) The username for connecting to the PostgreSQL database.
POSTGRES_PASSWORD: (Optional) The password for connecting to the PostgreSQL database.
DB_HOST: (Optional) The hostname or IP address of the PostgreSQL database server.
DB_PORT: (Optional) The port number of the PostgreSQL database server.
RAG_HOST: (Optional) The hostname or IP address where the API server will run. Defaults to "0.0.0.0"
RAG_PORT: (Optional) The port number where the API server will run. Defaults to port 8000.
JWT_SECRET: (Optional) The secret key used for verifying JWT tokens for requests.
- The secret is only used for verification. This basic approach assumes a signed JWT from elsewhere.
- Omit to run API without requiring authentication
COLLECTION_NAME: (Optional) The name of the collection in the vector store. Default value is "testcollection".
CHUNK_SIZE: (Optional) The size of the chunks for text processing. Default value is "1500".
CHUNK_OVERLAP: (Optional) The overlap between chunks during text processing. Default value is "100".
RAG_UPLOAD_DIR: (Optional) The directory where uploaded files are stored. Default value is "./uploads/".
PDF_EXTRACT_IMAGES: (Optional) A boolean value indicating whether to extract images from PDF files. Default value is "False".
DEBUG_RAG_API: (Optional) Set to "True" to show more verbose logging output in the server console, and to enable postgresql database routes
CONSOLE_JSON: (Optional) Set to "True" to log as json for Cloud Logging aggregations
EMBEDDINGS_PROVIDER: (Optional) either "openai", "bedrock", "azure", "huggingface", "huggingfacetei" or "ollama", where "huggingface" uses sentence_transformers; defaults to "openai"
EMBEDDINGS_MODEL: (Optional) Set a valid embeddings model to use from the configured provider.
- Defaults
- openai: "text-embedding-3-small"
- azure: "text-embedding-3-small" (will be used as your Azure Deployment)
- huggingface: "sentence-transformers/all-MiniLM-L6-v2"
- huggingfacetei: "http://huggingfacetei:3000". Hugging Face TEI uses model defined on TEI service launch.
- ollama: "nomic-embed-text"
- bedrock: "amazon.titan-embed-text-v1"
RAG_AZURE_OPENAI_API_VERSION: (Optional) Default is 2023-05-15. The version of the Azure OpenAI API.
RAG_AZURE_OPENAI_API_KEY: (Optional) The API key for Azure OpenAI service.
- Note: AZURE_OPENAI_API_KEY will work but RAG_AZURE_OPENAI_API_KEY will override it in order to not conflict with LibreChat setting.
RAG_AZURE_OPENAI_ENDPOINT: (Optional) The endpoint URL for Azure OpenAI service, including the resource.
- Example: https://YOUR_RESOURCE_NAME.openai.azure.com.
- Note: AZURE_OPENAI_ENDPOINT will work but RAG_AZURE_OPENAI_ENDPOINT will override it in order to not conflict with LibreChat setting.
HF_TOKEN: (Optional) if needed for huggingface option.
OLLAMA_BASE_URL: (Optional) defaults to http://ollama:11434.
ATLAS_SEARCH_INDEX: (Optional) the name of the vector search index if using Atlas MongoDB, defaults to vector_index
MONGO_VECTOR_COLLECTION: Deprecated for MongoDB, please use ATLAS_SEARCH_INDEX and COLLECTION_NAME
AWS_DEFAULT_REGION: (Optional) defaults to us-east-1
AWS_ACCESS_KEY_ID: (Optional) needed for bedrock embeddings
AWS_SECRET_ACCESS_KEY: (Optional) needed for bedrock embeddings

Make sure to set these environment variables before running the application. You can set them in a .env file or as system environment variables.

Use Atlas MongoDB as Vector Database

Instead of using the default pgvector, we could use Atlas MongoDB as the vector database. To do so, set the following environment variables

VECTOR_DB_TYPE=atlas-mongo
ATLAS_MONGO_DB_URI=<mongodb+srv://...>
COLLECTION_NAME=<vector collection>
ATLAS_SEARCH_INDEX=<vector search index>

The ATLAS_MONGO_DB_URI could be the same or different from what is used by LibreChat. Even if it is the same, the $COLLECTION_NAME collection needs to be a completely new one, separate from all collections used by LibreChat. In addition, create a vector search index for collection above (remember to assign $ATLAS_SEARCH_INDEX) with the following json:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "file_id",
      "type": "filter"
    }
  ]
}

Follow one of the four documented methods to create the vector index.

Cloud Installation Settings:

AWS:

Make sure your RDS Postgres instance adheres to this requirement:

The pgvector extension version 0.5.0 is available on database instances in Amazon RDS running PostgreSQL 15.4-R2 and higher, 14.9-R2 and higher, 13.12-R2 and higher, and 12.16-R2 and higher in all applicable AWS Regions, including the AWS GovCloud (US) Regions.

In order to setup RDS Postgres with RAG API, you can follow these steps:

Create a RDS Instance/Cluster using the provided AWS Documentation.
Login to the RDS Cluster using the Endpoint connection string from the RDS Console or from your IaC Solution output.
The login is via the Master User.
Create a dedicated database for rag_api: create database rag_api;.
Create a dedicated user\role for that database: create role rag;
Switch to the database you just created: \c rag_api
Enable the Vector extension: create extension vector;
Use the documentation provided above to set up the connection string to the RDS Postgres Instance\Cluster.

Notes:

Even though you're logging with a Master user, it doesn't have all the super user privileges, that's why we cannot use the command: create role x with superuser;
If you do not enable the extension, rag_api service will throw an error that it cannot create the extension due to the note above.

Dev notes:

Installing pre-commit formatter

Run the following commands to install pre-commit formatter, which uses black code formatter:

pip install pre-commit
pre-commit install

yswlww/rag_api