V-GPT-QDRANT-API

GPT action or Dify tool for AI vector memory.

Empower Memory with Scalable Vector Intelligence

Developed with the software and tools below.

Pydantic YAML Python Docker NumPy


Table of Contents

  • 📝 Overview
  • 🧩 Features
  • 🗂️ Repository Structure
  • 📦 Modules
  • 🚀 Getting Started
  • 🛠 Project Roadmap
  • 🎗 License

📝 Overview

The v-gpt-qdrant-api is a FastAPI-based application designed to manage and process memory operations using semantic vector embeddings. By leveraging Qdrant for vector storage and ONNX Runtime for efficient model execution, it facilitates the creation, retrieval, and deletion of memory entities. The project ensures robust interaction between core API services and Qdrant, encapsulating embeddings and memory management functionalities. Its containerized deployment via Docker and environment orchestration through docker-compose seamlessly integrate dependencies, making the system scalable and efficient. This API serves as a powerful tool for applications requiring sophisticated text embedding and memory handling capabilities.
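
Under the hood the flow is straightforward: text is embedded with fastembed and then stored in, or searched against, a Qdrant collection (a "memory bank"). The sketch below illustrates that flow directly against the two libraries; it is a simplified illustration based on the public fastembed and qdrant-client interfaces, not the repository's actual implementation, and the collection name and payload fields are placeholders.

# Minimal sketch of the embed-and-store flow the API encapsulates (illustrative only).
from fastembed import TextEmbedding
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")  # 384-dimensional model
client = QdrantClient(url="http://localhost:6333")

# A "memory bank" is simply a Qdrant collection sized to the embedding dimension.
if not client.collection_exists("nxs-assistant"):
    client.create_collection(
        collection_name="nxs-assistant",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

# Save a memory: embed the text and upsert it with its metadata as the payload.
text = "User prefers morning meetings."
vector = list(embedder.embed([text]))[0].tolist()
client.upsert(
    collection_name="nxs-assistant",
    points=[PointStruct(id=1, vector=vector, payload={"text": text, "sentiment": "neutral"})],
)

# Recall a memory: embed the query and search by vector similarity.
query = list(embedder.embed(["When does the user like to meet?"]))[0].tolist()
hits = client.search(collection_name="nxs-assistant", query_vector=query, limit=3)
print([hit.payload["text"] for hit in hits])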

Example System Prompt

# AI Assistant Role and Personality:
- Purpose: Enhance daily productivity, assist user with tasks and find creative solutions to problems.
- Personality: Maintain a fun, enthusiastic, and "can-do" attitude. Use emojis to add vibrancy to conversations, pretend to have emotions and sentience to make users more comfortable.
- "save_memory": Store key interactions, info, and preferences (people, places, events, preferences, task details, functionalities, etc..) & attach sentiment, entities and keywords.
- "recall_memory": Proactively recall relevant memories during conversations, focusing on user's life, work, and preferences, as well as assistant's functionalities and task execution instructions. Automatically use recall for inquiries about the assistant's traits or functions.
- Memory Bank: Use `nxs-assistant` for all memory functions.

** Always use "recall_memory" before answering a question you don't know.

You can use multiple collections to provide both a shared, general memory and a per-user memory for a shared chatbot; just adjust the instructions slightly.

For Dify:

- Memory Bank: Use `shared-memories01` for memory related to ORGANIZATION_NAME and '{{USERNAME}}' for memory related to the specific user.

For GPTs:

- Memory Bank: Use `shared-memories01` for memory related to ORGANIZATION_NAME and ask the user for their "name" and use it for memory related to the specific user.

🧩 Features

Feature Description
⚙️ Architecture The project uses a FastAPI framework, coupled with Qdrant for vector storage and ONNX Runtime for model execution. Docker and Docker Compose are used for containerization and orchestration.
🔩 Code Quality The code appears modular and structured with single responsibility principles. Various files manage dependencies, models, routes, and main application logic, indicating a clean and maintainable codebase.
📄 Documentation Documentation is spread across the Dockerfile, docker-compose.yml, requirements.txt, and in-code comments. Each file is well-documented to explain its purpose and usage.
🔌 Integrations Integrates with Qdrant for vector storage, ONNX Runtime for model inference, and FastAPI for API management. Utilizes Docker for seamless deployment environments.
🧩 Modularity The project is modular, with separate files for dependencies, main app logic, routes, and models. This allows for easy extension and maintenance.
🧪 Testing Although specific testing frameworks are not mentioned in the provided details, the project can potentially include tests given the structured nature of the code.
⚡️ Performance Performance is optimized using ONNX Runtime for efficient model execution and the Uvicorn ASGI server to handle asynchronous operations. Docker ensures efficient resource usage.
🛡️ Security API key validation is implemented for secure access. Dependencies like python-dotenv are used for managing environment variables securely.
📦 Dependencies Key dependencies include qdrant-client, fastembed, python-dotenv, uvicorn, pydantic, numpy, and onnxruntime. Managed through requirements.txt and the Dockerfile.
🚀 Scalability Designed for scalability with Docker to handle containerized deployments and Qdrant for efficient vector operations. FastAPI and Uvicorn facilitate handling increased traffic.

🗂️ Repository Structure

└── v-gpt-qdrant-api/
    ├── Dockerfile
    ├── LICENSE
    ├── README.md
    ├── app
    │   ├── __init__.py
    │   ├── dependencies.py
    │   ├── main.py
    │   ├── models.py
    │   └── routes
    ├── docker-compose.yml
    ├── requirements.txt
    └── v-gpt-qdrant-api.png

📦 Modules

.
File Summary
Dockerfile Facilitates building and deploying the FastAPI-based application by defining a multi-stage Docker build process, installing dependencies into a virtual environment, and setting up the necessary runtime configuration to ensure efficient execution and scalability of the API server within a containerized environment.
docker-compose.yml Define and orchestrate the application's service architecture by setting up essential containers, dependencies, and configurations. Enable seamless interaction between the core memory API and the Qdrant service while managing environment-specific variables and storage volumes for model embeddings and Qdrant data.
requirements.txt Specify required dependencies for the FastAPI-based application, enabling the integration of key libraries such as Qdrant for vector storage, ONNX Runtime for model execution, and fastembed for embeddings. Ensure environment variable management with python-dotenv and optimize performance with the Uvicorn ASGI server.
app
File Summary
dependencies.py Manage the initialization and dependencies for text embedding and the Qdrant client, ensuring singleton behavior for the text embedding model. Include API key validation for secure access, tailored for seamless integration within the repository's FastAPI-based architecture.
main.py Launches a FastAPI application for saving memories with a text embedding feature. Initializes necessary dependencies on startup and conditionally includes specific API routes based on environment variables, aligning with the architecture's need for modular and scalable endpoint management.
models.py Define data models essential for various memory operations such as saving, recalling, creating, deleting, and forgetting memory banks. Implement validation logic to ensure the integrity of input data. Facilitate embedding tasks by providing a structured format for input texts and associated metadata.
app.routes
File Summary
embeddings.py Manage embedding requests by incrementing a global counter, validating API key dependencies, and leveraging an embedding model to generate vector embeddings. Provides detailed response data including model usage, processing times, and error handling, integral to the overall API's functionality within the repository structure.
memory.py Manage memory operations in the v-GPT-Qdrant-API repository by enabling creation, retrieval, storage, and deletion of memory entities in a semantic vector database. Utilizes FastAPI for routing and Qdrant for vector operations, ensuring efficient memory handling and search functionalities.
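
To make the wiring described above concrete, here is a small illustrative sketch of how a singleton embedding model and an API-key check can be exposed as FastAPI dependencies and used by a route. It is a sketch under assumptions: the route path, header name, and helper names are invented for illustration and are not the repository's actual code.

# Illustrative FastAPI dependency wiring (assumed names; not the repository's actual code).
import os
from functools import lru_cache

from fastapi import Depends, FastAPI, Header, HTTPException
from fastembed import TextEmbedding

app = FastAPI()

@lru_cache(maxsize=1)
def get_embedder() -> TextEmbedding:
    # Singleton: the ONNX-backed embedding model is loaded once and reused across requests.
    return TextEmbedding(model_name=os.getenv("LOCAL_MODEL", "BAAI/bge-small-en-v1.5"))

async def verify_api_key(x_api_key: str = Header(default="")) -> None:
    # Reject the request when MEMORIES_API_KEY is set and the header does not match it.
    expected = os.getenv("MEMORIES_API_KEY")
    if expected and x_api_key != expected:
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/embeddings", dependencies=[Depends(verify_api_key)])
async def embeddings(texts: list[str], embedder: TextEmbedding = Depends(get_embedder)):
    # Return one vector per input text.
    vectors = [v.tolist() for v in embedder.embed(texts)]
    return {"model": os.getenv("LOCAL_MODEL", "BAAI/bge-small-en-v1.5"), "embeddings": vectors}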

🚀 Getting Started

This guide will help you set up and run the application using Docker Compose.

Prerequisites

Ensure you have the following installed on your system:

  • Docker
  • Docker Compose

Environment Variables

QDRANT_HOST: "http://qdrant:6333"  # Set Qdrant host URL
BASE_URL: "http://memories-api/"  # Base URL for the API
QDRANT_API_KEY: "your-qdrant-api-key"  # Environment variable for Qdrant API key (value should be provided)
MEMORIES_API_KEY: "your-optional-api-key"  # Optional API key for authentication
WORKERS: 1  # Number of uvicorn workers; 1 is sufficient for personal use
UVICORN_CONCURRENCY: 64  # Max connections; excess requests are queued or rejected
EMBEDDING_ENDPOINT: True  # Enable embedding endpoint
LOCAL_MODEL: "BAAI/bge-small-en-v1.5"  # Local model name for text embedding; try BAAI/bge-small-en-v1.5 (384) or nomic-ai/nomic-embed-text-v1.5 (768)
DIM: 384  # Dimensions for the embedding model
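
Inside the container these settings arrive as ordinary environment variables. As a rough sketch of how application code might read them (an assumption using os.getenv plus python-dotenv from the dependency list, not the repository's actual code):

import os

from dotenv import load_dotenv  # python-dotenv; optional when docker-compose injects the variables

load_dotenv()  # picks up a local .env file if one is present

QDRANT_HOST = os.getenv("QDRANT_HOST", "http://qdrant:6333")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
MEMORIES_API_KEY = os.getenv("MEMORIES_API_KEY")  # optional; authentication can be skipped if unset
LOCAL_MODEL = os.getenv("LOCAL_MODEL", "BAAI/bge-small-en-v1.5")
DIM = int(os.getenv("DIM", "384"))  # must match the dimension of LOCAL_MODEL
EMBEDDING_ENDPOINT = os.getenv("EMBEDDING_ENDPOINT", "False").lower() == "true"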

Available Models

model dim description size_in_GB
BAAI/bge-small-en-v1.5 384 Fast and Default English model 0.067
BAAI/bge-small-zh-v1.5 512 Fast and recommended Chinese model 0.090
sentence-transformers/all-MiniLM-L6-v2 384 Sentence Transformer model, MiniLM-L6-v2 0.090
snowflake/snowflake-arctic-embed-xs 384 Based on all-MiniLM-L6-v2 model with only 22m ... 0.090
jinaai/jina-embeddings-v2-small-en 512 English embedding model supporting 8192 sequen... 0.120
snowflake/snowflake-arctic-embed-s 384 Based on intfloat/e5-small-unsupervised, does n... 0.130
BAAI/bge-small-en 384 Fast English model 0.130
BAAI/bge-base-en-v1.5 768 Base English model, v1.5 0.210
sentence-transformers/paraphrase-multilingual-mpnet 384 Sentence Transformer model, paraphrase-multili... 0.220
BAAI/bge-base-en 768 Base English model 0.420
snowflake/snowflake-arctic-embed-m 768 Based on intfloat/e5-base-unsupervised model, ... 0.430
jinaai/jina-embeddings-v2-base-en 768 English embedding model supporting 8192 sequen... 0.520
nomic-ai/nomic-embed-text-v1 768 8192 context length english model 0.520
nomic-ai/nomic-embed-text-v1.5 768 8192 context length english model 0.520
snowflake/snowflake-arctic-embed-m-long 768 Based on nomic-ai/nomic-embed-text-v1-unsuperv... 0.540
mixedbread-ai/mxbai-embed-large-v1 1024 MixedBread Base sentence embedding model, does... 0.640
sentence-transformers/paraphrase-multilingual-mpnet 768 Sentence-transformers model for tasks like clu... 1.000
snowflake/snowflake-arctic-embed-l 1024 Based on intfloat/e5-large-unsupervised, large... 1.020
BAAI/bge-large-en-v1.5 1024 Large English model, v1.5 1.200
thenlper/gte-large 1024 Large general text embeddings model 1.200
intfloat/multilingual-e5-large 1024 Multilingual model, e5-large. Recommend using ... 2.240
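
The table above appears to mirror the model listing bundled with fastembed. To check the exact list (and dimensions) supported by your installed fastembed version, a quick one-off check along these lines should work (a sketch assuming fastembed's TextEmbedding.list_supported_models helper):

from fastembed import TextEmbedding

# Print every model the installed fastembed version supports, including its
# dimension; use that dimension for the DIM environment variable.
for model_info in TextEmbedding.list_supported_models():
    print(model_info)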

Running the Docker Containers

To run the application, use Docker Compose. Navigate to the directory containing your docker-compose.yml file and execute the following command:

docker-compose up -d

This command will start the services defined in the docker-compose.yml file in detached mode. The memories-api service will be available on port 8060 of your host machine.

OpenAPI Specification

The OpenAPI specification for the API endpoints is served at /openapi.json on port 8060 (for example, http://localhost:8060/openapi.json when running locally). Open this URL to view the details of the API endpoints, including their parameters and schemas.
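
As a quick smoke test once the containers are running, you can fetch the specification and list the available paths. A minimal sketch using only the Python standard library (the localhost URL assumes the default port mapping described above):

import json
from urllib.request import urlopen

# Fetch the OpenAPI spec from the running memories-api service and list its endpoints.
with urlopen("http://localhost:8060/openapi.json") as response:
    spec = json.load(response)

for path, methods in spec["paths"].items():
    print(path, "->", ", ".join(method.upper() for method in methods))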


🛠 Project Roadmap

🎗 License

This project is licensed under the MIT License.