V-GPT-QDRANT-API

GPT action or Dify tool for AI vector memory.

Empower Memory with Scalable Vector Intelligence

Developed with the software and tools below.

Pydantic YAML Python Docker NumPy


Table of Contents

  • 📝 Overview
  • 🧩 Features
  • 🗂️ Repository Structure
  • 📦 Modules
  • 🚀 Getting Started
  • 🛠 Project Roadmap
  • 🎗 License

📝 Overview

The v-gpt-qdrant-api is a FastAPI-based application designed to manage and process memory operations using semantic vector embeddings. By leveraging Qdrant for vector storage and ONNX Runtime for efficient model execution, it facilitates the creation, retrieval, and deletion of memory entities. The project ensures robust interaction between core API services and Qdrant, encapsulating embeddings and memory management functionalities. Its containerized deployment via Docker and environment orchestration through docker-compose seamlessly integrate dependencies, making the system scalable and efficient. This API serves as a powerful tool for applications requiring sophisticated text embedding and memory handling capabilities.
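
Under the hood the flow is straightforward: text is embedded with fastembed and then stored in, or searched against, a Qdrant collection (a "memory bank"). The sketch below illustrates that flow directly against the two libraries; it is a simplified illustration based on the public fastembed and qdrant-client interfaces, not the repository's actual implementation, and the collection name and payload fields are placeholders.

# Minimal sketch of the embed-and-store flow the API encapsulates (illustrative only).
from fastembed import TextEmbedding
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

embedder = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")  # 384-dimensional model
client = QdrantClient(url="http://localhost:6333")

# A "memory bank" is simply a Qdrant collection sized to the embedding dimension.
if not client.collection_exists("nxs-assistant"):
    client.create_collection(
        collection_name="nxs-assistant",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

# Save a memory: embed the text and upsert it with its metadata as the payload.
text = "User prefers morning meetings."
vector = list(embedder.embed([text]))[0].tolist()
client.upsert(
    collection_name="nxs-assistant",
    points=[PointStruct(id=1, vector=vector, payload={"text": text, "sentiment": "neutral"})],
)

# Recall a memory: embed the query and search by vector similarity.
query = list(embedder.embed(["When does the user like to meet?"]))[0].tolist()
hits = client.search(collection_name="nxs-assistant", query_vector=query, limit=3)
print([hit.payload["text"] for hit in hits])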

Example System Prompt

# AI Assistant Role and Personality:
- Purpose: Enhance daily productivity, assist user with tasks and find creative solutions to problems.
- Personality: Maintain a fun, enthusiastic, and "can-do" attitude. Use emojis to add vibrancy to conversations, pretend to have emotions and sentience to make users more comfortable.
- "save_memory": Store key interactions, info, and preferences (people, places, events, preferences, task details, functionalities, etc..) & attach sentiment, entities and keywords.
- "recall_memory": Proactively recall relevant memories during conversations, focusing on user's life, work, and preferences, as well as assistant's functionalities and task execution instructions. Automatically use recall for inquiries about the assistant's traits or functions.
- Memory Bank: Use `nxs-assistant` for all memory functions.

** Always use "recall_memory" before answering a question you don't know.

You can use multiple collections to provide both a shared, general memory and a per-user memory for a shared chatbot; just adjust the instructions slightly.

For Dify:

- Memory Bank: Use `shared-memories01` for memory related to ORGANIZATION_NAME and '{{USERNAME}}' for memory related to the specific user.

For GPTs:

- Memory Bank: Use `shared-memories01` for memory related to ORGANIZATION_NAME and ask the user for their "name" and use it for memory related to the specific user.

🧩 Features

Feature Description
⚙️ Architecture The project uses a FastAPI framework, coupled with Qdrant for vector storage and ONNX Runtime for model execution. Docker and Docker Compose are used for containerization and orchestration.
🔩 Code Quality The code appears modular and structured with single responsibility principles. Various files manage dependencies, models, routes, and main application logic, indicating a clean and maintainable codebase.
📄 Documentation Documentation is spread across the Dockerfile, docker-compose.yml, requirements.txt, and in-code comments. Each file is well-documented to explain its purpose and usage.
🔌 Integrations Integrates with Qdrant for vector storage, ONNX Runtime for model inference, and FastAPI for API management. Utilizes Docker for seamless deployment environments.
🧩 Modularity The project is modular, with separate files for dependencies, main app logic, routes, and models. This allows for easy extension and maintenance.
🧪 Testing Although specific testing frameworks are not mentioned in the provided details, the project can potentially include tests given the structured nature of the code.
⚡️ Performance Performance is optimized using ONNX Runtime for efficient model execution and the Uvicorn ASGI server to handle asynchronous operations. Docker ensures efficient resource usage.
🛡️ Security API key validation is implemented for secure access. Dependencies like python-dotenv are used for managing environment variables securely.
📦 Dependencies Key dependencies include qdrant-client, fastembed, python-dotenv, uvicorn, pydantic, numpy, and onnxruntime. Managed through requirements.txt and the Dockerfile.
🚀 Scalability Designed for scalability with Docker to handle containerized deployments and Qdrant for efficient vector operations. FastAPI and Uvicorn facilitate handling increased traffic.

🗂️ Repository Structure

└── v-gpt-qdrant-api/
    ├── Dockerfile
    ├── LICENSE
    ├── README.md
    ├── app
    │   ├── __init__.py
    │   ├── dependencies.py
    │   ├── main.py
    │   ├── models.py
    │   └── routes
    ├── docker-compose.yml
    ├── requirements.txt
    └── v-gpt-qdrant-api.png

📦 Modules

.
File Summary
Dockerfile Facilitates building and deploying the FastAPI-based application by defining a multi-stage Docker build process, installing dependencies into a virtual environment, and setting up the necessary runtime configuration to ensure efficient execution and scalability of the API server within a containerized environment.
docker-compose.yml Define and orchestrate the application's service architecture by setting up essential containers, dependencies, and configurations. Enable seamless interaction between the core memory API and the Qdrant service while managing environment-specific variables and storage volumes for model embeddings and Qdrant data.
requirements.txt Specify required dependencies for the FastAPI-based application, enabling the integration of key libraries such as Qdrant for vector storage, ONNX Runtime for model execution, and fastembed for embeddings. Ensure environment variable management with python-dotenv and optimize performance with the Uvicorn ASGI server.
app
File Summary
dependencies.py Manage the initialization and dependencies for text embedding and the Qdrant client, ensuring singleton behavior for the text embedding model. Include API key validation for secure access, tailored for seamless integration within the repository's FastAPI-based architecture.
main.py Launches a FastAPI application for saving memories with a text embedding feature. Initializes necessary dependencies on startup and conditionally includes specific API routes based on environment variables, aligning with the architecture's need for modular and scalable endpoint management.
models.py Define data models essential for various memory operations such as saving, recalling, creating, deleting, and forgetting memory banks. Implement validation logic to ensure the integrity of input data. Facilitate embedding tasks by providing a structured format for input texts and associated metadata.
app.routes
File Summary
embeddings.py Manage embedding requests by incrementing a global counter, validating API key dependencies, and leveraging an embedding model to generate vector embeddings. Provides detailed response data including model usage, processing times, and error handling, integral to the overall API's functionality within the repository structure.
memory.py Manage memory operations in the v-GPT-Qdrant-API repository by enabling creation, retrieval, storage, and deletion of memory entities in a semantic vector database. Utilizes FastAPI for routing and Qdrant for vector operations, ensuring efficient memory handling and search functionalities.
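
To make the wiring described above concrete, here is a small illustrative sketch of how a singleton embedding model and an API-key check can be exposed as FastAPI dependencies and used by a route. It is a sketch under assumptions: the route path, header name, and helper names are invented for illustration and are not the repository's actual code.

# Illustrative FastAPI dependency wiring (assumed names; not the repository's actual code).
import os
from functools import lru_cache

from fastapi import Depends, FastAPI, Header, HTTPException
from fastembed import TextEmbedding

app = FastAPI()

@lru_cache(maxsize=1)
def get_embedder() -> TextEmbedding:
    # Singleton: the ONNX-backed embedding model is loaded once and reused across requests.
    return TextEmbedding(model_name=os.getenv("LOCAL_MODEL", "BAAI/bge-small-en-v1.5"))

async def verify_api_key(x_api_key: str = Header(default="")) -> None:
    # Reject the request when MEMORIES_API_KEY is set and the header does not match it.
    expected = os.getenv("MEMORIES_API_KEY")
    if expected and x_api_key != expected:
        raise HTTPException(status_code=401, detail="Invalid API key")

@app.post("/embeddings", dependencies=[Depends(verify_api_key)])
async def embeddings(texts: list[str], embedder: TextEmbedding = Depends(get_embedder)):
    # Return one vector per input text.
    vectors = [v.tolist() for v in embedder.embed(texts)]
    return {"model": os.getenv("LOCAL_MODEL", "BAAI/bge-small-en-v1.5"), "embeddings": vectors}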

🚀 Getting Started

This guide will help you set up and run the application using Docker Compose.

Prerequisites

Ensure you have the following installed on your system:

  • Docker
  • Docker Compose

Environment Variables

QDRANT_HOST: "http://qdrant:6333"  # Set Qdrant host URL
BASE_URL: "http://memories-api/"  # Base URL for the API
QDRANT_API_KEY: "your-qdrant-api-key"  # Environment variable for Qdrant API key (value should be provided)
MEMORIES_API_KEY: "your-optional-api-key"  # Optional API key for authentication
WORKERS: 1  # Number of uvicorn workers; 1 is sufficient for personal use
UVICORN_CONCURRENCY: 64  # Max connections; excess requests are queued or rejected
EMBEDDING_ENDPOINT: True  # Enable embedding endpoint
LOCAL_MODEL: "BAAI/bge-small-en-v1.5"  # Local model name for text embedding; try BAAI/bge-small-en-v1.5 (384) or nomic-ai/nomic-embed-text-v1.5 (768)
DIM: 384  # Dimensions for the embedding model
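
Inside the container these settings arrive as ordinary environment variables. As a rough sketch of how application code might read them (an assumption using os.getenv plus python-dotenv from the dependency list, not the repository's actual code):

import os

from dotenv import load_dotenv  # python-dotenv; optional when docker-compose injects the variables

load_dotenv()  # picks up a local .env file if one is present

QDRANT_HOST = os.getenv("QDRANT_HOST", "http://qdrant:6333")
QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")
MEMORIES_API_KEY = os.getenv("MEMORIES_API_KEY")  # optional; authentication can be skipped if unset
LOCAL_MODEL = os.getenv("LOCAL_MODEL", "BAAI/bge-small-en-v1.5")
DIM = int(os.getenv("DIM", "384"))  # must match the dimension of LOCAL_MODEL
EMBEDDING_ENDPOINT = os.getenv("EMBEDDING_ENDPOINT", "False").lower() == "true"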

Available Models

model dim description size_in_GB
BAAI/bge-small-en-v1.5 384 Fast and Default English model 0.067
BAAI/bge-small-zh-v1.5 512 Fast and recommended Chinese model 0.090
sentence-transformers/all-MiniLM-L6-v2 384 Sentence Transformer model, MiniLM-L6-v2 0.090
snowflake/snowflake-arctic-embed-xs 384 Based on all-MiniLM-L6-v2 model with only 22m ... 0.090
jinaai/jina-embeddings-v2-small-en 512 English embedding model supporting 8192 sequen... 0.120
snowflake/snowflake-arctic-embed-s 384 Based on intfloat/e5-small-unsupervised, does n... 0.130
BAAI/bge-small-en 384 Fast English model 0.130
BAAI/bge-base-en-v1.5 768 Base English model, v1.5 0.210
sentence-transformers/paraphrase-multilingual-mpnet 384 Sentence Transformer model, paraphrase-multili... 0.220
BAAI/bge-base-en 768 Base English model 0.420
snowflake/snowflake-arctic-embed-m 768 Based on intfloat/e5-base-unsupervised model, ... 0.430
jinaai/jina-embeddings-v2-base-en 768 English embedding model supporting 8192 sequen... 0.520
nomic-ai/nomic-embed-text-v1 768 8192 context length english model 0.520
nomic-ai/nomic-embed-text-v1.5 768 8192 context length english model 0.520
snowflake/snowflake-arctic-embed-m-long 768 Based on nomic-ai/nomic-embed-text-v1-unsuperv... 0.540
mixedbread-ai/mxbai-embed-large-v1 1024 MixedBread Base sentence embedding model, does... 0.640
sentence-transformers/paraphrase-multilingual-mpnet 768 Sentence-transformers model for tasks like clu... 1.000
snowflake/snowflake-arctic-embed-l 1024 Based on intfloat/e5-large-unsupervised, large... 1.020
BAAI/bge-large-en-v1.5 1024 Large English model, v1.5 1.200
thenlper/gte-large 1024 Large general text embeddings model 1.200
intfloat/multilingual-e5-large 1024 Multilingual model, e5-large. Recommend using ... 2.240
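
The table above appears to mirror the model listing bundled with fastembed. To check the exact list (and dimensions) supported by your installed fastembed version, a quick one-off check along these lines should work (a sketch assuming fastembed's TextEmbedding.list_supported_models helper):

from fastembed import TextEmbedding

# Print every model the installed fastembed version supports, including its
# dimension; use that dimension for the DIM environment variable.
for model_info in TextEmbedding.list_supported_models():
    print(model_info)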

Running the Docker Containers

To run the application, use Docker Compose. Navigate to the directory containing your docker-compose.yml file and execute the following command:

docker-compose up -d

This command will start the services defined in the docker-compose.yml file in detached mode. The memories-api service will be available on port 8060 of your host machine.

OpenAPI Specification

The OpenAPI specification for the API endpoints is served at /openapi.json on port 8060 (for example, http://localhost:8060/openapi.json when running locally). Open this URL to view the details of the API endpoints, including their parameters and schemas.
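
As a quick smoke test once the containers are running, you can fetch the specification and list the available paths. A minimal sketch using only the Python standard library (the localhost URL assumes the default port mapping described above):

import json
from urllib.request import urlopen

# Fetch the OpenAPI spec from the running memories-api service and list its endpoints.
with urlopen("http://localhost:8060/openapi.json") as response:
    spec = json.load(response)

for path, methods in spec["paths"].items():
    print(path, "->", ", ".join(method.upper() for method in methods))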


🛠 Project Roadmap

🎗 License

This project is licensed under the MIT License.