LLM Application with Pathway Vector Index

Overview

This project demonstrates how to build a Large Language Model (LLM) application using Pathway's vector index in a Windows environment. The application uses the vector index to retrieve documents by semantic similarity and integrates with LLM toolkits for language processing.

Features

Efficient vector indexing for fast similarity searches and nearest neighbor queries.
Integration with Pathway's vector index to avoid the need for a separate vector database.
RESTful API for easy querying and data retrieval.
Compatibility with popular LLM toolkits like LangChain and LlamaIndex.
Scalable and high-performance solution for large datasets.
Flexible data ingestion from various formats, including text and images.
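
To make the similarity search feature concrete, here is a minimal sketch of what a nearest neighbor query over a vector index computes. It is plain Python with hand-written three-dimensional vectors. In the real application the vectors would come from an embedding model and Pathway would maintain the index. The function names (`cosine_similarity`, `nearest`) and the sample data are illustrative assumptions, not part of Pathway's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest(index, query_vector, k=2):
    """Return the k document names most similar to query_vector (brute force)."""
    scored = [(cosine_similarity(vec, query_vector), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy "index": hypothetical document names mapped to made-up embeddings.
TOY_INDEX = {
    "vector-index-guide": [0.9, 0.1, 0.0],
    "rest-api-guide": [0.1, 0.9, 0.0],
    "docker-setup": [0.0, 0.2, 0.9],
}

print(nearest(TOY_INDEX, [1.0, 0.2, 0.0], k=2))
# prints ['vector-index-guide', 'rest-api-guide']
```

A brute-force scan like this is linear in the number of documents. It is only meant to show the ranking idea, not how a production index is organized.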

Prerequisites

Docker installed on your Windows machine.
OpenAI API key for generating embeddings and LLM responses.
Python environment with necessary dependencies.
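
Before moving on, you may want to confirm the first two prerequisites from Python. The sketch below is an assumption about how such a check could look, not something shipped with this project. It uses only the standard library, and the helper name `missing_prerequisites` is hypothetical.

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("docker") is None:
        problems.append("docker executable not found on PATH")
    # An empty value is treated the same as a missing one.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```

Passing `env` and `which` as parameters keeps the function easy to exercise without touching the real environment.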

Setup and Installation

Install Docker:
- Download Docker Desktop for Windows from Docker's official website.

Create Dockerfile:

In your project directory, create a file named Dockerfile with the following content:

FROM pathwaycom/pathway:latest
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "./your-script.py"]

Replace your-script.py with the name of your Python script file (app.py in the code structure listed below).

Build Docker Image:
- Open a terminal or command prompt, navigate to your project directory, and run:
```
docker build -t my-pathway-app .
```

Run Docker Container:

To start the container, run:

docker run -it --rm --name my-pathway-app my-pathway-app

To pass environment variables from a .env file and publish the container's port 8080 on the host, use:

docker run -p 8080:8080 --env-file .env my-pathway-app
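
The file given to `--env-file` holds one `KEY=VALUE` pair per line, and lines starting with `#` are comments. As a rough sketch, the following Python checks such a file's content for missing keys before you start the container. The list of required keys is an assumption taken from the variable names used in the "Set Environment Variables" step, and the helper names are hypothetical.

```python
# Assumed to be required, based on the variables this README sets later.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "PATHWAY_DATA_DIR",
    "PATHWAY_REST_CONNECTOR_HOST",
    "PATHWAY_REST_CONNECTOR_PORT",
)


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without "=" are skipped in this sketch
            values[key.strip()] = value
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


# Example .env content with placeholder values; PATHWAY_DATA_DIR is left out on purpose.
SAMPLE = """\
# Example .env content
OPENAI_API_KEY=<your_openai_api_key>
PATHWAY_REST_CONNECTOR_HOST=0.0.0.0
PATHWAY_REST_CONNECTOR_PORT=8080
"""

print(missing_keys(parse_env_lines(SAMPLE.splitlines())))
# prints ['PATHWAY_DATA_DIR']
```

To check a real file, pass `open(".env", encoding="utf-8")` to `parse_env_lines` instead of the sample lines.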

Running the Application

Set Environment Variables:

Configure the necessary environment variables, either in the .env file passed to docker run or at the top of your Python script:

import os  # needed for os.environ

os.environ['OPENAI_API_KEY'] = '<your_openai_api_key>'
os.environ['PATHWAY_DATA_DIR'] = '/content/data/pathway-docs/'
os.environ['PATHWAY_REST_CONNECTOR_HOST'] = '0.0.0.0'
os.environ['PATHWAY_REST_CONNECTOR_PORT'] = '8080'
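
Inside the script, these variables have to be read back and the port converted to an integer. One way to do that is sketched below. The `Settings` class and `load_settings` function are hypothetical names, and the fallback values simply mirror the ones shown in this step; adjust them to your own layout.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    data_dir: str
    host: str
    port: int


def load_settings(env=None):
    """Build Settings from environment variables, failing early if the key is absent."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY must be set")
    return Settings(
        api_key=api_key,
        # Fallbacks below repeat the example values from this README.
        data_dir=env.get("PATHWAY_DATA_DIR", "/content/data/pathway-docs/"),
        host=env.get("PATHWAY_REST_CONNECTOR_HOST", "0.0.0.0"),
        port=int(env.get("PATHWAY_REST_CONNECTOR_PORT", "8080")),
    )
```

Calling `load_settings()` once at startup keeps the rest of the script free of direct `os.environ` lookups.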

Run the Script:
- The script (your-script.py in the Dockerfile above, app.py in the code structure below) contains the logic for setting up the vector index and handling queries. Ensure this script is in your project directory and includes the necessary imports and function definitions.

Test the Application:

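Use Postman or any HTTP client to test the RESTful API. The server binds to 0.0.0.0 inside the container, but requests from the host should go to localhost on the published port. Example curl command (piping to jq is optional and only pretty-prints the JSON response; run it from a shell that understands single-quoted strings, such as Git Bash or WSL):

curl -X POST -H "Content-Type: application/json" -d '{"user": "user", "query": "How to build vector index in Pathway?"}' http://localhost:8080/ | jq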
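
If you would rather test from Python, the same request can be sent with the standard library. This is a minimal sketch that assumes the container is published on port 8080 of the local machine, as in the port-mapping command earlier. The helper names `build_request` and `ask` are hypothetical, and the shape of the JSON response depends on your script, so it is returned as-is.

```python
import json
import urllib.request


def build_request(query, user="user", url="http://localhost:8080/"):
    """Build the same POST request as the curl example."""
    payload = json.dumps({"user": user, "query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(query, **kwargs):
    """Send the query and return the decoded JSON response."""
    request = build_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `ask("How to build vector index in Pathway?")` sends the request and returns whatever JSON the application responds with.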

Code Structure

app.py: Main script to run the application.
requirements.txt: List of dependencies.
Dockerfile: Docker configuration file for building the image.

Detailed Guide

For a detailed, step-by-step guide to building this application, check out my blog post: Explaining Vector Index and Building an LLM Application with Pathway on Windows.

Contact

For any issues or questions, feel free to reach out to me via LinkedIn or email.

Happy coding! 🚀

vinayakgavariya/llm-app-using-pathway-vector-index