
The Streamlit app uses cosine similarity to semantically match your query against Airbnb listings in the database, comparing results from an HNSW index and a DiskANN index.
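Cosine similarity ranks listings by the angle between their embedding and the query's embedding. In Postgres the app delegates this to pgvector's `<=>` cosine-distance operator; the sketch below just illustrates the math in plain Python with made-up 4-dimensional "embeddings" (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot product divided by the product of the vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy listing "embeddings" (values invented for illustration).
listings = {
    "cozy cabin": [0.9, 0.1, 0.0, 0.2],
    "downtown loft": [0.1, 0.9, 0.3, 0.0],
    "lakeside cottage": [0.5, 0.5, 0.1, 0.0],
}
query = [0.85, 0.15, 0.05, 0.25]  # embedding of the user's search query

# Rank listings by similarity to the query, most similar first.
ranked = sorted(listings,
                key=lambda name: cosine_similarity(query, listings[name]),
                reverse=True)
print(ranked)
```

The index (HNSW or DiskANN) exists to avoid this exhaustive scan: it approximates the same ranking without comparing the query against every row.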


DiskANN Vector Index in Azure Database for PostgreSQL

Deploy to Azure

We're thrilled to announce the preview of DiskANN, a leading vector indexing algorithm, on Azure Database for PostgreSQL - Flexible Server! Developed by Microsoft Research and used extensively at Microsoft in global services such as Bing and Microsoft 365, DiskANN enables developers to build highly accurate, performant, and scalable Generative AI applications, surpassing pgvector's HNSW and IVFFlat in both latency and accuracy. DiskANN also overcomes a long-standing limitation of pgvector in filtered vector search, where it occasionally returns incorrect results.

Demo video: DiskANN_demo_for_blog.mp4

Sample website. This sample application provides a search page over a sample Airbnb dataset:

  • It illustrates the improved recall of DiskANN compared with HNSW.
  • When a filter is applied, you will notice the HNSW index doesn't return the same number of results as DiskANN or no index at all.
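The second bullet is a consequence of post-filtering: a graph index first returns its top-k nearest candidates, and rows failing the filter are discarded afterwards, leaving fewer than k results. A toy simulation of the two strategies (invented data, not the actual index internals):

```python
# Simulate post-filtering (HNSW-style) vs. filter-aware search (DiskANN-style).
# Pretend dataset: 100 listings already ordered nearest-first; only
# even-numbered listings satisfy the filter (say, "has_pool = true").
ids_by_distance = list(range(100))
passes_filter = lambda i: i % 2 == 0
k = 10

# Post-filtering: take the top-k nearest, THEN apply the filter.
post_filtered = [i for i in ids_by_distance[:k] if passes_filter(i)]

# Filter-aware: keep scanning candidates until k matches are found.
filter_aware = [i for i in ids_by_distance if passes_filter(i)][:k]

print(len(post_filtered), len(filter_aware))  # 5 10
```

With a 50% selective filter, post-filtering returns only half the requested results, which is exactly the gap the demo page makes visible.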


Documentation

Read more on how to use DiskANN in the Microsoft Docs: Docs

Check out the blog: Blog

Getting started

The first step is to set up the data on the Postgres database you just created.

Make sure the following tools are installed:

Enroll in the pg_diskann Preview Feature

Follow Microsoft documentation for enrolling in DiskANN preview

Enable pg_diskann extension

Follow Microsoft documentation for enabling DiskANN

Setup Seattle AirBnb Data and test DiskANN

This demo app shows how a DiskANN index performs better than HNSW.

0. Update your .env file

Update your .env with the following:

AZURE_OPENAI_API_KEY="***sample***key****"
AZURE_OPENAI_ENDPOINT="https://sample.openai.azure.com/"
AZURE_PG_CONNECTION="dbname={DB_NAME} host={HOST} port=5432 sslmode=require user={USER_NAME} password={PASSWORD}"
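How the app consumes these values isn't shown here; a common pattern is to load the `.env` file into the process environment (for example with `python-dotenv`). The minimal parser below is purely illustrative of what that loading does:

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.
    Illustrative only -- a library like python-dotenv handles more cases."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env

sample = '''
# sample .env contents (placeholder values)
AZURE_OPENAI_API_KEY="not-a-real-key"
AZURE_OPENAI_ENDPOINT="https://sample.openai.azure.com/"
'''
print(parse_env(sample)["AZURE_OPENAI_ENDPOINT"])
```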

1. Set up Data

Run the commands in setup/sql-scripts/1_PGAI-Demo_setup.sql using psql or your favorite Postgres editor.

If running from psql:

\i setup/sql-scripts/1_PGAI-Demo_setup.sql

2. Set up OpenAI endpoint, embed data and create indexes

Run the commands in setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql using psql or your favorite Postgres editor.

If running from psql:

\i setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql

3. Test out sample vector queries

Run the commands in setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql using psql or your favorite Postgres editor.

If running from psql:

\i setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql
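If you'd rather run the three scripts from Python than from psql, a loop like the following could work (the use of `psycopg` and the naive semicolon splitter are assumptions, not part of this repo; the splitter only suits scripts without semicolons inside string literals or function bodies):

```python
from pathlib import Path

SCRIPTS = [
    "setup/sql-scripts/1_PGAI-Demo_setup.sql",
    "setup/sql-scripts/2_PGAI-Demo_endpoint_and_embedding_config.sql",
    "setup/sql-scripts/3_PGAI-Demo_pgai_queries.sql",
]

def split_statements(sql_text):
    """Naively split a script on ';'. Fine for simple scripts; psql's
    \\i handles edge cases this does not."""
    return [s.strip() for s in sql_text.split(";") if s.strip()]

def run_scripts(conninfo):
    import psycopg  # assumption: psycopg 3 installed
    with psycopg.connect(conninfo) as conn, conn.cursor() as cur:
        for path in SCRIPTS:
            for stmt in split_statements(Path(path).read_text()):
                cur.execute(stmt)
        conn.commit()
```

Pass the same connection string you placed in AZURE_PG_CONNECTION to `run_scripts`.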

Build Sample Application Locally

Setting up the environment file

Since the local app uses Azure OpenAI models, deploy those Azure resources first for the best experience.

  1. Copy .env.sample into a .env file.
  2. To use Azure OpenAI, fill in the values of AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY based on the deployed values.
  3. Fill in the connection string value for AZURE_PG_CONNECTION. You can find this in the Azure portal.

Install dependencies

Install the required Python packages, including Streamlit:

python3 -m venv .diskann
source .diskann/bin/activate
cd src
pip install -r requirements.txt

Running the application

From the root directory:

cd src/app
streamlit run app.py

When running locally, open the app at http://localhost:8501/.

Explore Indexes with Python Notebook

Explore Python Notebook