azuresearch


AI-assisted search

Code based on this repo. Another example here.

Demonstrates use of generative technology in both steps of a typical RAG pattern: data retrieval and prompt response.

For data retrieval, the LLM is instructed to generate a search query that is optimized to return only the data needed by the user prompt, using all available features of the index to do so, e.g. filters, sort options and response count (top). The returned data is then provided to a separate LLM completion request to answer the user's prompt.
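
A minimal sketch of this two-step flow follows. It is illustrative only: the index name ("products"), the environment variable names and the prompt wording are assumptions for the example, not taken from this repo.

import json
import os

import requests
from openai import AzureOpenAI  # pip install openai>=1.0

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)

SEARCH_URL = (
    os.environ["AZURE_SEARCH_ENDPOINT"]
    + "/indexes/products/docs/search?api-version=2023-11-01"
)

def generate_search_request(user_prompt: str) -> dict:
    # Step 1: have the model translate the user prompt into a search request body
    # that uses filters, sorting and 'top' to return only the data that is needed.
    response = client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_GPT_DEPLOYMENT"],
        messages=[
            {"role": "system", "content": (
                "Translate the user's question into an Azure AI Search request body: "
                "JSON with 'search', 'filter', 'orderby' and 'top'. Return only JSON.")},
            {"role": "user", "content": user_prompt},
        ],
    )
    return json.loads(response.choices[0].message.content)

def answer(user_prompt: str) -> str:
    # Step 2: run the generated query, then answer the prompt from the results only.
    body = generate_search_request(user_prompt)
    results = requests.post(
        SEARCH_URL,
        headers={"api-key": os.environ["AZURE_SEARCH_KEY"],
                 "Content-Type": "application/json"},
        json=body,
    ).json()["value"]
    response = client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_GPT_DEPLOYMENT"],
        messages=[
            {"role": "system", "content": "Answer using only the provided search results."},
            {"role": "user", "content": f"Results:\n{json.dumps(results)}\n\nQuestion: {user_prompt}"},
        ],
    )
    return response.choices[0].message.content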

The objective of using AI to generate the search query is to ensure that all data relevant to the user prompt is retrieved for presentation to the completion request. For example, if the user prompt asks for the highest or lowest values of some component of the data, the generated query should include a sort statement; otherwise, the subsequent prompt completion will not have the data it needs to respond to the user prompt.
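
For instance, for a prompt like "Which electronics have the highest rating?", the generated request body might look like the following (the index fields are hypothetical):

# Hypothetical generated request body for "Which electronics have the highest rating?"
search_request = {
    "search": "*",
    "filter": "category eq 'electronics'",  # OData filter to narrow the result set
    "orderby": "rating desc",               # the sort that makes "highest" answerable
    "top": 5,                               # return only as many documents as needed
}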

Optimizing the search criteria for each user prompt increases the quality of the eventual response (since the prompt completion will have the relevant data), reduces the cost of the completion (since only relevant data is passed to it) and improves completion latency thanks to the smaller data payload.

To optimize the search query, the index schema needs to be designed to support the required operations, e.g. to have the filterable and sortable fields that are likely to be needed to find data appropriate for user prompts (see the schema sketch after the list below). Further enhancements available in Azure AI Search include the use of:

  1. Vector search and semantic scoring.
  2. Scoring profiles to increase the significance of hits in specific fields.
  3. Synonyms to provide alternative ways of specifying text values.
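
A sketch of such a schema using the azure-search-documents Python SDK, with hypothetical index and field names and a scoring profile that boosts title matches (vector search and synonym maps are omitted for brevity):

import os

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    ScoringProfile,
    SearchableField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
    TextWeights,
)

# Hypothetical schema: filterable/sortable fields let the generated queries
# use 'filter' and 'orderby'; the scoring profile boosts hits on the title field.
index = SearchIndex(
    name="products",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SearchableField(name="description", type=SearchFieldDataType.String),
        SimpleField(name="category", type=SearchFieldDataType.String, filterable=True),
        SimpleField(name="rating", type=SearchFieldDataType.Double,
                    filterable=True, sortable=True),
    ],
    scoring_profiles=[
        ScoringProfile(name="boost-title", text_weights=TextWeights(weights={"title": 5})),
    ],
)

SearchIndexClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
).create_or_update_index(index)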

Setup

  1. Clone locally or open in a Codespace
  2. Copy env-sample to .env
  3. Update .env with your settings (blob storage, GPT and embedding model)
  4. Create your index using the create-index notebook
  5. Execute queries using the ai-search notebook

Note that .env is listed in .gitignore to protect your secrets.
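
A minimal sketch of loading these settings in a notebook, assuming python-dotenv is used (the variable names are illustrative; use the keys defined in env-sample):

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

# Illustrative variable names only; use the actual keys from env-sample
search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]
search_key = os.environ["AZURE_SEARCH_KEY"]
gpt_deployment = os.environ["AZURE_OPENAI_GPT_DEPLOYMENT"]
embedding_deployment = os.environ["AZURE_OPENAI_EMBEDDING_DEPLOYMENT"]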

Monitoring

Log monitor

The following Kusto query against the API Management gateway logs summarizes token usage per caller IP and model:

ApiManagementGatewayLogs
// Only chat completion calls routed through the gateway
| where OperationId == 'ChatCompletions_Create'
// Model name from the backend response; modelkey keeps the name up to the second '-'
| extend modelkey = substring(parse_json(BackendResponseBody)['model'], 0, indexof(parse_json(BackendResponseBody)['model'], '-', 0, -1, 2))
| extend model = tostring(parse_json(BackendResponseBody)['model'])
// Token usage reported by the backend
| extend prompttokens = parse_json(parse_json(BackendResponseBody)['usage'])['prompt_tokens']
| extend completiontokens = parse_json(parse_json(BackendResponseBody)['usage'])['completion_tokens']
| extend totaltokens = parse_json(parse_json(BackendResponseBody)['usage'])['total_tokens']
| extend ip = CallerIpAddress
// Aggregate token consumption per caller IP and model
| summarize
    sum(todecimal(prompttokens)),
    sum(todecimal(completiontokens)),
    sum(todecimal(totaltokens)),
    avg(todecimal(totaltokens))
    by ip, model

Note: the filter was previously | where OperationId == 'completions_create'.