LLM-RAG-SAS

Framework for implementing Retrieval Augmented Generation (RAG) from the SAS platform.


LLM - Azure OpenAI-based Retrieval Augmented Generation (RAG)

This custom step uses a Retrieval Augmented Generation (RAG) approach to provide the right context to an Azure OpenAI Large Language Model (LLM) for the purpose of answering a question.

LLMs require context to provide relevant answers, especially for questions based on a local body of knowledge or document corpus.

A RAG approach, explained in simple terms, retrieves relevant data from a knowledge base and provides it to an LLM to use as context. Results based on RAG are expected to reduce LLM hallucinations (i.e., instances where an LLM provides irrelevant or false answers). This custom step queries a Chroma DB vector store and passes retrieved documents to an Azure OpenAI service.
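
As a rough illustration, the retrieval and generation flow inside this step resembles the following minimal sketch, which uses PROC PYTHON and the langchain framework. All deployment names, endpoints, keys, and paths below are placeholders, not the step's actual values.

proc python;
submit;
# Minimal RAG sketch: retrieve similar documents from a Chroma DB collection
# and pass them as context to an Azure OpenAI chat model.
# All names, endpoints, and keys below are placeholders.
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-small",   # placeholder deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2023-05-15",
)
vectorstore = Chroma(
    collection_name="my_collection",             # placeholder collection name
    embedding_function=embeddings,
    persist_directory="/tmp",
)
llm = AzureChatOpenAI(
    azure_deployment="gpt-35-turbo",             # placeholder deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2023-05-15",
)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever())
print(qa.invoke({"query": "What does this document corpus cover?"}))
endsubmit;
run;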

IMPORTANT: Be aware that this custom step uses an Azure OpenAI service that results in data being sent over to the service. Ensure you use this only in accordance with your organization's policies on calling external LLMs.

A general idea

This animated GIF provides a basic idea:

[Animated GIF: LLM - Azure OpenAI RAG]


Table of Contents

  1. Assumptions
  2. Requirements
  3. Parameters
    1. Input Parameters
    2. Configuration
    3. Output Specifications
  4. Run-time Control
  5. Documentation
  6. SAS Program
  7. Installation and Usage
  8. Created/Contact
  9. Change Log

Assumptions

Current assumptions for this initial version (future versions may improve upon them):

  1. Users choose either an existing Chroma DB vector database collection or load PDF files to an existing or new Chroma DB collection.

  2. Users may load all PDFs in a directory on the SAS Server (filesystem), or select a single PDF of their choice (see the sketch after this list).

  3. The code assumes use of a Chroma DB vector store. Users may choose to replace this with other vector stores supported by the langchain framework by modifying the underlying code.

  4. The step uses the langchain LLM framework.

  5. PDFs (containing text) are currently the only loadable file format in this step. Users are, however, free to ingest other document types into a Chroma DB collection beforehand using the "Vector Databases - Hydrate Chroma DB collection" SAS Studio Custom Step (refer to the Documentation section).

  6. Users have already configured Azure OpenAI to deploy both an embedding model and an LLM service, or know the relevant deployment names.
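
For reference, loading a PDF into a Chroma DB collection (assumptions 1, 2, and 5) might look roughly like the sketch below. The file path, chunking parameters, and names are illustrative only; the step's own loading logic may differ.

proc python;
submit;
# Sketch: load a text PDF, split it into chunks, and embed the chunks into
# a Chroma DB collection. Paths and names are hypothetical placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import AzureOpenAIEmbeddings

docs = PyPDFLoader("/path/to/document.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-small",   # placeholder deployment name
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2023-05-15",
)
Chroma.from_documents(
    chunks,
    embedding=embeddings,
    collection_name="my_collection",             # new or existing collection
    persist_directory="/tmp",
)
endsubmit;
run;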


Requirements

  1. A SAS Viya 4 environment version 2024.01 or later.

  2. Python: Python version 3.10 is recommended to avoid package support or dependency issues.

  3. Python packages to be installed:

    1. langchain
    2. langchain-community
    3. langchain-openai
    4. PyPDF
    5. sentence-transformers
    6. chromadb
    7. pysqlite3-binary (see the sketch after this list)
  4. A valid Azure OpenAI service with embedding and large language models deployed. Refer here for instructions
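
A quick way to verify the environment, and to apply the commonly used pysqlite3 workaround for chromadb's sqlite version requirement (see the note regarding sqlite under Documentation), is a sketch along these lines:

proc python;
submit;
# Commonly used workaround for chromadb's sqlite3 version requirement,
# followed by a quick check that the required packages import cleanly.
__import__("pysqlite3")
import sys
sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")

import langchain, langchain_community, langchain_openai
import pypdf, sentence_transformers, chromadb
print("Required packages imported successfully")
endsubmit;
run;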


Parameters


Input Parameters

  1. Source file location (optional, default is Context already loaded): if you wish to provide new source files to use as context, choose whether to select a folder or a single file. Otherwise, provide the name of an existing vector store collection under Configuration.

  2. Question (text area, required): provide your question to the LLM. Note that this will be combined with an additional system prompt to create the final prompt passed to the LLM (see the sketch below).
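
As a hedged illustration, the question and system prompt might be combined along these lines; the template text here is illustrative, not the step's actual prompt.

proc python;
submit;
# Sketch: combine a system prompt with the user's question. The template
# wording below is an assumption, not the step's actual prompt.
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question using only the following context:\n{context}"),
    ("human", "{question}"),
])
print(prompt.format(context="<retrieved documents>", question="What is covered in these documents?"))
endsubmit;
run;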


Configuration

  1. Embedding model (text field, required): provide the name of your Azure OpenAI deployment of an OpenAI embedding model. For convenience, it is suggested that you name the deployment after the model it serves. For example, if your OpenAI embedding model happens to be text-embedding-3-small, use the same name for your deployment.

  2. Vector store persistent path (text field, defaults to /tmp if blank): provide a path to a ChromaDB database. If blank, this defaults to /tmp on the filesystem.

  3. Chroma DB collection name (text field): provide the name of the Chroma DB collection you wish to use. If the collection does not exist, a new one will be created. Ensure you have write access to the persistent area.

  4. Text generation model (text field, required): provide the name of an Azure OpenAI text generation deployment. For convenience, you may choose to use the same name as the OpenAI LLM; for example, a deployment of gpt-35-turbo named gpt-35-turbo.

  5. Azure OpenAI service details (file selector for key and text fields, required): provide a path to your Azure OpenAI access key. Ensure this key is saved within a text file in a secure location on the filesystem. Users are responsible for providing their own keys to use this service. Also refer to your Azure OpenAI service to obtain the service endpoint and region (a sketch follows this list).
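
As a sketch of how the key file might be consumed, the step could read it at run time rather than hard-coding the key. The file path below is a hypothetical placeholder; the environment variables shown are the standard ones read by langchain-openai.

proc python;
submit;
# Sketch: read the Azure OpenAI access key from a text file instead of
# hard-coding it. The file path here is a hypothetical placeholder.
import os

with open("/secure/location/azure_openai_key.txt") as f:
    os.environ["AZURE_OPENAI_API_KEY"] = f.read().strip()
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
endsubmit;
run;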


Output Specifications

Results (the answer from the LLM) are printed by default to the output window.

  1. Temperature (numeric stepper, default 0, max 1): temperature for an LLM affects its ability to predict the next word while generating responses. A rule of thumb is that a temperature closer to 0 makes the model pick the predicted next word with the highest probability, providing stable responses, whereas a temperature of 1 increases the randomness with which the model picks the next word, which may lead to more creative responses.

  2. Context size (numeric stepper, default 10): select how many similar results from the vector store should be retrieved and provided as context to the LLM. Note that a higher number results in more tokens provided as part of the prompt.

  3. Output table (output port, optional): attach either a CAS table or sas7bdat to the output port of this node to hold results. These results contain the LLM's answer, the original question, and the supporting retrieved results (see the sketch below).
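
To illustrate where the Temperature and Context size fields enter the chain, and how results might flow back to a SAS table, consider this minimal sketch (placeholder names; the step's internals may differ):

proc python;
submit;
# Sketch: Temperature is an LLM parameter; Context size maps to the number
# of similar chunks (k) the retriever returns. Names are placeholders.
import pandas as pd
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    azure_deployment="gpt-35-turbo",
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-key>",
    api_version="2023-05-15",
    temperature=0,                  # Temperature field: 0 stable, 1 creative
)
# Context size would be applied when building the retriever, e.g.:
#   retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Hand results back to SAS as a table for the output port:
df = pd.DataFrame([{"question": "<question>", "answer": "<LLM answer>",
                    "context": "<retrieved documents>"}])
SAS.df2sd(df, "work.rag_output")    # PROC PYTHON callback: DataFrame -> SAS table
endsubmit;
run;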


Run-time Control

Note: Run-time control is optional. You may choose whether or not to execute the main code of this step, based on upstream conditions set by earlier SAS programs: nodes run prior to this custom step in a SAS Studio Flow, or a previous program in the same session.

Refer to this blog (https://communities.sas.com/t5/SAS-Communities-Library/Switch-on-switch-off-run-time-control-of-SAS-Studio-Custom-Steps/ta-p/885526) for more details on the concept.

The following macro variable,

_aor_run_trigger

will initialize with a value of 1 by default, indicating an "enabled" status and allowing the custom step to run.

If you wish to control execution of this custom step, include code in an upstream SAS program to set this variable to 0. This "disables" execution of the custom step.

To "disable" this step, run the following code upstream:

%global _aor_run_trigger;
%let _aor_run_trigger =0;

To "enable" this step again, run the following (it's assumed that this has already been set as a global variable):

%let _aor_run_trigger =1;

IMPORTANT: Be aware that disabling this step means that none of its main execution code will run, and any downstream code which was dependent on this code may fail. Change this setting only if it aligns with the objective of your SAS Studio program.
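For reference, inside a custom step such a trigger is typically consumed by a macro guard along these lines (a minimal sketch, not the step's actual code):

%macro _aor_execution_guard;
   %if &_aor_run_trigger. = 1 %then %do;
      %put NOTE: Run trigger enabled. Main execution code runs here.;
      /* main custom step logic */
   %end;
   %else %do;
      %put NOTE: Run trigger disabled. Skipping execution of this custom step.;
   %end;
%mend _aor_execution_guard;

%_aor_execution_guard;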


Documentation

  1. Azure OpenAI service

  2. Documentation for the chromadb Python package

  3. Documentation for the "Vector Databases - Hydrate Chroma DB collection" SAS Studio Custom Step

  4. An important note regarding sqlite

  5. SAS Communities article on configuring Viya for Python integration

  6. The SAS Viya Platform Deployment Guide (refer to SAS Configurator for Open Source within)

  7. Options for persistent clients and client connections in Chroma

  8. Langchain Python documentation


SAS Program

Refer here for the SAS program used by the step. You will find this useful in situations where you wish to execute this step, with minor modifications, through interfaces other than SAS Studio Custom Steps, such as the SAS Extension for Visual Studio Code.


Installation and Usage


Created/Contact:

  1. Samiul Haque (samiul.haque@sas.com)
  2. Sundaresh Sankaran (sundaresh.sankaran@sas.com)

Change Log

  • Version 1.0 (17MAR2024)
    • Initial version