Enterprise Azure OpenAI

Repository detailing the deployment of an Enterprise Azure OpenAI reference architecture.
Link: Azure Architecture Center - Monitor OpenAI Models

An advanced pattern is available for customers using models with larger token sizes, want to perform advanced analytics on the prompts and responses, or have requirements to send these events to another type of data store.

Key Solution Advantages:

Comprehensive logging of Azure OpenAI model execution tracked to Source IP address. Log information includes what text users are submitting to the model as well as text being received back from the model. This ensures models are being used responsibly within the corporate environment and within the approved use cases of the service.
Advanced Usage and Throttling controls allow fine-grained access controls for different user groups without allowing access to underlying service keys.
High availability of the model APIs to ensure user requests are met even if the traffic exceeds the limits of a single Azure OpenAI Service.
Secure use of the service by ensuring role-based access managed via Azure Active Directory follows principle of least privilege.

EnterpriseLogging_0.mp4

Reference Architecture

1. Client applications can access Azure OpenAI endpoints to perform text generation (completions) and model training (fine-tuning) endpoints to leverage the power of large language models.

2. Next-Gen Firewall Appliance (Optional) - Provides deep packet level inspection for network traffic to the OpenAI Models.

3. API Management Gateway enables security controls, auditing, and monitoring of the Azure OpenAI models. Security access is granted via AAD Groups with subscription based access permissions in APIM. Auditing is enabled via Azure Monitor request logging for all interactions with the models. Monitoring enables detailed AOAI model usage KPIs/Metrics.

4. API Management Gateway connects to all Azure resources via Private Link to ensure all traffic is secured by private endpoints and contained to private network.

5. Multiple Azure OpenAI instances enable scale out of API usage to ensure high-availability and disaster recovery for the service.

Features

This project framework provides the following features:

Enterprise logging of OpenAI usage metrics:
- Token Usage
- Model Usage
- Prompt Input
- User statistics
- Prompt Response
High Availability of OpenAI service with region failover.
Integration with latest OpenAI libraries-

Getting Started

Prerequisites

Installation

Provisioning artifacts, begin by provisioning the solution artifacts listed below:

(Optional)

Next-Gen Firewall Appliance
Azure Application Gateway
Azure Virtual Network

Managed Services

Configuration

Azure OpenAI

To begin, provision a resource for Azure OpenAI in your preferred region: Provision resource
Once the resource is provisioned, create a deployment with model of choice: Deploy Model
After the model has been deployed, go to the OpenAI studio to test your newly created model with the studio playground: oai.azure.com/portal
Note down Key1 from the Azure OpenAI instance by opening the Azure OpenAI instance, then from the Resource Management section of the left menu, select Keys and Endpoints.

Azure Key Vault

Provision an Azure Key Vault Resource: Deploy Key Vault

Once deployed, add Key1 from the Azure OpenAI instance as a secret: Add a Secret

API Management Config

API Management can be provisioned through Azure Portal :Provision resource
Once the API Management service has been provisioned, follow this documentation to configure access permissions for the APIM instance on the Azure Key Vaults secrets.
- Named Value Setup
  - Follow this documentation to create a named value linked to Key1 in the Azure Key Vault created earlier: Add a plain or secret value to API Management
- Backend Setup
- From the left menu in API Management select Backends, then create a new backend.
  - Configure the backend service to the endpoint of your deployed OpenAI service with /openai as the path:
  - Example: https://< yourservicename >.openai.azure.com/openai
    - Retrieve endpoint
- Under Authorization for the backend, set a new header named "api-key" and set its value to the created named value, then save the config.
- API Import instructions
- Open the APIM - API blade and Select the Import option for an existing API.
- Select the Update option to update the API to the current OpenAI specifications.
  - Completions OpenAPI - https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2023-05-15/inference.json
- (Optional) For Semantic Kernel compatibility "Update" the following Authoring API endpoints:
  - Authoring OpenAPI - https://raw.githubusercontent.com/Azure/azure-rest-api-specs/c183bb012de8e9e1d0d2e67a0994748df4747d2c/specification/cognitiveservices/data-plane/AzureOpenAI/authoring/stable/2022-12-01/azureopenai.json
For All API Operations:
- Build your API inbound policy as below.
- Configure the Diagnostic Logs settings:
  - Set the sampling rate to 100%
  - Set the "Number of payload bytes to log" as the maximum.
Test API
- Test the endpoint by providing the "deployment-id", "api-version" and a sample prompt:

(Optional) Subscription Access Control

API Management allows API providers to protect their APIs from abuse and create value for different API product tiers. Use of API Management layer to throttle incoming requests is a key role of Azure API Management. Either by controlling the rate of requests or the total requests/data transferred.

Details for configuring APIM Layer : https://learn.microsoft.com/en-us/azure/api-management/api-management-sample-flexible-throttling
Details for enabling Subscription based access to API's: API Management Subscriptions
- Note: To enable API usage via existing libraries, such as Semantic Kernel etc... you can also adjust the "Subscription" settings for the API to the following,
  
  In the calling client (using the library), you then set the "OpenAI / Azure OpenAI" URL & Key to the values for your API base URL / APIM subscription key.

Logging OpenAI completions

Once the API Management layer has been configured, you can configure existing OpenAI python code to use the API layer by adding the subscription key parameter to the completion request: Example:

import openai

openai.api_type = "azure"
openai.api_base = "https://xxxxxxxxx.azure-api.net/" # APIM Endpoint
openai.api_version = "2023-05-15"
openai.api_key = "APIM SUBSCRIPTION KEY" #DO NOT USE ACTUAL AZURE OPENAI SERVICE KEY


response = openai.Completion.create(engine="modelname",  
                                    prompt="prompt text", temperature=1,  
                                    max_tokens=200,  top_p=0.5,  
                                    frequency_penalty=0,  
                                    presence_penalty=0,  
                                    stop=None)

Demo

Once OpenAI requests begin to log to the Azure Monitor service, you can begin to analyze the service usage using Log Analytics queries.
- Log Analytics Tutorial
The table should be named "ApiManagementGatewayLogs"
The BackendResponseBody field contains the json response from the OpenAI service which includes the text completion as well as the token and model information.
Example query to identify token usage by ip and model:

ApiManagementGatewayLogs
| where tolower(OperationId) in ('completions_create','chatcompletions_create')
| where ResponseCode  == '200'
| extend modelkey = substring(parse_json(BackendResponseBody)['model'], 0, indexof(parse_json(BackendResponseBody)['model'], '-', 0, -1, 2))
| extend model = tostring(parse_json(BackendResponseBody)['model'])
| extend prompttokens = parse_json(parse_json(BackendResponseBody)['usage'])['prompt_tokens']
| extend completiontokens = parse_json(parse_json(BackendResponseBody)['usage'])['completion_tokens']
| extend totaltokens = parse_json(parse_json(BackendResponseBody)['usage'])['total_tokens']
| extend ip = CallerIpAddress
| where model !=  ''
| summarize
    sum(todecimal(prompttokens)),
    sum(todecimal(completiontokens)),
    sum(todecimal(totaltokens)),
    avg(todecimal(totaltokens))
    by ip, model

Example query to monitor prompt completions:

ApiManagementGatewayLogs
| where tolower(OperationId) in ('completions_create','chatcompletions_create')
| where ResponseCode  == '200'
| extend model = tostring(parse_json(BackendResponseBody)['model'])
| extend prompttokens = parse_json(parse_json(BackendResponseBody)['usage'])['prompt_tokens']
| extend prompttext = substring(parse_json(parse_json(BackendResponseBody)['choices'])[0], 0, 100)

Resources

Azure API Management Policies for Azure OpenAI: https://github.com/mattfeltonma/azure-openai-apim
Advanced Retry Policies: https://github.com/ian-t-adams/azure-openai-api-m-retry/

Frequently Asked Questions

Where is the "Deploy to Azure" button?
- In our experience, most enterprise cloud administrators first need to understand the solution before deploying it into an enterprise environment. The steps in this repo show how each component is deployed and configured so that they can be integrated into your existing deployment scripts. We do have bicep templates available to accelerate your development once you are familiar with the architecture.
Does the solution work with Private Endpoints?
- Yes, to configure the solution to work with private endpoints you will need to:
  - Configure your OpenAI instance to use a private endpoint.
  - Ensure API Management can resolve the private endpoint, if they are in different virtual networks this may require vnet link
  - Configure API Management to use internal networking
  - Ensure that API Management endpoints are accessible by your client link
How do I secure my Azure OpenAI endpoints once this solution is deployed?
- Option 1: Rotate all OpenAI Service keys once API Management is configured.
- Option 2: Disable key based access to Azure OpenAI Instance
  - Will impact Azure OpenAI Studio tool

dereknguyenio/openai-python-enterprise-logging