Boost up 4x Request per minute for your AOAI Resources

Fully utilize AOAI quotas and limits

As at 12 June 2023, one subscription can provision 30 AOAI resources per region, sharing the same TPM and RPM limits. For example, you can allocate 3 deployment instances of GPT-35-Turbo with 80K TPM/480 RPM each to utilize the whole TPM/RPM limits for one region.

Currently, there are four regional Cognitive services that support Azure OpenAI -EastUS, South Cental US, West Europe and France Central which allows for a maximium 120 instances of the same AOAI model can be provisoned across these four regions. This means you can achieve up to (1440x4/60) = maximium 96 request per second for your ChatGPT model. If this is still not enough to meet your production workload requirements, you can consider getting additional subscriptions to create an AOAI resources RAID.

Why not just raise your quotas and limits?

You can apply for quota increase requests by filling out forms, but this may not be the most practical solution as it won't give you access to the additional resources right away. What if your AOAI service demand continues to grow? Utilizing your existing quotas is the more practical solution. Remember, you have a maximum of 120 model instances across 4 regions with 4 times quotas and limits for 1 subscription. If you don't have concerns about cross-region networking cost, spanning across regions is the fastest way to get your production rollout up and running.

Load balancing multiple AOAI Resources

The common practice is to use Azure Application Gateway (AppGW) to perform a round-robin traffic distribution, alongside Azure DDoS service to protect your cloud telemetry. However, the rewrite rule capability of AppGW cannot rewrite the API key immediately after an AOAI resource is designated as a backend target. Therefore, a forward proxy server for each particular AOAI endpoint needs to be added to change the corresponding API key. As a result, you will need to provision three Function Apps per region to serve as the forward proxy servers. Don’t worry, these Function Apps can share the same App Service Plan.

Transforming your API with actual AOAI endpoints

From this diagram, it is clear that all Apps will direct their API calls to a single AppGW endpoint (either via public IP or domain name). This endpoint will have a shorter URI path and an internal API key, granted by your AOAI admin, for user authentication. Access control of which authenticated parties can access the Function Apps can be dynamically controlled by this internal API key. Once AppGW has distributed the incoming API requests to the different Function Apps, it will convert the API requests to the actual AOAI API requests, with the actual AOAI domain name, longer URI path, and actual API key in the AUTH header section.

shorter URI path API caller only need to provides model-name,model-api, apiversion.
Internal apiKey which is granted by AOAI admin, this apiKey is used for authentication within the AOAI service provided by you. You can dynamic control which authenticated parties can access the Function Apps by this internal apiKey.

Functional Roles of Azure Components

Application Gateway
- Public CA certificate hosting
- TLS termination
- Load balancing
- WAF and public IP restriction
Function App
- Forwarding proxy
- Change Hostname
- Rewrite URI path
- Authenticate internal apiKey
- Rewrite actual AOAI apiKey
- User Access Control
- Responsible AI Orchestration
- Health Check endpoint for AppGW
App Service Plan
- only need one per region to support multiple Function Apps

Responsible AI (RAI) Orchestration in your tenant

While most of enterprise customers likely opt-out of the Microsoft RAI mitigation approach (Content Filtering and Abuse Monitoring). To comply with the Azure OpenAI Code of Conduct and Terms of Use, the customer must build their own RAI infrastructure. Leveraging the above architecture pattern can give you greater control and governance over your Function Apps. For instance,

Incoming API request bodies without a 'User' or unregistered username can be rejected.

User prompts can be sent to Azure AI Content Safety for offensive content detection and filtering before reaching the AOAI resources.
The username and corresponding content can be logged in CosmosDB if the prompt is non-compliant.

Function App Configuration

Create your first function in the Azure portal

After Function App is created, Left blade > Configuration > Add Applilcation Settings > Save

    AOAI_HOSTNAME = {your AOAI resource domain}.openai.azure.com
    AOAI_INAPIKEY = {your internal apiKey for authenticated user}
    AOAI_OUTAPIKEY = {actual AOAI apikey}

Left blade > Health check > add path > Save

/api/FwdProxy/openai/deployments/health/check

Function App Forward Proxy Implementation

Please clone this repo into your local folder

git clone https://github.com/denlai-mshk/aoai-fwdproxy-funcapp.git

Open Visual Studio Code with this local folder
Install - extension: Azure Function, Azure Account
Left blade , click Azure icon > Workspace > mouse over the deploy icon

SSO your Microsoft account, select Azure subscription and Function App you just created.

Application Gateway Configuration

Create AppGW by Portal
Backend pool and Backend setting
You have to create 1 Routing rule bind with 1 Listener, 1 Backend pool and 1 Backend setting, the backend setting bind with 1 Health probe
Add multiple Function Apps into Backend pool
Add 1 rewrite ruleset (chatcompletion_100/otherapi_101/healthcheck_102) bind with Routing rule
AppGW inbound and outbound are 443 port for TLS/SSL

chatcompletion(100)

if (server variable = uri_path) 
equal
/openai/deployments/(.*)/chat/completions

and if (server variable = request_query) 
equal
api-version=(.*)

then
rewrite type = URL
action type = Set
Components = Both URL path and URL query string
URL path value = /api/FwdProxy/openai/deployments/{var_uri_path_1}/chatcompletions
URL query string value = api-version={var_request_query_1}

otherapi(101)

if (server variable = uri_path) 
equal
/openai/deployments/(.*)

and if (server variable = request_query) 
equal
api-version=(.*)

then
rewrite type = URL
action type = Set
Components = Both URL path and URL query string
URL path value = /api/FwdProxy/openai/deployments/{var_uri_path_1}
URL query string value = api-version={var_request_query_1}

healthcheck(102)

if (server variable = uri_path) 
equal
/openai/deployments/health/check

and if (server variable = request_query) 
equal
api-version=(.*)

then
rewrite type = URL
action type = Set
Components = URL path
URL path value = /api/FwdProxy/openai/deployments/health/check

How to test with PostMan

Well-known API tester Postman released OpenAI API profile for free. Get that over here

In postman, pass your internal apikey in auth header

Go for Production To-Do List

Consider provisioning the Function App in VNET Injection mode for security.
Connect the AppGW and Function App within the same VNET.
Add a private endpoint to the AOAI Resources in your VNET.
Instead of using the AppGW public IP for the endpoint, consider installing a public CA certificate with domain name service.
Secure the AOAI endpoint and API key in the Azure Key Vault.
Consider provisioning the CosmosDB for abuse auditing.
Consider provisioning the Azure AI Content Safety for content filtering and detection.

freedragon/aoai-fwdproxy-funcapp