As at 12 June 2023, one subscription can provision 30 AOAI resources per region, sharing the same TPM and RPM limits. For example, you can allocate 3 deployment instances of GPT-35-Turbo with 80K TPM/480 RPM each to utilize the whole TPM/RPM limits for one region.
Currently, there are four regional Cognitive services that support Azure OpenAI -EastUS, South Cental US, West Europe and France Central which allows for a maximium 120 instances of the same AOAI model can be provisoned across these four regions. This means you can achieve up to (1440x4/60) = maximium 96 request per second for your ChatGPT model. If this is still not enough to meet your production workload requirements, you can consider getting additional subscriptions to create an AOAI resources RAID.
You can apply for quota increase requests by filling out forms, but this may not be the most practical solution as it won't give you access to the additional resources right away. What if your AOAI service demand continues to grow? Utilizing your existing quotas is the more practical solution. Remember, you have a maximum of 120 model instances across 4 regions with 4 times quotas and limits for 1 subscription. If you don't have concerns about cross-region networking cost, spanning across regions is the fastest way to get your production rollout up and running.
The common practice is to use Azure Application Gateway (AppGW) to perform a round-robin traffic distribution, alongside Azure DDoS service to protect your cloud telemetry. However, the rewrite rule capability of AppGW cannot rewrite the API key immediately after an AOAI resource is designated as a backend target. Therefore, a forward proxy server for each particular AOAI endpoint needs to be added to change the corresponding API key. As a result, you will need to provision three Function Apps per region to serve as the forward proxy servers. Don’t worry, these Function Apps can share the same App Service Plan.
From this diagram, it is clear that all Apps will direct their API calls to a single AppGW endpoint (either via public IP or domain name). This endpoint will have a shorter URI path and an internal API key, granted by your AOAI admin, for user authentication. Access control of which authenticated parties can access the Function Apps can be dynamically controlled by this internal API key. Once AppGW has distributed the incoming API requests to the different Function Apps, it will convert the API requests to the actual AOAI API requests, with the actual AOAI domain name, longer URI path, and actual API key in the AUTH header section.
- shorter URI path API caller only need to provides model-name,model-api, apiversion.
- Internal apiKey which is granted by AOAI admin, this apiKey is used for authentication within the AOAI service provided by you. You can dynamic control which authenticated parties can access the Function Apps by this internal apiKey.
-
Application Gateway
- Public CA certificate hosting
- TLS termination
- Load balancing
- WAF and public IP restriction
-
Function App
- Forwarding proxy
- Change Hostname
- Rewrite URI path
- Authenticate internal apiKey
- Rewrite actual AOAI apiKey
- User Access Control
- Responsible AI Orchestration
- Health Check endpoint for AppGW
-
App Service Plan
- only need one per region to support multiple Function Apps
While most of enterprise customers likely opt-out of the Microsoft RAI mitigation approach (Content Filtering and Abuse Monitoring). To comply with the Azure OpenAI Code of Conduct and Terms of Use, the customer must build their own RAI infrastructure. Leveraging the above architecture pattern can give you greater control and governance over your Function Apps. For instance,
- Incoming API request bodies without a 'User' or unregistered username can be rejected.
- User prompts can be sent to Azure AI Content Safety for offensive content detection and filtering before reaching the AOAI resources.
- The username and corresponding content can be logged in CosmosDB if the prompt is non-compliant.
- After Function App is created, Left blade > Configuration > Add Applilcation Settings > Save
AOAI_HOSTNAME = {your AOAI resource domain}.openai.azure.com
AOAI_INAPIKEY = {your internal apiKey for authenticated user}
AOAI_OUTAPIKEY = {actual AOAI apikey}
- Left blade > Health check > add path > Save
/api/FwdProxy/openai/deployments/health/check
- Please clone this repo into your local folder
git clone https://github.com/denlai-mshk/aoai-fwdproxy-funcapp.git
- Open Visual Studio Code with this local folder
- Install - extension: Azure Function, Azure Account
- Left blade , click Azure icon > Workspace > mouse over the deploy icon
- SSO your Microsoft account, select Azure subscription and Function App you just created.
- Create AppGW by Portal
- Backend pool and Backend setting
- You have to create 1 Routing rule bind with 1 Listener, 1 Backend pool and 1 Backend setting, the backend setting bind with 1 Health probe
- Add multiple Function Apps into Backend pool
- Add 1 rewrite ruleset (chatcompletion_100/otherapi_101/healthcheck_102) bind with Routing rule
- AppGW inbound and outbound are 443 port for TLS/SSL
- chatcompletion(100)
if (server variable = uri_path) equal /openai/deployments/(.*)/chat/completions and if (server variable = request_query) equal api-version=(.*) then rewrite type = URL action type = Set Components = Both URL path and URL query string URL path value = /api/FwdProxy/openai/deployments/{var_uri_path_1}/chatcompletions URL query string value = api-version={var_request_query_1}
- otherapi(101)
if (server variable = uri_path) equal /openai/deployments/(.*) and if (server variable = request_query) equal api-version=(.*) then rewrite type = URL action type = Set Components = Both URL path and URL query string URL path value = /api/FwdProxy/openai/deployments/{var_uri_path_1} URL query string value = api-version={var_request_query_1}
- healthcheck(102)
if (server variable = uri_path) equal /openai/deployments/health/check and if (server variable = request_query) equal api-version=(.*) then rewrite type = URL action type = Set Components = URL path URL path value = /api/FwdProxy/openai/deployments/health/check
Well-known API tester Postman released OpenAI API profile for free. Get that over here
In postman, pass your internal apikey in auth header
- Consider provisioning the Function App in VNET Injection mode for security.
- Connect the AppGW and Function App within the same VNET.
- Add a private endpoint to the AOAI Resources in your VNET.
- Instead of using the AppGW public IP for the endpoint, consider installing a public CA certificate with domain name service.
- Secure the AOAI endpoint and API key in the Azure Key Vault.
- Consider provisioning the CosmosDB for abuse auditing.
- Consider provisioning the Azure AI Content Safety for content filtering and detection.