Change scoring_uri for ClusterIP inferenceRouterServiceType
When the ML extension is installed with ClusterIP set for inferenceRouterServiceType, the OnlineEndpoint seems to build the scoring URI from the azureml-fe service's cluster IP, which is completely inaccessible from outside the cluster if you use kubenet. This breaks the "Test" functionality on an Azure ML endpoint and the `az ml online-endpoint invoke` command, both of which try to use that internal ClusterIP.
OnlineEndpoint status

```yaml
status:
  ...
  scoringUri: http://10.0.227.236/api/v1/endpoint/dev-rec-ab/score
```
Azure CLI Invoke and Error

```shell
$ az ml online-endpoint invoke --name <endpoint-name> \
    --resource-group <workspace-rg> \
    --workspace-name <workspace-name> \
    --deployment-name "green" \
    --request-file "sample_request.json"
cli.azure.cli.core.azclierror: (<urllib3.connection.HTTPConnection object at 0x7fc1dfebbee0>, 'Connection to 10.0.227.236 timed out. (connect timeout=300)')
```
Is there any way to set or override this URI? I dug through the documentation, but the only relevant text I could find regarding the ClusterIP and NodePort configurations is in Key considerations for AzureML extension deployment:
- Type NodePort. Exposes azureml-fe on each Node's IP at a static port. You'll be able to contact azureml-fe, from outside of cluster, by requesting `<NodeIP>:<NodePort>`. Using NodePort also allows you to set up your own load balancing solution and SSL termination for azureml-fe.
- Type ClusterIP. Exposes azureml-fe on a cluster-internal IP, and it makes azureml-fe only reachable from within the cluster. For azureml-fe to serve inference requests coming from outside the cluster, it requires you to set up your own load balancing solution and SSL termination for azureml-fe.
I've configured an Istio ingress controller to pass external requests along to the azureml-fe service as my own load balancing solution, and that is working fine.
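For reference, my ingress setup looks roughly like this. The host `ml.example.com` is a placeholder, and I'm assuming here that azureml-fe exposes plain HTTP on port 80 of its Service in the azureml namespace (adjust the port if you terminate TLS at azureml-fe instead):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: azureml-fe-gateway
  namespace: azureml
spec:
  selector:
    istio: ingressgateway   # use Istio's default ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "ml.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: azureml-fe
  namespace: azureml
spec:
  hosts:
    - "ml.example.com"
  gateways:
    - azureml-fe-gateway
  http:
    - route:
        - destination:
            # route everything on this host to the azureml-fe ClusterIP service
            host: azureml-fe.azureml.svc.cluster.local
            port:
              number: 80
```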
The rest of the documentation is written specifically for the LoadBalancer type and ignores these other options. I tried setting a `scoring_uri` parameter in my endpoint.yaml, but it was ignored. I also dug through the ConfigMaps in the azureml namespace and saw no reference to the URI, so I assume it is pulled from the azureml/azureml-fe service directly. Is there really no way to override this? I want the scoring URI to use the address I've configured on the ingress, which is actually reachable, not the internal cluster IP. I find it hard to believe that exposing inference endpoints externally without breaking the "Test" functionality and the CLI invoke command hasn't been accounted for, but I can't track down a solution.
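In the meantime, the only workaround I've found is client-side: take the scoringUri from the endpoint status, swap the cluster-internal host for the ingress address, and call the rewritten URL directly (with the usual `Authorization: Bearer <key>` header) instead of relying on `az ml online-endpoint invoke`. A minimal sketch — `ml.example.com` is a placeholder for whatever host your ingress exposes:

```python
from urllib.parse import urlsplit, urlunsplit

def rewrite_scoring_uri(scoring_uri: str, ingress_host: str, scheme: str = "https") -> str:
    """Replace the cluster-internal host in a scoringUri with an externally
    reachable ingress address, keeping the endpoint path intact."""
    parts = urlsplit(scoring_uri)
    return urlunsplit((scheme, ingress_host, parts.path, parts.query, parts.fragment))

# The internal URI from the endpoint status, rewritten to a placeholder ingress host:
uri = rewrite_scoring_uri(
    "http://10.0.227.236/api/v1/endpoint/dev-rec-ab/score",
    "ml.example.com",
)
print(uri)  # https://ml.example.com/api/v1/endpoint/dev-rec-ab/score
```

This keeps the path (which encodes the endpoint name) untouched, so routing inside azureml-fe is unaffected; it just doesn't help the portal's "Test" tab, which still uses the stored scoringUri.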