Ability to discover all running models
Is your feature request related to a problem? Please describe.
From a tooling standpoint, we need the ability to discover all running LLM endpoints, so we can pick one and use it as an AI assistant in an IDE (using the continue.dev extension in VS Code/IntelliJ for instance)
Describe the solution you'd like
The model endpoints should be listed with at least their label, type, and API URL, e.g.:
- label: LLama 3
- provider: ollama (or instructlab...)
- apiUrl: https://my.cluster:12345/foo/
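A minimal sketch of how such a listing entry could be modelled and serialized, using only the three fields named above. The class name `ModelEndpoint`, the helper `render_endpoints`, and the JSON layout are hypothetical and not part of any existing Model Registry API.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ModelEndpoint:
    """One discoverable inference endpoint (hypothetical shape)."""
    label: str     # human-readable name, e.g. "LLama 3"
    provider: str  # serving engine, e.g. "ollama" or "instructlab"
    api_url: str   # base URL a client such as an IDE extension would call

    def to_json_dict(self) -> dict:
        # Rename api_url to the camelCase key used in the request above.
        d = asdict(self)
        d["apiUrl"] = d.pop("api_url")
        return d


def render_endpoints(endpoints) -> str:
    """Serialize a listing the way a discovery response might look."""
    return json.dumps([e.to_json_dict() for e in endpoints], indent=2)


example = [ModelEndpoint("LLama 3", "ollama", "https://my.cluster:12345/foo/")]
print(render_endpoints(example))
```

A client could then pick one entry from the list and point its assistant configuration at the `apiUrl` value.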
Describe alternatives you've considered
AFAIK, there's no other way to discover running inference engines at the moment.
cc @amfred
Thank you for this comment @fbricon !
Per the original proposal document for Model Registry, this is accounted for, especially for audit purposes.
As mentioned in the same document, this intent also protects against Model Registry becoming a controller of sorts, which is not in scope.
We have the ServingEnvironment and InferenceService entities in the OpenAPI, and mapped, for this purpose.
I realize now:
- we should consider expanding the Logical Model to also document the ServingEnvironment and InferenceService entities.
Ultimately the "valid endpoints" source-of-truth depends on the serving runtime used.
In the case of Kubeflow, KServe is a default add-on and the one we did some integrations for.
I realize now:
- we could consider refactoring the reconciliation/controller for KServe in this repo to provide a sensible example, as was done for CSI.
For context:
- we have a reconciliation/controller for kserve
- flow:
  sequenceDiagram
      actor U as UI Dashboard
      participant K as Kubernetes
      participant MC as ODH Model Controller
      participant MR as Model Registry
      U->>+MR: Retrieve indexed model version
      MR-->>-U: Indexed model version
      U->>K: Create InferenceService (ISVC)
      Note right of U: Annotate/Label the ISVC with indexed <br/> model information, like RegisteredModel and <br/>ModelVersion IDs.
      Note right of K: Here all operators/controllers in charge to deploy<br/> the model will make<br/> their actions, e.g., KServe or ModelMesh.
      loop Every ISVC creation/deletion/update
          K-->>+MC: Send notification
          MC->>+K: Retrieve affected ISVC in the cluster
          K-->>-MC: ISVC resource
          MC->>+MR: Create/Update InferenceService in Model Registry
          Note left of MR: InferenceService records in Model Registry<br/>are used to keep track of every deployment that<br/>occurred in the monitored Kubernetes cluster.
          MR-->>-MC: InferenceService record
          MC-->>-K: Update ISVC with Model Registry record ID
      end
- we could refactor it to be made available in this repo, similar to the CSI example
@lampajr wdyt?
(edit: link fixup, typo fix)
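As a rough illustration of the loop in the diagram above, here is a minimal in-memory sketch of one reconcile pass: read the affected ISVC, create or update the matching InferenceService record, then write the record ID back onto the ISVC. It does not reflect the actual controller code. `InMemoryRegistry`, `reconcile`, and the annotation key are hypothetical names chosen for this example, and deletion events are not handled.

```python
import itertools

# Hypothetical annotation key for the write-back step; not the real controller's key.
RECORD_ID_ANNOTATION = "example.org/inference-service-record-id"


class InMemoryRegistry:
    """Stand-in for Model Registry: keeps InferenceService records by ID."""

    def __init__(self):
        self.records = {}
        self._ids = itertools.count(1)

    def upsert(self, record_id, payload):
        # Create a new record when no ID is known yet, otherwise overwrite it.
        if record_id is None:
            record_id = str(next(self._ids))
        self.records[record_id] = payload
        return record_id


def reconcile(isvc: dict, registry: InMemoryRegistry) -> dict:
    """Handle one ISVC create/update notification."""
    meta = isvc["metadata"]
    annotations = meta.setdefault("annotations", {})
    payload = {
        "name": meta["name"],
        "namespace": meta.get("namespace", "default"),
        "labels": dict(meta.get("labels", {})),
    }
    # Create/Update the InferenceService record in the registry.
    record_id = registry.upsert(annotations.get(RECORD_ID_ANNOTATION), payload)
    # Update the ISVC with the registry record ID.
    annotations[RECORD_ID_ANNOTATION] = record_id
    return isvc
```

Because the record ID is stored on the ISVC itself, running `reconcile` again on the same resource updates the existing record instead of creating a duplicate.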
> we could refactor it to be made available in this repo, similar to the CSI example
> @lampajr wdyt?
That could be a very useful example. We could create a custom folder containing a bare-minimum controller that implements only that logic.
Given that controllers/mr_inferenceservice_controller is already pretty much isolated, I am not expecting too much effort.
A similar requirement came in for a possible integration with Backstage. I am not sure I understood the proposal above: is there a way to solve this for the Kubeflow offering without an operator? Should we deploy yet another container alongside the REST server for this? I would typically like to see something working OOB rather than something the user has to configure explicitly.
Thanks for looping me in on this, @rareddy.
The requirements have been emerging more naturally and clearly recently; here is what I have captured so far:
- the ability to discover presently running models is needed to display information to a human user using the dashboards
- R1. the end user is interested to know, for a given RegisteredModel/ModelVersion, if it is currently deployed as an Inference endpoint
- R2. the end user is interested to know currently available Inference endpoints
Beyond these general requirements, we have also further clarified that:
- the source-of-truth for Inference endpoints in Kubeflow is KServe's Isvc resources
  - as also described generally in the manual: https://www.kubeflow.org/docs/external-add-ons/kserve/webapp/#listing
  - same for our distribution, as Architecture has clarified many times now (see @danielezonca's comments)
- KServe's Isvc resources are also what we suggest annotating manually in our Model Registry tutorial: https://www.kubeflow.org/docs/components/model-registry/getting-started/#deploy-an-inference-endpoint:~:text=labels%3D%7B,%7D%2C
- use of KServe's Isvc resources for this is also consistent with CSI from @lampajr
Given the Architecture proposal advanced by @ederign in the KF community meeting of 2024-08-06 (mailing-list post), and specifically slide 8, what is described there is exactly the purpose the BFF serves, and it matches what we discussed in the past.
So in conclusion, my recommendation is to tackle these capabilities in the Model Registry BFF, as that would be the most natural fit considering all the most recent directions.
wdyt @rareddy @ederign @lampajr ?
btw @ederign, assuming this, what would be the best way to formalize this BFF functionality/requirement, please?
@tarilabs, you are right. Having multiple clients consume our APIs is precisely one reason we designed the BFF. Having VS Code and Backstage consuming our BFF would be awesome.
@tarilabs Currently, we are planning to 'talk' with Kubernetes only to fetch the MR endpoint. After getting the MR endpoint, the BFF will make REST calls to the Model Registry REST API for all the operations and data currently needed in the MR Web UI.
I want to double-check whether the requirements you described can be fulfilled by the Model Registry REST API. Or would the BFF be required to 'talk' with another Kubeflow project (KServe, perhaps) to provide all the data they need?
If the MR REST API can provide all the data needed, a good starting point for our discussion would be understanding the endpoints and JSON schema needed for Backstage and VS Code. Then, we can check whether there is an overlap with the APIs that we are currently planning for the Web UI or whether we need a new endpoint. I'm happy to implement those in the community.
If Model Registry REST API cannot fulfill those requirements, the BFF will be required to 'talk' with other Kubeflow projects; I suggest we hold a design session to discuss the implications of this for our architecture (orthogonal use cases).
Either way, I'm working towards a PR to add an OpenAPI + Swagger definition for the current APIs. I'll send something this week!
> Or would the BFF be required to 'talk' with another Kubeflow project (Kserve, perhaps) to provide all data needed for them?
I just want to clarify that I did not imply "talking to other projects", but to Isvc resources, as in Kubernetes resources, i.e. something like kubectl get isvc.
This is required for the R2 flow, and further to support a user story where a presently running model is to be catalogued/indexed in Model Registry.
The rest sounds aligned to me, and happy to discuss live anytime!
I just had a quick call with @tarilabs, and we agree that BFF is the best option for this use case. So what we need to move forward is:
@fbricon @rareddy I believe a starting point for our discussion would be understanding the endpoints and JSON schema needed for Backstage and VS Code. Then, we can check whether there is an overlap with the APIs that we are currently planning for the Web UI or whether we need a new endpoint.
I'm happy to implement those in the community.
@tarilabs I thought MR created InferenceService entities, and that with the reconciling described above we are collating the deployment info, which could then be exposed directly through the MR REST API. Since we are going to do a reconciler for StorageInitializer, why not just use that?
I understand the BFF proposition, but I am thinking about how external access from Backstage components would need to deal with two different endpoints, security, etc.
> @tarilabs I thought MR created InferenceService entities and with the above use of reconciling we are collating the deployment info which could then directly be exposed through MR REST API.
To baseline the discussion:
- Model Registry does not create the InferenceService K8s resources, as it's not intended to be a control-plane component.
- Model Registry has analogous InferenceService logical model entries, which are intended for auditing purposes (~"I want to know if, in the past, I've deployed Model X Version Y in the cluster"). This is because we can't easily retrieve a history of past K8s resources in the cluster.
- currently we have no way to leverage these entities here in upstream Kubeflow, since we don't have an operator for MR
With the above premises:
- I do not believe it is a good idea to use entities intended for records/audit purposes as the fresh snapshot of the cluster
- also, the source-of-truth for Kubeflow is the Inference endpoints view of the Kubeflow dashboard, which reflects the K8s Isvc, so I would adopt the same semantics in the Model Registry scope
But trying to walk in those shoes anyway: even if we exploited the audit logical model entries for the fresh-snapshot purpose, it would not solve the requirement of knowing which deployed models are not indexed/catalogued in Model Registry.
For these reasons, I believe the BFF approach I mentioned in #130 (comment) is the most appropriate.
To me, we need:
- R1. a way to list the current fresh Isvc just dropped in the cluster which have the annotation linking back to RegisteredModel/ModelVersion
  - This answers the user question: for this model X version Y, is there now an inference endpoint available that I can use?
- R2. a way to list the current fresh Isvc just dropped in the cluster, regardless of the annotation
  - This answers the user question of the Kubeflow Dashboard, which is what the Backstage folks are interested in
  - This answers the user question of which currently deployed models they want to link to a RegisteredModel
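The two requirements can be sketched as plain filters over Isvc resources, assuming the input is the `items` array of something like `kubectl get isvc -o json`. The label keys below are an assumption based on the pattern in the Model Registry tutorial linked earlier in the thread, and the sketch reads labels rather than annotations for that reason. The function names are hypothetical.

```python
# Assumed label keys (see the tutorial link earlier in the thread); adjust as needed.
REGISTERED_MODEL_LABEL = "modelregistry/registered-model-id"
MODEL_VERSION_LABEL = "modelregistry/model-version-id"


def summarize(isvc: dict) -> dict:
    """Reduce one Isvc resource to the fields a client would need."""
    meta = isvc.get("metadata", {})
    labels = meta.get("labels", {})
    return {
        "name": meta.get("name"),
        "namespace": meta.get("namespace"),
        "url": isvc.get("status", {}).get("url"),  # unset until the Isvc is ready
        "registeredModelId": labels.get(REGISTERED_MODEL_LABEL),
        "modelVersionId": labels.get(MODEL_VERSION_LABEL),
    }


def list_all_endpoints(isvcs) -> list:
    """R2: every Isvc currently in the cluster, linked to the registry or not."""
    return [summarize(i) for i in isvcs]


def list_endpoints_for(isvcs, registered_model_id: str, model_version_id: str) -> list:
    """R1: only the Isvc whose labels link back to the given model version."""
    return [
        s for s in list_all_endpoints(isvcs)
        if s["registeredModelId"] == registered_model_id
        and s["modelVersionId"] == model_version_id
    ]
```

Entries returned by `list_all_endpoints` with `registeredModelId` set to `None` are the deployed models that are not yet indexed, which is the case the audit records alone could not cover.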
.
> Since we are going to do reconciler for StorageInitializer why not just use that?
I'm not sure I understood this comment.
CSI is not a reconcile loop in an operator/controller.
.
> I understand the BFF proposition, but thinking about how would external access to Backstage components need to deal with two different endpoints, security etc.
This is a matter of the deployment model of the BFF; if it becomes "an issue", to me it would be a blocker well beyond the Backstage integration, and worth resolving fully.
.
Hope these are relevant comments for consideration, and hope putting them in writing was helpful, but I expect this conversation is also easier to have in the meetings!
I'm not sure I understand who will be responsible for querying KServe's Isvc (kubectl get isvc). Will it be the model-registry, under the hood? Users? If the latter, my understanding (from discussions with @guimou) is that those resources will most likely be under namespaces unlikely to be available to regular users.
@fbricon this discussion is indeed about avoiding having to ask users to kubectl get isvc.
This discussion, as the comments show, is about implementation choices for how to do it within the Model Registry scope, between what was recently presented (BFF) and the previously available reconcile loop (intended for auditing). Hope this clarifies.
@fbricon @rareddy, In short, the gist of what we are discussing is the BFF becoming the API for such services.
{VSCode/Backstage} => REST call => BFF = abstracts, coordinate and format data => {K8s resources | Model Registry APIs}
For sure, we will need to discuss security and other implications, but first we need to agree on whether the BFF will be the 'API' for those external integrations.
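To make the flow above concrete, here is a minimal sketch of a BFF-style handler that combines the two backends behind one call. Both data sources are injected as plain callables so the example stays self-contained. The handler name, the response shape, and the label key are assumptions for illustration and are not the actual BFF design.

```python
from typing import Callable, Iterable, Optional

# Assumed label key linking an Isvc to a ModelVersion; adjust as needed.
MODEL_VERSION_LABEL = "modelregistry/model-version-id"


def make_endpoints_handler(
    list_isvcs: Callable[[], Iterable[dict]],
    get_model_version: Callable[[str], Optional[dict]],
) -> Callable[[], list]:
    """Build a handler that abstracts, coordinates and formats data.

    list_isvcs        -> stands in for reading K8s Isvc resources
    get_model_version -> stands in for a Model Registry REST API call
    """

    def handle_list_endpoints() -> list:
        response = []
        for isvc in list_isvcs():
            meta = isvc.get("metadata", {})
            version_id = meta.get("labels", {}).get(MODEL_VERSION_LABEL)
            # Only call the registry when the Isvc is linked to a ModelVersion.
            version = get_model_version(version_id) if version_id else None
            response.append({
                "name": meta.get("name"),
                "url": isvc.get("status", {}).get("url"),
                "modelVersion": version,  # None when the model is not indexed
            })
        return response

    return handle_list_endpoints
```

With this shape, an external client such as VS Code or Backstage would make a single REST call to the BFF and would not need direct access to either the cluster or the Model Registry API.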
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
#130 (comment) this is being worked on as part of InferenceService reconciler and related tasks, last: