Integrate container service
Closed this issue · 19 comments
As a Consumer (i.e., a person running a model), I want to be able to run my models smoothly using DLHub/Garden, get appropriate feedback if there's an error, and have trust in the service.
As a Developer, we want to accomplish this by integrating the container-service into DLHub (and later, Garden) such that it replaces some of the backend plumbing.
Assumptions
- The `container-service` is in an acceptable state to be integrated into DLHub. It is currently still in PR in the `funcx` repo, but the `container-service` branch can be used to implement the DLHub integration. Ben G and Steve W will have more context on this -- globus/globus-compute#903
- We are continuing to use the `dlhub-service` until we transition to a serverless architecture with Garden (double check with Ben B)
- Ryan C has added the initial functionality for signed URLs to the `dlhub-service`, which can be found in this branch (not currently merged into main nor deployed) -- https://github.com/DLHub-Argonne/dlhub_service/tree/signed_url
- The `container-service` logic of registering a function and container with Funcx corresponds to the existing logic in the `dlhub-service` repo https://github.com/DLHub-Argonne/dlhub_service/blob/master/ingestion/publish_dockerize.py#L208
- Pinging https://api.dlhub.org/api/v1/publish/signed_url returns a signed URL
- Previously, the DLHub SDK publication pipeline wrote the servable information to Globus Search; that pipeline will be replaced by the `container-service`, and thus writing to Search needs to happen somewhere else.
Acceptance Criteria
- Given a Publisher publishes a model, the DLHub SDK uses the `container-service` for the model publication process instead of existing backend infrastructure
- Given a Publisher publishes a model, there is no discernible difference to the Publisher between this implementation and the previous one
- Given a Consumer tries to run a model, the model runs quickly and the user gets feedback in the form of the executed function result, or an error
- Given a Consumer tries to run a model, if there is an error that information will be passed back to the user.
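The last two criteria (a result or an error always reaches the Consumer) could be approached with a bounded polling helper. This is a minimal sketch, not the SDK's implementation: `wait_for_result` and its parameters are hypothetical names, and it assumes, as the demo loop later in this thread does, that the result getter raises while the task is not finished.

```python
import time


def wait_for_result(get_result, task_id, timeout=60.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_result(task_id) until it returns a value or the timeout passes.

    get_result is any callable (hypothetically something like fxc.get_result)
    that raises while the task is pending or has failed.
    """
    deadline = clock() + timeout
    last_error = None
    while clock() < deadline:
        try:
            return get_result(task_id)
        except Exception as err:  # assumption: pending and failed both raise
            last_error = err
            sleep(interval)
    # Give up, but pass the most recent error back to the caller
    raise TimeoutError(
        f"task {task_id} did not finish within {timeout}s; last error: {last_error!r}"
    )
```

Passing `sleep` and `clock` as arguments keeps the helper testable without real waiting; a caller would normally leave them at their defaults.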
Tasks
- Deploy the updated `dlhub-service`, with the `signed-url` branch merged into main, on the DLHub Server
- Update the `dlhub-sdk` to upload (i.e., publish) using signed URLs and call the container service
- Insert the entry into Globus Search by creating an AWS Lambda function
- Add necessary Funcx `container-service` repo additions, per Ben G and Steve W
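For the Search task, a minimal sketch of what the Lambda function might build is shown here. It wraps servable metadata in the `GMetaEntry` ingest document format used by Globus Search. The event shape (a JSON string under `"body"`) and the `subject` and `metadata` keys are assumptions made for illustration, and the actual ingest call is left out so the example runs offline.

```python
import json


def build_gmeta_ingest(subject, content, visible_to=("public",)):
    """Wrap servable metadata in a Globus Search GMetaEntry ingest document."""
    return {
        "ingest_type": "GMetaEntry",
        "ingest_data": {
            "subject": subject,
            "visible_to": list(visible_to),
            "content": content,
        },
    }


def lambda_handler(event, context):
    """Hypothetical Lambda entry point: parse the request, build the document."""
    body = json.loads(event["body"])  # assumed event shape, not confirmed by the thread
    document = build_gmeta_ingest(body["subject"], body["metadata"])
    # A real handler would now send `document` to the index, for example with
    # globus_sdk.SearchClient.ingest(index_id, document); omitted here.
    return {"statusCode": 200, "body": json.dumps(document)}
```

Keeping document construction in its own function means the same helper would still apply if the ingest is done somewhere other than a Lambda function.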
New scope: just upload directly from the SDK to Search. Data is only coming from a GitHub repo. We just give the payload URL. No S3 bucket.
Ben G has a container service demo that can be used as a guide
Ryan C said the `signed-url` branch should probably be good to go as is
What do you mean by "Data only from GitHub repo"? Is it that data from the model being published should be hosted on GitHub?
I think @blaiszik meant that it is dummy data within the DLHub SDK repo, but I could be wrong. Ben could you clarify?
import sys
from time import sleep

from funcx import ContainerSpec
from funcx.sdk.client import FuncXClient

fxc = FuncXClient()


def wine_file_reader(source_url):
    # Imports live inside the function so they resolve inside the container
    import pandas as pd
    import io
    import urllib.request

    with urllib.request.urlopen(source_url) as f:
        p = pd.read_csv(f, sep=",")
    return p.to_dict()


# Set this to an existing container UUID to skip the build step
container_uuid = None
if not container_uuid:
    container_uuid = fxc.build_container(
        ContainerSpec(
            name="WineFileReader",
            pip=[
                "pandas"
            ],
            python_version="3.7",
            conda=[],
        )
    )
    print(f"Building {container_uuid}")

# Poll until the container build either succeeds or fails
while True:
    status = fxc.get_container_build_status(container_uuid)
    print(f"status is {status}")
    if status in ["ready", "failed"]:
        break
    sleep(5)

if status != "ready":
    sys.exit(-1)

print(fxc.get_container(container_uuid, container_type="docker"))

# Register the function against the freshly built container
function = fxc.register_function(wine_file_reader, container_uuid=container_uuid)
print(function)

dlhub_endpoint = '86a47061-f3d9-44f0-90dc-56ddc642c000'
res = fxc.run("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
              endpoint_id=dlhub_endpoint, function_id=function)

# Poll for the result; any exception is treated as "not ready yet"
result = None
while not result:
    try:
        result = fxc.get_result(res)
        print(result)
    except Exception as eek:
        print("Oops", eek)
        sleep(5)
This is fantastic, Ben. This should match really well with the SDK
Are the docs for the container service somewhere yet? I'm curious about things like "how to provide files needed for the container" and "specifying apt dependencies."
No docs really yet - if anyone had time to use this example to start a doc PR for funcX repo it would be appreciated
Thanks, Ben, this is really helpful.
A WIP commit outlining the publication process using the Container Service and the (in progress) Globus Search Write Lambda function has been pushed to the 176-integrate-container-service branch
We're going to meet on Friday 2/3 to go over how I've approached this
I've pushed WIP to the 176-integrate-container-service branch of dlhub_sdk. Status is blocked on the Container Service not successfully building anything, but is expected to be near-complete for publishing from repositories.
The search ingest lambda function is now getting the user's group membership, but not yet comparing it to anything. What groups are used (or do we want to be used) for write permission into the DLHub search index?
End to end works in dev funcx, writing to a dev search index (with the search ingest lambda deployed as dev). Next step is for someone else to take a look at it and/or try it out
TODO: finalize group membership/permissions with Search ingest AWS Lambda
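One way the pending comparison could look, once the set of permitted groups is decided, is a simple set intersection. This is only a sketch: `can_write_to_index` is a hypothetical helper, and the group UUID below is a placeholder, not a real DLHub group.

```python
# Placeholder value; the real allowed groups are still an open question in this thread
ALLOWED_GROUP_IDS = {"00000000-0000-0000-0000-000000000000"}


def can_write_to_index(user_group_ids, allowed_group_ids=ALLOWED_GROUP_IDS):
    """Return True if the user belongs to at least one group allowed to write."""
    return not set(user_group_ids).isdisjoint(allowed_group_ids)
```

The Lambda function would call this with the group IDs it already retrieves for the user, and reject the ingest request when it returns `False`.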
Containers build appropriately with the correct content, but due to a bug in the interaction between the Container Service and funcx, it is impossible to actually run the function.
merged and released in DLHub SDK v2.0.0! 🌈