DLHub-Argonne/dlhub_sdk

Integrate container service


As a Consumer (i.e., a person running a model), I want to be able to run my models smoothly using DLHub/Garden, get appropriate feedback if there is an error, and have trust in the service.

As Developers, we want to accomplish this by integrating the container-service into DLHub (and later Garden) so that it replaces some of the backend plumbing.

Assumptions

  1. The container-service is in an acceptable state to be integrated into DLHub. It is currently still in a PR in the funcx repo, but the container-service branch can be used to implement the DLHub integration. Ben G and Steve W will have more context on this; see globus/globus-compute#903
  2. We are continuing to use the dlhub-service until we transition to a serverless architecture with Garden (double check with Ben B)
  3. Ryan C has added the initial functionality for signed URLs to the dlhub-service, which can be found in this branch (not yet merged into main or deployed): https://github.com/DLHub-Argonne/dlhub_service/tree/signed_url
  4. The container-service logic for registering a function and container with funcX corresponds to the existing logic in the dlhub-service repo: https://github.com/DLHub-Argonne/dlhub_service/blob/master/ingestion/publish_dockerize.py#L208
  5. Pinging https://api.dlhub.org/api/v1/publish/signed_url returns a signed URL (see the sketch after this list)
  6. The existing DLHub publication pipeline writes the servable information to Globus Search; this pipeline will be replaced by the container-service, so writing to Search needs to happen somewhere else.
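A minimal sketch of how the SDK side of assumptions 3 and 5 could work, assuming the endpoint returns JSON with the presigned URL under a "url" key and accepts a bearer token; the header name and response shape are assumptions, not confirmed against the signed_url branch:

import requests

SIGNED_URL_ENDPOINT = "https://api.dlhub.org/api/v1/publish/signed_url"

def upload_with_signed_url(archive_path, auth_token):
    # Ask the dlhub-service for a presigned upload URL (assumption 5).
    resp = requests.get(
        SIGNED_URL_ENDPOINT,
        headers={"Authorization": f"Bearer {auth_token}"},  # assumed auth scheme
    )
    resp.raise_for_status()
    signed_url = resp.json()["url"]  # assumed response shape

    # PUT the model archive to the presigned URL (typical presigned-upload flow).
    with open(archive_path, "rb") as f:
        requests.put(signed_url, data=f).raise_for_status()
    return signed_url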

Acceptance Criteria

  1. Given a Publisher publishes a model, the DLHub SDK uses the container-service for the model publication process instead of the existing backend infrastructure
  2. Given a Publisher publishes a model, there is no discernible difference to the Publisher between this implementation and the previous one
  3. Given a Consumer tries to run a model, the model runs quickly and the user gets feedback in the form of the executed function's result or an error
  4. Given a Consumer tries to run a model, if there is an error, that information is passed back to the user.

Tasks

  • Deploy the updated dlhub-service, with the signed-url branch merged into main, on the DLHub server
  • Update the dlhub-sdk to upload (i.e., publish) using signed URLs and call the container service
  • Insert the entry into Globus Search by creating an AWS Lambda function (see the sketch after this list)
  • Add the necessary funcX container-service repo additions, per Ben G and Steve W
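A hedged sketch of what that Search-ingest Lambda could look like, assuming the event carries the servable metadata and the Lambda authenticates to Globus Search as a confidential client; the environment variable names, event shape, and subject field are placeholders, not the deployed implementation:

import os
import globus_sdk

# Placeholder configuration; the real index UUID and client credentials are assumptions.
SEARCH_INDEX_ID = os.environ["DLHUB_SEARCH_INDEX"]
CLIENT_ID = os.environ["SEARCH_INGEST_CLIENT_ID"]
CLIENT_SECRET = os.environ["SEARCH_INGEST_CLIENT_SECRET"]

def lambda_handler(event, context):
    # The event is assumed to carry the servable metadata produced by the SDK.
    servable = event["servable"]

    # Authenticate as a confidential client with the Globus Search scope.
    auth = globus_sdk.ConfidentialAppAuthClient(CLIENT_ID, CLIENT_SECRET)
    tokens = auth.oauth2_client_credentials_tokens(
        requested_scopes=globus_sdk.SearchClient.scopes.all
    )
    authorizer = globus_sdk.AccessTokenAuthorizer(
        tokens.by_resource_server["search.api.globus.org"]["access_token"]
    )
    search = globus_sdk.SearchClient(authorizer=authorizer)

    # Write a single GMetaEntry for the servable.
    ingest_doc = {
        "ingest_type": "GMetaEntry",
        "ingest_data": {
            "subject": servable["dlhub"]["shorthand_name"],  # assumed field
            "visible_to": ["public"],
            "content": servable,
        },
    }
    return search.ingest(SEARCH_INDEX_ID, ingest_doc).data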

New scope: just upload directly from the SDK to Search. Data comes only from a GitHub repo; we just give the payload URL. No S3 bucket.

Ben G has a container service demo that can be used as a guide

Ryan C said the signed-url branch should probably be good to go as is

What do you mean by "data comes only from a GitHub repo"? Is it that the data for the model being published should be hosted on GitHub?

I think @blaiszik meant that it is dummy data within the DLHub SDK repo, but I could be wrong. Ben, could you clarify?

import sys
from time import sleep

from funcx import ContainerSpec
from funcx.sdk.client import FuncXClient

fxc = FuncXClient()


def wine_file_reader(source_url):
    # Runs inside the built container, so imports happen in the function body.
    import pandas as pd
    import urllib.request

    with urllib.request.urlopen(source_url) as f:
        p = pd.read_csv(f, sep=",")
        return p.to_dict()


# Request a container build from the Container Service.
container_uuid = None
if not container_uuid:
    container_uuid = fxc.build_container(
        ContainerSpec(
            name="WineFileReader",
            pip=["pandas"],
            python_version="3.7",
            conda=[],
        )
    )

print(f"Building {container_uuid}")

# Poll until the build finishes.
while True:
    status = fxc.get_container_build_status(container_uuid)
    print(f"status is {status}")
    if status in ["ready", "failed"]:
        break
    sleep(5)

if status != "ready":
    sys.exit(-1)

print(fxc.get_container(container_uuid, container_type="docker"))

# Register the function against the freshly built container.
function = fxc.register_function(wine_file_reader, container_uuid=container_uuid)
print(function)

dlhub_endpoint = '86a47061-f3d9-44f0-90dc-56ddc642c000'

res = fxc.run(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv",
    endpoint_id=dlhub_endpoint,
    function_id=function,
)

# Poll for the result.
result = None
while not result:
    try:
        result = fxc.get_result(res)
        print(result)
    except Exception as eek:
        print("Oops", eek)
        sleep(5)

This is fantastic, Ben. This should match really well with the SDK

Are the docs for the container service somewhere yet? I'm curious about things like "how to provide files needed for the container" and "specifying apt dependencies."

No docs yet, really. If anyone has time to use this example to start a docs PR for the funcX repo, it would be appreciated.
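Until those docs exist, here is a hedged reading of how the questions above might map onto ContainerSpec; the apt and payload_url parameters and the archive URL below are assumptions based on this thread, not confirmed API:

from funcx import ContainerSpec

# Assumes ContainerSpec accepts apt and payload_url; both are unverified here.
spec = ContainerSpec(
    name="WineFileReader",
    pip=["pandas"],
    conda=[],
    apt=["libxml2"],  # placeholder system dependency
    python_version="3.7",
    # Files needed inside the container, fetched from a repository archive (placeholder URL).
    payload_url="https://github.com/DLHub-Argonne/dlhub_sdk/archive/refs/heads/master.zip",
)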

Thanks, Ben, this is really helpful.

A WIP commit that outlines the publication process using the Container Service and the (in-progress) Globus Search write Lambda function has been pushed to the 176-integrate-container-service branch.

We're going to meet on Friday 2/3 to go over how I've approached this

I've pushed WIP to the 176-integrate-container-service branch of dlhub_sdk. The work is currently blocked because the Container Service is not successfully building anything, but it is expected to be near-complete for publishing from repositories.

The search ingest Lambda function now retrieves the user's group membership, but does not yet compare it against anything. Which groups are used (or do we want to use) for write permission into the DLHub search index?
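Whichever group is chosen, the comparison inside the Lambda could be a simple membership check against an allowed group ID; in the sketch below, the group UUID and the way the user's Groups-scoped token reaches the Lambda are assumptions:

import globus_sdk

# Placeholder UUID for whichever Globus Group ends up gating Search writes.
ALLOWED_GROUP_ID = "00000000-0000-0000-0000-000000000000"

def user_may_write(user_groups_token: str) -> bool:
    # The caller's Groups-scoped access token is assumed to be forwarded to the Lambda.
    groups = globus_sdk.GroupsClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(user_groups_token)
    )
    my_group_ids = {g["id"] for g in groups.get_my_groups()}
    return ALLOWED_GROUP_ID in my_group_ids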

End-to-end publication works in dev funcx, writing to a dev search index (with the search ingest Lambda deployed as dev). The next step is for someone else to take a look at it and/or try it out.

TODO: finalize group membership/permissions with Search ingest AWS Lambda

Containers build appropriately with the correct content, but due to a bug in the interaction between the Container Service and funcx, it is currently impossible to actually run the function.

Merged and released in DLHub SDK v2.0.0! 🌈