ARO demo using Open Data Hub and Azure data services - Azure Data Lake and Azure Blob Storage.
- Azure Red Hat OpenShift 4 Cluster
- Admin access to OpenShift
- OpenShift CLI
- Install Open Data Hub
Download the Iris dataset.
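For example, the dataset can be fetched from its commonly used location in the UCI Machine Learning Repository (adjust the URL if you source it elsewhere):

```sh
# Fetch the classic Iris dataset and save it as iris.data (the filename used later by azcopy)
curl -o iris.data https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
```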
Create a storage account with Azure Data Lake Storage Gen2 enabled.
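One possible way to do this with the Azure CLI is sketched below; the resource group, account name, and location are placeholders, and `--hns true` enables the hierarchical namespace used by Data Lake Storage Gen2:

```sh
# Create a StorageV2 account with the hierarchical namespace (Data Lake Storage Gen2) enabled
az storage account create \
  --name <storage-account-name> \
  --resource-group <resource-group> \
  --location <location> \
  --sku Standard_LRS \
  --kind StorageV2 \
  --hns true
```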
Create a service principal (see the example after this list):
- Make sure to assign the `Storage Blob Data Contributor` role to the service principal
- Create a new application secret for authenticating the service principal
- Copy down the `client-id`, `tenant-id`, and `client-secret` values (you will need these later)
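A minimal Azure CLI sketch of the steps above; the service principal name is a placeholder, and scoping the role assignment to the storage account is one option among several:

```sh
# Create the service principal and grant it Storage Blob Data Contributor on the storage account
az ad sp create-for-rbac \
  --name <service-principal-name> \
  --role "Storage Blob Data Contributor" \
  --scopes /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>
# The output contains appId (client-id), password (client-secret), and tenant (tenant-id)
```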
View your account access keys and copy down the storage account's connection string (you will need this later).
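If you prefer the CLI to the portal, the following commands print the access keys and the connection string (account and resource group names are placeholders):

```sh
# List the account access keys and show the connection string
az storage account keys list --account-name <storage-account-name> --resource-group <resource-group>
az storage account show-connection-string --name <storage-account-name> --resource-group <resource-group>
```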
Download azcopy.
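On Linux, one common way to fetch it is via Microsoft's download alias (the exact archive and directory names vary by release):

```sh
# Download and extract the azcopy v10 binary for Linux
wget https://aka.ms/downloadazcopy-v10-linux
tar -xvf downloadazcopy-v10-linux
```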
Upload the Iris dataset to Azure Data Lake:

```sh
# Replace with your tenant-id and storage account name
azcopy login --tenant-id=<tenant-id>
azcopy make 'https://<storage-account-name>.dfs.core.windows.net/mycontainer'
azcopy copy iris.data 'https://<storage-account-name>.dfs.core.windows.net/mycontainer/sample/iris.data'
```
Configure anonymous access to the storage container `mycontainer`.
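A hedged CLI sketch, assuming anonymous (public) blob access is acceptable for this demo; newer storage accounts may also require public access to be allowed at the account level first:

```sh
# Allow public blob access on the account, then enable anonymous read access on the container
az storage account update --name <storage-account-name> --resource-group <resource-group> --allow-blob-public-access true
az storage container set-permission --name mycontainer --public-access blob --connection-string "<connection-string>"
```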
Launch JupyterHub:

```sh
echo $(oc get route jupyterhub -n odh --template='http://{{.spec.host}}')
```
Select the `s2i-spark-minimal-notebook` image and spawn the server. Leave the other settings as they are.
Upload the `model_pipeline.ipynb` notebook. Set the variables in the second cell where it says `### ENTER YOUR DETAILS ###`.
- Mount a secret with the environment variables for the client ID, tenant ID, and client secret values (see the sketch after this list)
- Add Kubeflow on Tekton pipeline
- Add model validation and model update to the pipeline
- Add Spark connection to Azure Data Lake
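For the first item, a minimal sketch of how such a secret could be created with `oc`; the secret name and environment variable names are assumptions, not something defined by this demo:

```sh
# Create a secret holding the service principal credentials as environment-variable-style keys
oc create secret generic azure-sp-credentials -n odh \
  --from-literal=AZURE_CLIENT_ID=<client-id> \
  --from-literal=AZURE_TENANT_ID=<tenant-id> \
  --from-literal=AZURE_CLIENT_SECRET=<client-secret>
```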