machine-learning-exchange/mlx

Usability issues with Datasets and Notebooks

Opened this issue · 2 comments

Describe the bug

It is not clear how to run notebooks and datasets. There should be succinct documentation of ...

  • How are notebooks and datasets related?
  • From a dataset, users can navigate to a notebook, but not vice versa?!
  • Need to describe why running a dataset asks for a namespace (-> PVC creation)
  • What happens after a Dataset has been "launched"?
  • Running a dataset with a related asset (notebook) asks for a PVC. How does a user get that?
  • What should the Mount Path be? (-> /tmp/data; the MLX UI should prefill this)
  • Why does the pipeline run show no inputs and outputs?
  • What are the results of a Notebook run, and where can they be found (Minio > HTML, original notebook updated, preview in MLX UI Notebook card -- unless cached?)
  • Download Notebook, after a run, should contain the updated notebook and/or the HTML of the last run
  • Add a Troubleshooting section, e.g. for the 403 error while trying to create a PVC when missing permissions to create CRDs

We should update the doc here:
https://github.com/machine-learning-exchange/mlx/blob/main/datasets/README.md#use-dataset-with-mlx-assets

Thanks @blublinsky for reporting

Add Troubleshooting for the 403 error when trying to "Run" a Dataset:

  • by default, Kubeflow cannot deploy any CRD resource on the cluster
  • need to patch the cluster with:
     kubectl create clusterrolebinding pipeline-runner-extend --clusterrole cluster-admin --serviceaccount=kubeflow:pipeline-runner
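
To confirm that the patch above took effect, something like the following could be run. This is only a sketch: the `sa_subject` helper and the `NAMESPACE` variable are hypothetical names introduced here, and checking `persistentvolumeclaims` is an assumption based on the 403 showing up during PVC creation. `kubectl auth can-i` and `kubectl get clusterrolebinding` are standard kubectl commands.

```shell
#!/bin/sh
# Hypothetical helper: builds the impersonation subject for a service account.
# $1 = namespace of the service account, $2 = service account name
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Service account named in the clusterrolebinding command above.
SUBJECT="$(sa_subject kubeflow pipeline-runner)"

if command -v kubectl >/dev/null 2>&1; then
  # Prints "yes" or "no"; NAMESPACE is an assumed variable for the target namespace.
  kubectl auth can-i create persistentvolumeclaims \
    --as="$SUBJECT" --namespace "${NAMESPACE:-kubeflow}" || true
  # Shows the binding created by the patch, if it exists.
  kubectl get clusterrolebinding pipeline-runner-extend -o wide || true
else
  echo "kubectl not found; would check permissions for $SUBJECT"
fi
```

If the first command prints `no`, the binding is probably missing or points at a different service account. Note that binding `cluster-admin` is very broad, so a cluster operator may prefer to review it before applying.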
    

@yhwang -- thanks for reporting that