Event-Driven Parameterized Jupyter Notebooks

Jupyter notebooks are widely used in the data science community to develop models, run analyses, and generate reports. In many situations, though, a data scientist must feed varying parameters to a notebook to tune it toward an optimal model. Tools like papermill make it easy to parameterize a notebook, and Argo Events makes it easy to set up event-driven parameterized notebooks.
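As a rough illustration of what parameterizing a notebook with papermill looks like, the sketch below builds a papermill command line from a dictionary of parameters. The notebook paths and parameter names are the ones used later in this demo; `build_papermill_command` is a hypothetical helper written for this example and is not part of papermill.

```python
import shlex


def build_papermill_command(input_nb, output_nb, parameters):
    """Return the argv list for `papermill INPUT OUTPUT -p NAME VALUE ...`."""
    argv = ["papermill", input_nb, output_nb]
    for name, value in parameters.items():
        # Each `-p NAME VALUE` pair is injected by papermill after the
        # notebook cell tagged "parameters", overriding the default there.
        argv += ["-p", name, str(value)]
    return argv


command = build_papermill_command(
    "noise.ipynb",
    "s3://output/noisy-out.ipynb",
    {"filterA": 5, "filterB": 5, "sVSp": 0.5, "amount": 0.004},
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where papermill is installed; in this demo the same arguments are supplied by an Argo workflow instead.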


  1. Install Argo Workflows.

  2. Install Argo Events.

  3. Install NATS,

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/nats-deploy.yaml
  4. Port forward to NATS pod,

     kubectl -n argo-events port-forward <nats-pod-name> 4222:4222
  5. Install Minio,

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/artifact-minio.yaml
     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/minio-deploy.yaml
  6. Port forward to Minio pod,

     kubectl -n argo-events port-forward <minio-pod-name> 9000:9000


In this demo, we will set up an image processing pipeline using two notebooks that operate on the ArgoProj logo image.


  1. The first notebook takes the clean ArgoProj logo and adds noise to it.
  2. The second notebook determines the similarity between the clean image and the noisy image. If the match is greater than 80%, the model is optimal; otherwise we need to tune the noise parameters.
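The notebooks' own code is not shown here, so the following is only a minimal pure-Python sketch of the idea, under stated assumptions: `amount` is taken to be the fraction of pixels to corrupt, `s_vs_p` the share of corrupted pixels that become white rather than black, and similarity is approximated as the fraction of unchanged pixels. The real notebooks also apply a Gaussian filter and may use a different similarity metric; the flat gray test image is hypothetical.

```python
import random


def add_salt_and_pepper(image, amount, s_vs_p, seed=0):
    """Return a copy of `image` with roughly `amount` of its pixels replaced.

    `image` is a list of rows of grayscale values in [0, 255]. A corrupted
    pixel becomes white (salt) with probability `s_vs_p`, else black (pepper).
    """
    rng = random.Random(seed)
    noisy = [row[:] for row in image]  # copy so the clean image is untouched
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = 255 if rng.random() < s_vs_p else 0
    return noisy


def similarity(a, b):
    """Fraction of pixel positions where the two images agree exactly."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


clean = [[128] * 64 for _ in range(64)]  # hypothetical flat gray "logo"
noisy = add_salt_and_pepper(clean, amount=0.004, s_vs_p=0.5)
score = similarity(clean, noisy)
print("optimal" if score > 0.8 else "needs tuning")
```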

[Figure: Argo Demo]

  1. We will set up two gateways, Webhook and Minio. The Webhook gateway listens for HTTP requests that tune the notebook that adds noise to the image. The notebook stores the noisy image in Minio.

  2. The Minio gateway listens for file drop events on a specific bucket. Once the noisy image is dropped into that bucket, the second notebook runs and determines the similarity of the images.

[Figure: Argo Demo]

  1. Create the webhook event source. It contains the configuration for the gateway to listen for HTTP POST requests on port 12000.

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/webhook-event-source.yaml
  2. Create webhook gateway,

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/webhook-gateway.yaml
  3. Port forward to webhook gateway pod,

     kubectl -n argo-events port-forward <webhook-gateway-pod-name> 12000:12000
  4. Create webhook sensor,

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/webhook-sensor.yaml
  5. Let's inspect the webhook sensor,

     apiVersion: argoproj.io/v1alpha1
     kind: Sensor
     metadata:
       name: webhook-sensor
       labels:
         sensors.argoproj.io/sensor-controller-instanceid: argo-events
     spec:
       template:
         spec:
           containers:
             - name: sensor
               image: argoproj/sensor:v0.13.0-rc
               imagePullPolicy: Always
           serviceAccountName: argo-events-sa
       dependencies:
         - name: test-dep
           gatewayName: webhook-gateway
           eventName: example
       subscription:
         http:
           port: 9300
       triggers:
         - template:
             name: webhook-workflow-trigger
             k8s:
               group: argoproj.io
               version: v1alpha1
               resource: workflows
               operation: create
               source:
                 resource:
                   apiVersion: argoproj.io/v1alpha1
                   kind: Workflow
                   metadata:
                     generateName: noisy-processor-
                   spec:
                     entrypoint: noisy
                     # Default notebook parameters; the trigger parameters
                     # at the bottom overwrite these values.
                     arguments:
                       parameters:
                         - name: filterA
                           value: "5"
                         - name: filterB
                           value: "5"
                         - name: sVSp
                           value: "0.5"
                         - name: amount
                           value: "0.004"
                     templates:
                       - name: noisy
                         serviceAccountName: argo-events-sa
                         inputs:
                           parameters:
                             - name: filterA
                             - name: filterB
                             - name: sVSp
                             - name: amount
                         container:
                           image: metalgearsolid/demo-blur-argo-logo:latest
                           command: [papermill]
                           imagePullPolicy: Always
                           env:
                             - name: AWS_ACCESS_KEY_ID
                               value: minio
                             - name: AWS_SECRET_ACCESS_KEY
                               value: minio123
                             - name: AWS_DEFAULT_REGION
                               value: us-east-1
                             - name: BOTO3_ENDPOINT_URL
                               value: http://minio-service.argo-events.svc:9000
                           args:
                             - "noise.ipynb"
                             - "s3://output/noisy-out.ipynb"
                             - "-p"
                             - "filterA"
                             - "{{inputs.parameters.filterA}}"
                             - "-p"
                             - "filterB"
                             - "{{inputs.parameters.filterB}}"
                             - "-p"
                             - "sVSp"
                             - "{{inputs.parameters.sVSp}}"
                             - "-p"
                             - "amount"
                             - "{{inputs.parameters.amount}}"
               # Copy values from the webhook payload into the workflow arguments.
               parameters:
                 - src:
                     dependencyName: test-dep
                     dataKey: body.filterA
                   dest: spec.arguments.parameters.0.value
                 - src:
                     dependencyName: test-dep
                     dataKey: body.filterB
                   dest: spec.arguments.parameters.1.value
                 - src:
                     dependencyName: test-dep
                     dataKey: body.sVSp
                   dest: spec.arguments.parameters.2.value
                 - src:
                     dependencyName: test-dep
                     dataKey: body.amount
                   dest: spec.arguments.parameters.3.value
  6. The sensor trigger is an Argo workflow that runs a Jupyter notebook with papermill. It takes arguments for the Gaussian filter and salt-and-pepper noise in addition to the S3 configuration. The event data received from the HTTP POST request overrides the workflow arguments on the fly.
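To make the override mechanism concrete, here is a small conceptual model of it in Python. It is not Argo Events' implementation; it only mimics the observable behavior of the `dataKey` and `dest` paths from the sensor above, and the helper names (`get_by_path`, `set_by_path`, `apply_parameters`) are hypothetical.

```python
import copy


def get_by_path(data, path):
    """Walk a dotted path such as 'body.filterA' through nested dicts/lists."""
    for key in path.split("."):
        data = data[int(key)] if isinstance(data, list) else data[key]
    return data


def set_by_path(data, path, value):
    """Set the value addressed by a dotted path such as 'a.b.0.value'."""
    *parents, last = path.split(".")
    for key in parents:
        data = data[int(key)] if isinstance(data, list) else data[key]
    if isinstance(data, list):
        data[int(last)] = value
    else:
        data[last] = value


def apply_parameters(workflow, event, parameters):
    """Return a copy of `workflow` with event values written to each `dest`."""
    patched = copy.deepcopy(workflow)
    for param in parameters:
        value = get_by_path(event, param["src"]["dataKey"])
        set_by_path(patched, param["dest"], value)
    return patched


# A trimmed-down workflow with two of the four arguments, for brevity.
workflow = {"spec": {"arguments": {"parameters": [
    {"name": "filterA", "value": "5"},
    {"name": "amount", "value": "0.004"},
]}}}
event = {"body": {"filterA": "15", "amount": "0.010"}}
parameters = [
    {"src": {"dataKey": "body.filterA"}, "dest": "spec.arguments.parameters.0.value"},
    {"src": {"dataKey": "body.amount"}, "dest": "spec.arguments.parameters.1.value"},
]
patched = apply_parameters(workflow, event, parameters)
print(patched["spec"]["arguments"]["parameters"])
```

The defaults in the workflow survive whenever no event arrives with a matching key only in this simplified model if you skip the corresponding entry; the sketch raises `KeyError` for a missing key rather than guessing at the real sensor's behavior.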

  7. Let's configure the Minio client mc,

     mc config host add minio http://localhost:9000 minio minio123
  8. Create a bucket on Minio called output.

     mc mb minio/output
  9. Create the Minio event source that makes the gateway listen for file events on the output bucket,

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/minio-event-source.yaml
  10. Create Minio gateway

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/minio-gateway.yaml
  11. Create Minio sensor,

     kubectl -n argo-events apply -f https://raw.githubusercontent.com/VaibhavPage/argo-events-demo/master/minio-sensor.yaml
  12. The Minio sensor triggers an Argo workflow that determines the similarity index between the image that was put into the output bucket on Minio and the original Argo logo. If the match is less than 80%, it publishes a failure message on the NATS subject called image-match.
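The matcher's code is not shown in this walkthrough, but the messages it publishes later in the demo have the form `STATUS: score`. A minimal sketch of that decision, assuming a strict greater-than comparison against 0.8 (the behavior at exactly 80% is not specified here) and a hypothetical function name:

```python
def match_message(score, threshold=0.8):
    """Format the result in the 'STATUS: score' shape seen on image-match."""
    # Assumption: a score strictly above the threshold counts as a success.
    status = "SUCCESS" if score > threshold else "FAILURE"
    return f"{status}: {score}"


print(match_message(0.45))
print(match_message(0.9))
```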

  13. Run a NATS subject subscriber in a separate terminal,

     go get github.com/nats-io/nats.go/
     cd $(go env GOPATH)/src/github.com/nats-io/nats.go/examples/nats-sub
     go run main.go -s localhost:4222 image-match
  14. Now it's time to send an HTTP request to parameterize the notebook that adds noise to the original Argo logo and to execute the image processing pipeline.

     curl -d '{"filterA":"15", "filterB": "15", "sVSp": "0.003", "amount": "0.010"}' -H "Content-Type: application/json" -X POST http://localhost:12000/example
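If you would rather drive the tuning from a script than from curl, the same request can be built with Python's standard library. This is a sketch only; `build_tuning_request` is a hypothetical helper, and the URL assumes the port-forward to the webhook gateway from step 3 is still running.

```python
import json
import urllib.request


def build_tuning_request(params, url="http://localhost:12000/example"):
    """Build the same JSON POST request that the curl command sends."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_tuning_request(
    {"filterA": "15", "filterB": "15", "sVSp": "0.003", "amount": "0.010"}
)
print(request.get_method(), request.full_url)

# To actually send it (requires the webhook gateway to be reachable):
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```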
  15. List the Argo workflows to see noisy-processor- and matcher-workflow-,

     argo list
  16. Check the noisy image in bucket output.

  17. As soon as the matcher-workflow- completes, you will see a message on the NATS subject,

     [#1] Received on [image-match]: 'FAILURE: 0.4500723255991941'
  18. Let's change the parameters for the curl request to get a >80% match,

     curl -d '{"filterA":"5", "filterB": "5", "sVSp": "0.008", "amount": "0.0008"}' -H "Content-Type: application/json" -X POST http://localhost:12000/example
  19. Still only 51% match,

     [#2] Received on [image-match]: 'FAILURE: 0.519408012194793'
  20. Let's reduce the amount of noise to 0.0008,

     curl -d '{"filterA":"5", "filterB": "5", "sVSp": "0.008", "amount": "0.0008"}' -H "Content-Type: application/json" -X POST http://localhost:12000/example
  21. You will see a success message,

     [#4] Received on [image-match]: 'SUCCESS: 0.9157008012270712'
  22. This was a simple image processing pipeline using Argo Events. You can set up CI pipelines, machine learning pipelines, and so on in the same way using Argo Events and Argo Workflows.