jupyter-server/jupyter-scheduler

Extend Jupyter-scheduler to create and manage jobs with multiple tasks

akshaychitneni opened this issue · 2 comments

Problem

Jupyter Scheduler currently enables users to create and manage background jobs that execute a notebook file. We would like to extend jobs to support multiple notebook tasks, where each task executes a notebook file, and to allow dependencies between tasks. We want to initiate a discussion on extending Jupyter Scheduler so that users can create and manage notebook workflows and their associated runs in the Jupyter workspace. This would also require a UX that lets users easily create tasks and their dependencies using a DAG editor.

Proposed Solution

Tentative model:

  • DescribeJobDefinition API Response
{
    "name": "test1",
    "tags": null,
    "output_filename_template": "{{input_filename}}-{{create_time}}",
    "schedule": "0 0 * * MON-FRI",
    "timezone": "America/Los_Angeles",
    "job_definition_id": "b5c6099c-9bec-4a04-968f-37e9c23c0f9b",
    "create_time": 1701124758318,
    "update_time": 1701124758317,
    "active": true,
    "tasks": [
     {  
        "name": "task1",
        "input_filename": "Untitled1.ipynb",
        "parameters": null,
        "runtimeProperties": {},
        "runtime_environment_name": "anaconda3",
        "runtime_environment_parameters": null,
        "output_formats": [
            "ipynb",
            "html"
        ],
        "compute_type": null,
        "trigger_rule": null,
        "dependsOn": []
     },
     {
        "name": "task2",
        "input_filename": "Untitled2.ipynb",
        "parameters": null,
        "runtimeProperties": {},
        "runtime_environment_name": "anaconda3",
        "runtime_environment_parameters": null,
        "output_formats": [
            "ipynb",
            "html"
        ],
        "compute_type": null,
        "trigger_rule": "all_success",
        "dependsOn": ["task1"]
     }
    ]
}
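
As an illustration of how the `dependsOn` field in the model above could drive execution, here is a minimal sketch that validates the task list and derives a run order with the standard library's `graphlib` (Python 3.9+). The function name `execution_order` is hypothetical and not part of Jupyter Scheduler; only the `name` and `dependsOn` fields come from the proposal.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(tasks):
    """Return task names in an order that respects each task's dependsOn list.

    Hypothetical helper: raises ValueError for unknown dependencies or cycles.
    """
    names = {task["name"] for task in tasks}
    graph = {}
    for task in tasks:
        deps = task.get("dependsOn") or []
        unknown = [dep for dep in deps if dep not in names]
        if unknown:
            raise ValueError(
                f"task {task['name']!r} depends on unknown tasks: {unknown}"
            )
        # TopologicalSorter expects a mapping of node -> predecessors.
        graph[task["name"]] = set(deps)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


# Only the fields relevant to ordering are shown here.
tasks = [
    {"name": "task2", "dependsOn": ["task1"]},
    {"name": "task1", "dependsOn": []},
]
order = execution_order(tasks)
print(order)  # ['task1', 'task2']
```

Rejecting unknown names and cycles at creation time would keep an invalid DAG from ever being scheduled, though where that validation lives is an open design question.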

  • DescribeJob API Response

{
    "name": "job1",
    "tags": null,
    "output_filename_template": "{{input_filename}}-{{create_time}}",
    "job_id": "27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
    "url": "/jobs/27d8a6ae-47d0-4ed3-9e28-5411d21a0e03",
    "create_time": 1696264966089,
    "update_time": 1696264968551,
    "start_time": 1696264967241,
    "end_time": 1696264968550,
    "status": "COMPLETED",
    "status_message": null,
    "tasks": [
        {
            "input_filename": "Untitled2.ipynb",
            "runtime_environment_name": "anaconda3",
            "runtime_environment_parameters": null,
            "output_formats": [
                "ipynb",
                "html"
            ],
            "parameters": null,
            "name": "task1",
            "job_files": [
                {
                    "display_name": "HTML",
                    "file_format": "html",
                    "file_path": null
                },
                {
                    "display_name": "Input",
                    "file_format": "input",
                    "file_path": null
                }
            ],
            "create_time": 1696264966089,
            "update_time": 1696264968551,
            "start_time": 1696264967241,
            "end_time": 1696264968550,
            "trigger_rule": null,
            "dependsOn": [],
            "status": "COMPLETED",
            "status_message": null,
            "downloaded": false
        },
        {
            "input_filename": "Untitled2.ipynb",
            "runtime_environment_name": "anaconda3",
            "runtime_environment_parameters": null,
            "output_formats": [
                "ipynb",
                "html"
            ],
            "parameters": null,
            "name": "task2",
            "job_files": [
                {
                    "display_name": "HTML",
                    "file_format": "html",
                    "file_path": null
                },
                {
                    "display_name": "Input",
                    "file_format": "input",
                    "file_path": null
                }
            ],
            "create_time": 1696264966089,
            "update_time": 1696264968551,
            "start_time": 1696264967241,
            "end_time": 1696264968550,
            "trigger_rule": "all_success",
            "dependsOn": ["task1"],
            "status": "COMPLETED",
            "status_message": null,
            "downloaded": false
        }
    ] 
}
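
To make the role of `trigger_rule` concrete, the following is a rough sketch of how a runner might decide whether a task can start, given the per-task `status` values in a response like the one above. It rests on assumptions that the proposal does not state: a null `trigger_rule` is treated the same as `"all_success"`, `"FAILED"` is used as an example of a non-successful status, and `is_ready` is a hypothetical name.

```python
def is_ready(task, status_by_name):
    """Return True if the task's upstream tasks satisfy its trigger rule."""
    # Assumption: a null trigger_rule behaves like "all_success".
    rule = task.get("trigger_rule") or "all_success"
    if rule != "all_success":
        # Only the rule shown in the proposal is sketched here.
        raise NotImplementedError(f"unsupported trigger_rule: {rule!r}")
    upstream = task.get("dependsOn") or []
    # A task with no upstream tasks is always ready.
    return all(status_by_name.get(name) == "COMPLETED" for name in upstream)


task1 = {"name": "task1", "trigger_rule": None, "dependsOn": []}
task2 = {"name": "task2", "trigger_rule": "all_success", "dependsOn": ["task1"]}

print(is_ready(task1, {}))                      # True
print(is_ready(task2, {"task1": "FAILED"}))     # False ("FAILED" is an assumed status)
print(is_ready(task2, {"task1": "COMPLETED"}))  # True
```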

Providing such an interface would allow users to extend the scheduler to integrate with external orchestrators or schedulers, such as Airflow, to schedule and run notebook DAGs.
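
One possible bridge to an external orchestrator is to flatten the task list into (upstream, downstream) pairs, since most DAG tools express dependencies as edges. This is only a sketch under that assumption; `dependency_edges` is a hypothetical helper, and no specific orchestrator API is called.

```python
def dependency_edges(tasks):
    """Flatten dependsOn lists into (upstream, downstream) name pairs."""
    edges = []
    for task in tasks:
        for upstream in task.get("dependsOn") or []:
            edges.append((upstream, task["name"]))
    return edges


# Only the fields relevant to dependencies are shown here.
tasks = [
    {"name": "task1", "dependsOn": []},
    {"name": "task2", "dependsOn": ["task1"]},
]
edges = dependency_edges(tasks)
print(edges)  # [('task1', 'task2')]
```

An integration layer could then translate each pair into whatever dependency declaration the target orchestrator expects.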

Additional context

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@akshaychitneni Thank you so much for contributing to Jupyter Scheduler! We have an issue #411 to cover multi-task jobs, opened earlier. I'm going to close this one as a duplicate, but let's keep the conversation going on the earlier issue.