Project-MONAI/tutorials

The execution of `runner.sh` is timing out due to the large dataset


Describe the bug

While submitting a new tutorial in PR #1696, I hit a timeout when verifying the code with `runner.sh`. I then verified other notebooks already in monai/tutorials (e.g., `spleen_segmentation_3d.ipynb`) the same way and saw similar timeouts. For these notebooks, the timeout is sometimes caused not by the code itself but by the size of the dataset, which makes the download take too long. `runner.sh` reports the line number of the exit code when a timeout occurs, but the timeout is not necessarily caused by that line. I therefore suggest adding a block to `runner.sh` that measures how long the notebook takes to download its dataset.

To Reproduce
Execute the following commands locally:

    ./runner.sh -t 3d_segmentation/brats_segmentation_3d.ipynb
    ./runner.sh -t 3d_segmentation/spleen_segmentation_3d.ipynb

Expected behavior

`runner.sh` already has a section that, for notebooks containing imports, checks for a "Setup imports" markdown cell after the first code cell and verifies that the code cell following it calls `print_config()`:

        # if import is used, then it should have the Setup import(s) markdown
        if [[ $(${NB_TEST} verify -f "$fname" -k "(^import|[\n\r]import|^from|[\n\r]from)" --type code) == true ]]
        then
            if [[ $(${NB_TEST} verify -f "$fname" -i $((code_ind + 1)) -k "Setup import") != true ]]; then
                print_error_msg "Missing the \"Setup imports\" after the first code cell of file: $fname"
                standardized=false
            fi

            if [[ $(${NB_TEST} verify -f "$fname" -i $((code_ind + 2)) -k "print_config()" --type code) != true ]]; then
                print_error_msg "print_config() cannot be found after the \"Setup imports\" markdown cell in file: $fname"
                standardized=false
            fi
        fi

We could use a similar approach to add a block that checks for a "Download dataset" cell and measures the download time of the code cell that follows it. This would make the cause of a timeout easier to identify when verifying notebooks with `runner.sh`.
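To illustrate the timing part of this proposal, here is a minimal sketch of a helper that runs a command, reports how long it took, and prints a warning when it exceeds a limit. The function name `time_step`, its arguments, and the message wording are hypothetical and not part of `runner.sh`. How `runner.sh` would isolate and run the "Download dataset" cell on its own is not covered here and is left as an assumption.

```shell
# Hypothetical helper (not part of runner.sh): run a command, report how long
# it took, and warn when it ran longer than a given number of seconds.
# Usage: time_step LABEL LIMIT_SECONDS COMMAND [ARGS...]
time_step() {
    local label="$1"
    local limit="$2"
    shift 2

    local start end elapsed
    local status=0

    start=$(date +%s)
    # Run the wrapped command; remember its exit code instead of aborting.
    "$@" || status=$?
    end=$(date +%s)
    elapsed=$(( end - start ))

    echo "[$label] took ${elapsed}s (exit code ${status})"
    if [ "$elapsed" -gt "$limit" ]; then
        # Warning goes to stderr so it does not mix with normal output.
        echo "[$label] exceeded ${limit}s; a timeout is likely caused by this step" >&2
    fi

    # Propagate the wrapped command's exit code to the caller.
    return "$status"
}
```

For example, `time_step "Download dataset" 600 sleep 1` would report the elapsed time of `sleep 1` and stay silent on stderr, since one second is well under the 600-second limit. Reporting the time separately from the pass/fail result keeps the existing exit-code behavior intact while showing whether the download step dominated the run.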