apache/mxnet

Flaky CI step 1: cannot clear workspace directory

DickJC123 opened this issue · 1 comment

Description

I'm seeing CI jobs fail at the first step, "Recursively delete the current directory from the workspace". The log output is:

 java.nio.channels.ClosedChannelException

The failure seems unrelated to the specific PR.

Occurrences

https://jenkins.mxnet-ci.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-21104/5/pipeline
https://jenkins.mxnet-ci.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-21104/7/pipeline/42

What have you tried to solve it?

  1. Retry the job to bypass the issue.
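The retry workaround can be sketched generically. Jenkins Pipeline has a built-in `retry` step for this purpose; the Python below is only a minimal illustration of the same idea, not the MXNet CI configuration. The names `retry` and `flaky_cleanup` are hypothetical, and treating every exception as transient is an assumption. Retrying only masks the `ClosedChannelException`; it does not address its cause.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(step: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Run `step` up to `attempts` times; re-raise the last error if every attempt fails.

    Hypothetical helper: it assumes any exception is transient, which is only
    reasonable for failures like the one in this issue that go away on rerun.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[Exception] = None
    for i in range(attempts):
        try:
            return step()
        except Exception as exc:
            last_exc = exc
            # Wait before the next attempt, but not after the final one.
            if i < attempts - 1:
                time.sleep(delay_s)
    assert last_exc is not None
    raise last_exc


calls = {"n": 0}


def flaky_cleanup() -> str:
    # Hypothetical stand-in for the workspace cleanup step: fails once, then succeeds.
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("simulated ClosedChannelException")
    return "workspace cleared"


print(retry(flaky_cleanup, attempts=3))  # prints "workspace cleared"
```

Keeping the attempt count small bounds how long a genuinely broken agent can stall the job before it fails visibly.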

I've noticed this happening at random as well. It seems to happen only when Jenkins has a large number of worker nodes (and therefore jobs) running at the same time. Unfortunately, I haven't been able to find the root cause yet.