Batch AI Job fails with BFSMountError Info:Failed to install cifs-utils package
spdjudd opened this issue · 3 comments
Background:
I set up an experiment based on the recipes here, but using the linux-data-science-vm-ubuntu image on my cluster to run a scikit-learn job. This had been working fine for the last week or two, up until yesterday (1st Aug).
Problem:
Now when my job starts on a node it fails before running any of my code with the following:
Job state: failed ExitCode: 1
FailureDetails:
ErrorCode:BFSMountError
ErrorMessage:unable to mount blob fuse file system
Details:
Info:Failed to install cifs-utils package
I've tried recreating everything in a different subscription, with the same result. Similar jobs that mount the same blob and file shares, but run on a GPU cluster with the default VM image and a TensorFlow Docker container, still work OK. I also tried an earlier version of the linux-data-science-vm-ubuntu image, to no avail.
Any ideas?
Here's the stderr.txt from the node. It shows "Permission denied" when trying to install cifs-utils during job environment preparation:
2018/08/02 12:11:44 Ping Docker returned: 0xc4200764e0
2018/08/02 12:11:44 Removed all existing containers
2018/08/02 12:11:44 Unmounting previous job level file systems
2018/08/02 12:11:44 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/batchaiworkspace/treebagger_experiment/treebagger_08_02_2018_121135/config
2018/08/02 12:11:44 Version: 3.0.00573.0001 Branch: merge Commit: 24e1bd42
2018/08/02 12:11:44 Running required HostTool version, skipping auto-update
2018/08/02 12:11:44 Executing 'Copy hosttool executable' on 10.0.0.4
2018/08/02 12:11:44 Copy hosttool executable succeeded on 10.0.0.4. Output:
>>>
>>>
2018/08/02 12:11:44 Executing 'job environment preparation' on 10.0.0.4
2018/08/02 12:12:09 job environment preparation failed on 10.0.0.4. Output:
>>> 2018/08/02 12:11:44 Version: 3.0.00573.0001 Branch: merge Commit: 24e1bd42
>>> 2018/08/02 12:11:44 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/batchaiworkspace/treebagger_experiment/treebagger_08_02_2018_121135/wd
>>> 2018/08/02 12:11:44 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/batchaiworkspace/treebagger_experiment/treebagger_08_02_2018_121135/config
>>> 2018/08/02 12:11:44 Mounting job level file systems
>>> 2018/08/02 12:11:44 Creating directory /mnt/batch/tasks/shared/LS_root/jobs/batchaiworkspace/treebagger_experiment/treebagger_08_02_2018_121135/mounts
>>> 2018/08/02 12:11:44 No NFS configured
>>> 2018/08/02 12:11:44 Executing dpkg --configure -a; apt-get install -y -q --no-install-recommends cifs-utils
>>> dpkg: error: requested operation requires superuser privilege
>>> E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
>>> E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
>>> 2018/08/02 12:11:44 retrying ...
>>> dpkg: error: requested operation requires superuser privilege
>>> E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
>>> E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
>>> 2018/08/02 12:11:45 retrying ...
>>> dpkg: error: requested operation requires superuser privilege
>>> E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
>>> E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
>>> 2018/08/02 12:11:47 retrying ...
>>> dpkg: error: requested operation requires superuser privilege
>>> E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
>>> E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
>>> 2018/08/02 12:11:51 retrying ...
>>> dpkg: error: requested operation requires superuser privilege
>>> E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
>>> E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
>>> 2018/08/02 12:11:59 retrying ...
>>> dpkg: error: requested operation requires superuser privilege
>>> E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
>>> E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
>>> 2018/08/02 12:12:09 Giving up execution of dpkg --configure -a; apt-get install -y -q --no-install-recommends cifs-utils. Last error: exit status 100
>>> 2018/08/02 12:12:09 Reporting an error: InternalError - unable to mount blob fuse file system:{
>>> Info: Failed to install cifs-utils package
>>> }
>>> 2018/08/02 12:12:09 Failed to mount job level filesystems: InternalError - unable to mount blob fuse file system:{
>>> Info: Failed to install cifs-utils package
>>> }
>>>
2018/08/02 12:12:09 Failed to start the coordination task: InternalError - failed to prepare an environment for the job execution:{
Info: job environment preparation failed on 10.0.0.4.
}
2018/08/02 12:12:09 Executing 'sync mounted file systems' on 10.0.0.4
2018/08/02 12:12:09 sync mounted file systems succeeded on 10.0.0.4. Output:
>>> 2018/08/02 12:12:09 Version: 3.0.00573.0001 Branch: merge Commit: 24e1bd42
>>>
2018/08/02 12:12:09 Executing 'unmount mounted file systems' on 10.0.0.4
2018/08/02 12:12:09 unmount mounted file systems succeeded on 10.0.0.4. Output:
>>> 2018/08/02 12:12:09 Version: 3.0.00573.0001 Branch: merge Commit: 24e1bd42
>>> 2018/08/02 12:12:09 Unmounting /mnt/batch/tasks/shared/LS_root/jobs/batchaiworkspace/treebagger_experiment/treebagger_08_02_2018_121135 with 1m0s timeout
>>>
2018/08/02 12:12:09 Executing 'jobRelease task' on 10.0.0.4
2018/08/02 12:12:09 jobRelease task succeeded on 10.0.0.4. Output:
>>> 2018/08/02 12:12:09 Version: 3.0.00573.0001 Branch: merge Commit: 24e1bd42
>>> 2018/08/02 12:12:09 removing container treebagger_08_02_2018_121135 exited with 1
>>>
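For anyone triaging a similar stderr.txt, the retry pattern in the log above can be summarized with a short script. This is only a minimal sketch: the function name `summarize_install_failure` and the returned dictionary keys are hypothetical, and it assumes the log keeps the `YYYY/MM/DD HH:MM:SS` timestamp format and the `>>> ` prefix seen here. It counts the "are you root?" errors from apt-get and measures the gaps between consecutive "retrying ..." lines.

```python
import re
from datetime import datetime

# Timestamp format seen in the log, e.g. "2018/08/02 12:11:44 retrying ...".
STAMPED = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def summarize_install_failure(log_text):
    """Summarize apt/dpkg permission failures and retry timing in a log.

    Hypothetical helper for illustration; it is not part of any Batch AI tooling.
    """
    failed_attempts = 0
    retry_times = []
    gave_up = False

    for line in log_text.splitlines():
        # Drop the ">>> " prefix that marks output captured from the node.
        body = line.lstrip("> ").rstrip()

        # apt-get prints this once per attempt when it is not run as root.
        if "are you root?" in body:
            failed_attempts += 1

        match = STAMPED.match(body)
        if match is None:
            continue

        when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        message = match.group(2)
        if message.startswith("retrying"):
            retry_times.append(when)
        elif message.startswith("Giving up"):
            gave_up = True

    # Seconds between consecutive "retrying ..." lines.
    gaps = [
        int((later - earlier).total_seconds())
        for earlier, later in zip(retry_times, retry_times[1:])
    ]
    return {
        "failed_attempts": failed_attempts,
        "retry_gaps_seconds": gaps,
        "gave_up": gave_up,
    }
```

Calling `summarize_install_failure(open("stderr.txt").read())` on the log above should report six failed attempts and retry gaps of 1, 2, 4 and 8 seconds. Every attempt hit the same permission error, so retrying could not have helped: the install command was simply not running as root.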
@spdjudd Thanks for reporting this. I assume you are using DSVM in the North Europe region, aren't you? We are testing a new feature in that region, and this appears to be a bug.
We just pushed a fix for this. Could you please resubmit your job to verify it?
@lliimsft Yes, I'm using North Europe. I've just tested and can confirm it's working again now, so thanks very much for the quick response!!