awslabs/scale-out-computing-on-aws

/source/Scheduler.sh crashing on rhel8 scheduler instance--can't mount /data and /apps

Closed this issue · 3 comments

Describe the bug
Trying to mount /apps and /data (type EFS) fails on the rhel8 scheduler with this error message (from /root/Scheduler.sh.log):

1982 mount: /data: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.
1983 mount: /apps: bad option; for several filesystems (e.g. nfs, cifs) you might need a /sbin/mount.<type> helper program.

(I had a similar issue trying to mount an FSxZ on rhel8 this week. It had nothing to do with SOCA. Maybe it's a rhel8 thing? I haven't had time to parse the error message yet. Never encountered /sbin/mount.<type> helper programs before this week.)
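
For anyone hitting the same "bad option" message: it points at a missing /sbin/mount.<type> helper, so one quick check is whether a helper exists for the filesystem type named in /etc/fstab. Below is a minimal sketch of that check, not code from SOCA. The function name `find_mount_helper` and the `HELPER_DIRS` variable are made up for illustration, and the list of types (nfs, nfs4, efs) is an assumption about what the fstab entries might use. On RHEL-family systems the NFS helpers usually come from the nfs-utils package and mount.efs from amazon-efs-utils, but whether that is the cause here is not confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: report whether a mount helper exists for a given filesystem type.
# HELPER_DIRS is a hypothetical variable; mount(8) normally looks in /sbin.
HELPER_DIRS="${HELPER_DIRS:-/sbin /usr/sbin}"

# Print the path of mount.<type> if an executable helper is found.
# Returns 0 when found, 1 otherwise.
find_mount_helper() {
    local fstype="$1" dir
    for dir in $HELPER_DIRS; do
        if [ -x "$dir/mount.$fstype" ]; then
            echo "$dir/mount.$fstype"
            return 0
        fi
    done
    return 1
}

# The types listed here are assumptions; use the type from your fstab entry.
for fstype in nfs nfs4 efs; do
    if helper="$(find_mount_helper "$fstype")"; then
        echo "$fstype: helper found at $helper"
    else
        echo "$fstype: no mount.$fstype helper found"
    fi
done
```

If the helper for the type in the fstab entry is reported missing, installing the package that provides it would be the first thing to try before re-running mount -a.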

Also--the error message in Scheduler.sh.log before the script exits makes no sense. It should fail with a "number of attempts exceeded" message. Instead, it's failing with the message:

1996 + [[ -d /apps/soca/soca-test-2-7-5 ]]
1997 + echo '/apps/soca/soca-test-2-7-5 folder already exist. To prevent configuration overwrite, we exit the script. Please pick a different SOCA cluster name or delete the folder'
1998 /apps/soca/soca-test-2-7-5 folder already exist. To prevent configuration overwrite, we exit the script. Please pick a different SOCA cluster name or delete the folder
1999 + exit 1

(It's too close to the end of the day to dive into the Scheduler.sh code to see what's going on.)
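
To make the expectation concrete, here is a rough sketch of a retry wrapper that exits with a "number of attempts exceeded" message when the wrapped command keeps failing. This is not taken from Scheduler.sh; the function name `retry_or_fail` and its arguments are hypothetical, and the real script may structure its retries differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from Scheduler.sh.
# Usage: retry_or_fail <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or max_attempts is reached.
retry_or_fail() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            # Report the real reason for giving up on stderr.
            echo "number of attempts exceeded ($max_attempts): $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (commented out because it needs root and real fstab entries):
# retry_or_fail 5 10 mount -a || exit 1
# retry_or_fail 5 10 mountpoint -q /apps || exit 1
```

With something like this in front of the folder check, a failed mount would stop the script with a message about the mount rather than about an existing cluster folder.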

To Reproduce
Try launching SOCA with a rhel8 instance

Expected behavior
Umm... that mount -a would succeed. (Also, an error message that matches the actual reason the script is exiting.) (Snark apparently emerges at the end of my workday.)

Please complete the following information about the solution:

  • Version: 2.7.5
  • Region: eu-west-1
  • Was the solution modified from the version published on this repository? n/a
  • If the answer to the previous question was yes, are the changes available on GitHub? n/a
  • Have you checked your service quotas for the services this solution uses? n/a
  • Were there any errors in the CloudWatch Logs? n/a

Hello,

Is this a local modification for RHEL8? The installer should have rejected an attempt to use rhel8.

RHEL8 is not supported at this time as a Scheduler BaseOS - only Compute and DCV/VDI BaseOS.

Yep--I modified the expected values to include rhel8. (Insert blushing emoji here.)

Can you point out the code for compute nodes that takes care of the mount -a issue that I had here and in my non-SOCA situation? It would save me a bunch of time.

Thanks!

Closed as discussed via email.