ubc-vision/image-matching-benchmark

How to make all the compute nodes use the compute1 image? #GCP slurm

Closed this issue · 2 comments

I came here because etrulls/slurm-gcp does not have an issue tracker.

I've found that COLMAP does not install properly on the compute nodes (compute1-12), so I:

  1. installed COLMAP manually on compute1.
  2. followed the instructions to create an image:

gcloud compute images create benchmark-compute-image-$(date '+%Y-%m-%d-%H-%M-%S') --source-disk benchmark-compute1 --source-disk-zone us-central1-b --force --family benchmark-image-family

Next, I tried to make all the compute nodes use this image, following the guide:

Then deprecate the previous image using the web console, reboot the VMs, instantiate compute2 as in the example above, log in to this node, and check that the new binaries you installed on the permanent compute node (compute1) are visible.

but I do not know exactly how to do this. Please help, thanks.

PS: I have the following VMs:

  1. benchmark-login1, on which everything for image-matching-benchmark was installed following https://github.com/etrulls/slurm-gcp
  2. benchmark-compute1, with COLMAP installed manually.
  3. benchmark-controller.
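
For reference, the deprecation step quoted from the guide can also be done from the command line instead of the web console. Below is a minimal sketch, assuming the family name `benchmark-image-family` from the command above; `old-image-name` in the usage note is a placeholder, and the helper names (`run`, `list_family_images`, `deprecate_image`, `current_family_image`) are made up for illustration. The script only prints the commands unless `DRY_RUN=0` is set. Whether the slurm-gcp setup picks compute images by family is an assumption that is not confirmed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: prints the gcloud commands by default; set DRY_RUN=0 to actually run them.
FAMILY="benchmark-image-family"   # family used in the image-creation command
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # dry run: show the command only
  else
    "$@"           # real run: execute it
  fi
}

# List the images in the family, newest first.
list_family_images() {
  run gcloud compute images list --filter="family=$FAMILY" \
    --sort-by="~creationTimestamp" --format="value(name)"
}

# Deprecate an older image (name passed as $1).
deprecate_image() {
  run gcloud compute images deprecate "$1" --state DEPRECATED
}

# Print the image the family currently resolves to.
current_family_image() {
  run gcloud compute images describe-from-family "$FAMILY" --format="value(name)"
}
```

Typical use would be `list_family_images`, then `deprecate_image old-image-name` for each older image, then `current_family_image` to check that the family now points at the image created from compute1. An image family resolves to its most recent non-deprecated image, so the last call should return the new image name.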

> I came here because etrulls/slurm-gcp does not have an issue tracker.

Neither does the parent repository; it's a bit annoying (I guess it was configured that way).

> Next, I tried to make all the compute nodes use this image, following the guide:
>
> Then deprecate the previous image using the web console, reboot the VMs, instantiate compute2 as in the example above, log in to this node, and check that the new binaries you installed on the permanent compute node (compute1) are visible.
>
> but I do not know exactly how to do this. Please help, thanks.

I've had several issues in the past doing this exact same thing: sometimes it works and sometimes it doesn't, and I'm not sure why. I would try repeating these operations with all the VMs off. If that doesn't work, the easiest solution is to update the installation script (scripts/custom_compute_install) with whatever changes are required to get COLMAP to work and re-create it from scratch.

Wish I had a better solution.
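
A rough sketch of the "all the VMs off" suggestion, for anyone who prefers the command line. It assumes the three VM names listed in the question and that they all live in `us-central1-b` (only the zone of benchmark-compute1's disk is given in the thread); `vm_command` is a hypothetical helper. It only prints the commands, which would need to be run from a machine outside the cluster, such as a local terminal with gcloud configured.

```shell
#!/usr/bin/env bash
# Sketch: builds the stop/start commands as strings; nothing is executed here.
ZONE="us-central1-b"   # assumption: all VMs share the zone of benchmark-compute1
VMS="benchmark-login1 benchmark-compute1 benchmark-controller"

# Print the gcloud command for an action ("stop" or "start") on all VMs.
vm_command() {
  echo "gcloud compute instances $1 $VMS --zone $ZONE"
}

vm_command stop    # run the printed command before changing images
vm_command start   # run the printed command once the image steps are done
```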

Thanks @etrulls again.

I found out that I had always been running the commands as the root user. I used to think root was the most powerful user, so it would be fine, but it turns out I was wrong.

Now I have switched to sa_XXXX, the default Slurm user, and reinstalled following the guide. It looks like everything works fine.

Thanks @etrulls!
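
To avoid repeating the root mistake, a small guard like the one below could be placed at the top of any wrapper script used for the install steps. This is only a sketch: `is_root_uid` is a hypothetical helper, and the guide's own scripts may already handle this differently.

```shell
#!/usr/bin/env bash
# Sketch: warn when the given UID is root's (0).
is_root_uid() {
  [ "$1" -eq 0 ]
}

if is_root_uid "$(id -u)"; then
  echo "Running as root; switch to the Slurm user (sa_XXXX in this thread) first." >&2
else
  echo "Running as $(id -un); OK to continue."
fi
```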