cloudyr/googleComputeEngineR

port 22 is not open when running `gce_vm_cluster`

randy3k opened this issue · 10 comments

Describe the bug

gce_vm_cluster() reports that port 22 is not open. This happens with the latest release of googleComputeEngineR (0.3.0).

To Reproduce

r$> library(googleComputeEngineR)
Setting scopes to https://www.googleapis.com/auth/cloud-platform
Successfully auto-authenticated via service_account.json
Set default project ID to '<censored>'
Set default zone to 'us-central1-a'

r$> vms <- gce_vm_cluster()
2020-06-01 01:56:44> # Creating cluster with settings: template = r-base, dynamic_image = rocker/r-parallel, wait = FALSE, predefined_type = n1-standard-1
2020-06-01 01:56:51> Operation running...
2020-06-01 01:56:57> Operation complete in 7 secs
2020-06-01 01:57:00> Operation complete in 7 secs
2020-06-01 01:57:04> Operation complete in 6 secs
2020-06-01 01:57:05> r-cluster-1 VM running
2020-06-01 01:57:06> r-cluster-2 VM running
2020-06-01 01:57:08> r-cluster-3 VM running
2020-06-01 01:57:16> Public SSH key uploaded to instance
2020-06-01 01:57:24> Public SSH key uploaded to instance
2020-06-01 01:57:32> Public SSH key uploaded to instance
2020-06-01 01:57:32> # Testing cluster:
Error: port 22 is not open for 34.69.5.250

I am pretty sure the port is open, because I can ssh to the VM directly:

(randyimac)-gce$ ssh 34.69.5.250
The authenticity of host '34.69.5.250 (34.69.5.250)' can't be established.
ED25519 key fingerprint is SHA256:i6cPMUTAmaKg0Jy2lS/m0JwKggJN3RnSSSNF/d5bd7g.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '34.69.5.250' (ED25519) to the list of known hosts.
Randy@r-cluster-1 ~ $

Expected behavior

The command should run without error.

Session Info

r$> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] googleComputeEngineR_0.3.0

loaded via a namespace (and not attached):
 [1] codetools_0.2-16  listenv_0.8.0     future_1.16.0     digest_0.6.25
 [5] assertthat_0.2.1  R6_2.4.1          jsonlite_1.6.1    httr_1.4.1
 [9] rlang_0.4.6       curl_4.3          fs_1.4.1          googleAuthR_1.2.1
[13] tools_3.6.3       glue_1.4.1        parallel_3.6.3    compiler_3.6.3
[17] askpass_1.1       gargle_0.4.0.9004 globals_0.12.5    memoise_1.1.0
[21] openssl_1.4.1

It can sometimes take a moment for the SSH port to become reachable. Does it work if you connect to one of the VMs via gce_ssh()? The logs indicate the SSH key upload was at least successful.

Yes. gce_ssh works.

OK, then I think it's the check that failed, but everything is actually working. I may need to put a longer pause in before the check. You should be able to send up parallel jobs etc. using library(future).
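A longer pause could also be a retry loop rather than a fixed sleep. The following is only a rough sketch in base R, not what the package currently does; `port_is_open()` and `wait_for_port()` are hypothetical helper names, and the attempt count and pause length are arbitrary assumptions.

```r
# Hypothetical helper (not part of googleComputeEngineR):
# returns TRUE if a TCP connection to host:port can be opened.
port_is_open <- function(host, port = 22, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Hypothetical helper: retry the check, pausing between attempts,
# instead of failing on the first unsuccessful try.
wait_for_port <- function(host, port = 22, attempts = 10, pause = 5,
                          check = port_is_open) {
  for (i in seq_len(attempts)) {
    if (check(host, port)) return(TRUE)
    if (i < attempts) Sys.sleep(pause)  # no need to sleep after the last try
  }
  FALSE
}
```

For example, `wait_for_port("34.69.5.250")` would try the VM from the log above up to 10 times with 5-second pauses, and return FALSE only if every attempt fails.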

For some reason, I got the following error:

r$> plan(cluster, workers = vms)
bash: /usr/local/bin/docker: No such file or directory

I am trying to do this

vm1 <- gce_vm("r-cluster-1")
vm2 <- gce_vm("r-cluster-2")
vm3 <- gce_vm("r-cluster-3")

vms <- list(vm1, vm2, vm3)

# need this otherwise "check_ssh_set(x) is not TRUE"
vms <- lapply(vms, function(v) gce_ssh_setup(v, key.pub = "~/.ssh/id_rsa.pub"))

plan(cluster, workers = vms)

You shouldn't need gce_ssh_setup() anymore; that should be handled by gce_vm_cluster().

If you run gce_vm_cluster() again with the same names, does it return? If the cluster is already up, it should just return the existing VMs, and you can then use the returned vms:

vms <- gce_vm_cluster()
plan(cluster, workers = as.cluster(vms))

Otherwise, I think you can build the list the way you are doing, but you need to wrap it in as.cluster(vms):

plan(cluster, workers = as.cluster(vms))

I am referring to this documentation: https://cloudyr.github.io/googleComputeEngineR/articles/massive-parallel.html (the website is the most up to date).

Thanks. But I got this if I do not run gce_ssh_setup first.

> vms <- list(vm1, vm2, vm3)
> plan(cluster, workers = as.cluster(vms))
Error in as.cluster.gce_instance(X[[i]], ...) :
  check_ssh_set(x) is not TRUE

Back to the `bash: /usr/local/bin/docker: No such file or directory` error: it seems to be caused by the following line.

> makeClusterPSOCK("34.71.11.230", rscript = c("docker", "run", "--net=host", "rocker/r-parallel", "Rscript"))
bash: /usr/local/bin/docker: No such file or directory

I have Docker installed on my system, and its path is /usr/local/bin/docker.
It seems that makeClusterPSOCK was trying to resolve the path of docker: https://github.com/HenrikBengtsson/future/blob/30a01ea4b3a922376549f054059325593163f917/R/makeClusterPSOCK.R#L505.

Filed a bug with future: HenrikBengtsson/future#386

> Thanks. But I got this if I do not run gce_ssh_setup first.

If you set up the VMs without gce_vm_cluster(), then this is necessary, but if gce_vm_cluster() completes, it does this step for you. I think it should complete the second time if you use the existing VM names, since SSH is connecting when you try it manually.

> I have Docker installed on my system, and its path is /usr/local/bin/docker.

The call should be running docker on the VM, which should have Docker installed.

> The call should be running docker on the VM, which should have Docker installed.

The issue has been fixed upstream.