cloudyr/aws.ec2

future + aws.ec2?


Are you guys thinking of creating as.cluster() methods for this package like you did with googleComputeEngineR? It would be pretty sweet to fire up an EC2 instance (or cluster?) and be able to send code up to it with future. In full transparency, I haven't actually used your package yet, but I have been thinking a bit about how something like this might work, especially with furrr.

I'm unsure of the technical differences between firing up one powerful EC2 instance with multiple cores and running code in parallel there, versus firing up multiple EC2 instances and running the code in parallel across them, but being able to do either could be useful.
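For what it's worth, here is a rough, untested sketch of how the two setups would differ with future's makeClusterPSOCK() (the IPs and worker counts are placeholders): repeating a single instance's IP launches several R workers on that one machine, while passing several different IPs launches one worker per machine.

library(future)

## (a) One powerful instance: repeat its IP once per worker you want,
##     which launches several R sessions on that single machine.
big_ip <- "1.2.3.4"
cl_one <- makeClusterPSOCK(rep(big_ip, 8), user = "ubuntu", dryrun = TRUE)

## (b) Several smaller instances: one worker per machine.
ips <- c("1.2.3.4", "5.6.7.8", "9.10.11.12")
cl_many <- makeClusterPSOCK(ips, user = "ubuntu", dryrun = TRUE)

## Once launched for real (i.e. with dryrun = TRUE dropped), either
## cluster plugs into future the same way:
# plan(cluster, workers = cl_one)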

The future docs actually give a fuller example that you might be able to build from:

## Launching worker on Amazon AWS EC2 running one of the
## Amazon Machine Images (AMI) provided by RStudio
## (http://www.louisaslett.com/RStudio_AMI/)

library(future)

public_ip <- "1.2.3.4"
ssh_private_key_file <- "~/.ssh/my-private-aws-key.pem"

cl <- makeClusterPSOCK(
  ## Public IP number of EC2 instance
  public_ip,
  ## User name (always 'ubuntu')
  user = "ubuntu",
  ## Use private SSH key registered with AWS
  rshopts = c(
    "-o", "StrictHostKeyChecking=no",
    "-o", "IdentitiesOnly=yes",
    "-i", ssh_private_key_file
  ),
  ## Set up .libPaths() for the 'ubuntu' user and
  ## install future package
  rscript_args = c(
    "-e", shQuote("local({
      p <- Sys.getenv('R_LIBS_USER')
      dir.create(p, recursive = TRUE, showWarnings = FALSE)
      .libPaths(p)
    })"),
    "-e", shQuote("install.packages('future')")
  ),
  dryrun = TRUE
)
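
I haven't actually tried this, but once the cluster is created for real (with dryrun = TRUE dropped), pointing furrr at it should just be a matter of setting the plan. Roughly:

library(future)
library(furrr)

## Use the EC2-backed PSOCK cluster for all futures
plan(cluster, workers = cl)

## Each element is evaluated on the EC2 worker(s)
results <- future_map(1:10, ~ .x^2)

## Reset the plan and shut the workers down when done
plan(sequential)
parallel::stopCluster(cl)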

I'm resurrecting this package and taking over as maintainer. I definitely intend to do something like this for aws.lambda, but I can see it also making sense for EC2.
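
To sketch the shape of what I have in mind (purely hypothetical, none of this is in the package today, and the argument names and IP lookup below are assumptions rather than the real aws.ec2 API): launch instances with aws.ec2, wait for them to reach the running state, then hand their public IPs to makeClusterPSOCK().

library(aws.ec2)
library(future)

## Hypothetical helper: pull the public IPs out of a set of launched
## instances. The real lookup would go through describe_instances();
## the $ipAddress field used here is a placeholder.
get_public_ips <- function(instances) {
  vapply(instances, function(x) x$ipAddress, character(1))
}

## Launch an instance from some AMI (the image/type values are
## placeholders and the argument names are assumptions)
instances <- run_instances(image = "ami-0123456789abcdef0", type = "t2.micro")

## ... wait until the instances report a "running" state ...

cl <- makeClusterPSOCK(
  get_public_ips(instances),
  user = "ubuntu",
  rshopts = c("-i", "~/.ssh/my-private-aws-key.pem"),
  dryrun = TRUE
)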