cloudyr/aws.ec2

Meta library with GCE and aws.ec2

Opened this issue · 7 comments

I'd like to write a library that lets you call VMs easily from your local R session (working title, computeR). It could use this library and the Google Compute Engine library I'm going to work on next.

The API should be R-friendly and make it easy to launch servers such as RStudio, OpenCPU and Shiny. It should also let you schedule scripts, run code on large VMs and send the results back to your local console, etc.

YESSS!! Love it!

Maybe also take a look at this: https://github.com/ropenscilabs/snowball. It could probably be expanded to have a GCE back-end, which would be pretty sweet. cc @jonocarroll

snowball has a framework constructed (and I believe it works in a simple sense) but yes, it was always the goal for it to be expandable to other cloud platforms (e.g. GCE). We had a constraint of no inbound connections to the initialising machine so everything is sent off (in a "snowball") and managed by the cloud provider (e.g. an AWS instance started using aws.ec2 with instructions to collect further commands from an S3 bucket).

By all means, contribute to or fork snowball and see if it has the beginnings of what you're after.
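To make the "no inbound connections" pattern above concrete, here is a minimal sketch of it. This is not snowball's actual code: a temporary local directory stands in for the S3 bucket, and `submit_job()`, `poll_once()` and `collect_result()` are hypothetical names made up for illustration. The local side only ever writes to and reads from the bucket, and the worker only ever pulls from it.

```r
# Assumption: a temp directory stands in for the S3 bucket; a real setup
# would read and write objects in cloud storage instead.
bucket <- file.path(tempdir(), "mock-bucket")
dir.create(file.path(bucket, "commands"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(bucket, "results"), recursive = TRUE, showWarnings = FALSE)

# Local side (hypothetical helper): queue a job without contacting the worker.
submit_job <- function(id, func, input) {
  saveRDS(list(func = func, input = input),
          file.path(bucket, "commands", paste0(id, ".rds")))
}

# Worker side (hypothetical helper): pull pending jobs, run them, and write
# each result back under the same file name. Returns the number of jobs run.
poll_once <- function() {
  jobs <- list.files(file.path(bucket, "commands"), full.names = TRUE)
  for (job_file in jobs) {
    job <- readRDS(job_file)
    result <- job$func(job$input)
    saveRDS(result, file.path(bucket, "results", basename(job_file)))
    file.remove(job_file)
  }
  length(jobs)
}

# Local side (hypothetical helper): fetch a finished result from the bucket.
collect_result <- function(id) {
  readRDS(file.path(bucket, "results", paste0(id, ".rds")))
}
```

Usage would look like `submit_job("job1", function(x) head(x), mtcars)`, then `poll_once()` running in a loop on the VM, then `collect_result("job1")` back on the local machine.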

@MarkEdmondson1234 is https://github.com/sckott/analogsea also along the lines of what you have in mind?

For quickly launching RStudio, OpenCPU, Shiny instances, or R scripts on arbitrary VMs on GCE, Amazon, OpenStack (e.g. NSF XSEDE) etc., I've found docker-machine to be pretty useful (e.g. http://www.carlboettiger.info/2015/12/17/docker-workflows.html), though this may be simpler than the kind of back-and-forth communication you have in mind. Since I move back and forth between host providers a bunch, I appreciate the consistent, scriptable API to create, save, and destroy instances.

@cboettig Nice, yes a Digital Ocean launcher would be good to have too, so we cover all major platforms. Is there something similar for Azure?

I agree Docker looks like the best way to go; my first thought for GCE was to use the Docker-ready images. I essentially want to make the setup I blogged about here available in a couple of lines of R, with sensible defaults that can be customised.

If Docker is usable for all the platforms then Kubernetes looks like a way to launch multiple pods in the same syntax, then use snowball for sending and collecting the threads.

Syntax-wise I was hoping for something like the below, which I need for my own work at the very least:

## auth keys, platform choice etc. all set in .Renviron
> library(computeR)
> vm <- get_vm("rstudio")
RStudio VM running at 134.65.77.66

## or use your own custom docker image 
> vm2 <- create_vm(docker = "rocker/hadleyverse")

> f <- function(input) head(input)
> vm_result <- do_vm_function(input = mtcars, func = f, vm = vm2)

> schedule_task <- do_vm_function(input = mtcars, func = f, vm = vm2, output = "gce-disk", schedule = "daily")

## other functions
> upload_vm_file()
> shutdown_vm()
> startup_vm()

How should we handle uploading and caching R packages that the VMs need but that are not in the Docker image? packrat?

For the first draft I hope to support GCE with single instances, based on googleComputeR(), which is to be built from googlecomputev1.auto, with syntax that I hope is generic enough to be ported to the other platforms.

I'll start up a new GitHub repository in cloudyr to handle the features and issues, if that's OK?

Azure has an R library in progress here: https://github.com/Microsoft/AzureSMR

Might be worth thinking about - and formally writing up - something about a common interface so that people could write other backends for it that could just be dropped in.
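As a starting point for that write-up, one way a drop-in backend interface could look is plain S3 dispatch. This is only a sketch under assumptions: the generics reuse the `create_vm()` and `shutdown_vm()` names proposed earlier in the thread, but their signatures, the `mock_backend` class and the fields on the returned object are all hypothetical, and nothing here talks to a real cloud API.

```r
# Hypothetical generics: each backend package would supply its own methods.
create_vm <- function(backend, name, ...) UseMethod("create_vm")
shutdown_vm <- function(vm, ...) UseMethod("shutdown_vm")

# Fail clearly when a backend has not implemented the interface.
create_vm.default <- function(backend, name, ...) {
  stop("No create_vm() method for backend class: ", class(backend)[1])
}

# Mock backend (assumption): stands in for GCE, EC2, etc. and launches nothing.
mock_backend <- function() structure(list(), class = "mock_backend")

create_vm.mock_backend <- function(backend, name, ...) {
  structure(list(name = name, status = "RUNNING"),
            class = c("mock_vm", "vm"))
}

shutdown_vm.mock_vm <- function(vm, ...) {
  vm$status <- "TERMINATED"
  vm
}

# Shared behaviour lives on the common "vm" class, so it works for any backend.
print.vm <- function(x, ...) {
  cat("<vm>", x$name, "-", x$status, "\n")
  invisible(x)
}
```

With this shape, user code such as `vm <- create_vm(mock_backend(), "rstudio")` stays the same whichever backend object is passed in, and a new platform only has to add methods for its own classes.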

Update on this, I've started putting together the GCE library here:
https://github.com/MarkEdmondson1234/googleComputeEngineR

Once that's done I'll look at starting up the meta package.