This is the repo associated with my talk for Why R? on 2020-02-18
Why R? Webinar 034 - Roel Hogervorst - Running your Rscript in the cloud
The video will stream live on YouTube here and can also be found there afterwards.
The presentation is here in Google Slides
Description of talk
Life as a member of a data team is awesome, but as a lone data scientist in a small company you have to be data engineer, IT consultant and data scientist all in one. So how do you get your R scripts to run? In this talk I hope to inspire lone data scientists by showing the ways you can run your Rscript in the cloud. I cover some simple and some more advanced use cases for running R in the cloud. I also talk about cooperating with IT for maximum results.
Script progression from exploratory to production ready
In the talk I explain how to turn an exploratory script into a more production-ready one. In "from exploration to script" (another markdown document in this repo) you can see a progression where I add informative logging and make the script fail quickly on errors.
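As a rough illustration of that end state, here is a minimal sketch in base R. It is not the code from the linked document: `log_msg()`, `summarise_sales()`, `run()` and the `amount` column are hypothetical names made up for this example. The idea is to write timestamped log lines, check inputs before doing any work, and re-raise errors so the job is marked as failed.

```r
# Minimal logging helper: timestamped messages go to stderr,
# which most schedulers and cloud services capture.
log_msg <- function(level, ...) {
  message(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " [", level, "] ", ...)
}

# Hypothetical processing step, used for illustration only.
summarise_sales <- function(df) {
  # Fail fast: validate the input before doing any work.
  stopifnot(is.data.frame(df), "amount" %in% names(df))
  if (nrow(df) == 0) stop("input has no rows")
  log_msg("INFO", "summarising ", nrow(df), " rows")
  sum(df$amount)
}

run <- function(df) {
  tryCatch(
    {
      log_msg("INFO", "script started")
      total <- summarise_sales(df)
      log_msg("INFO", "script finished, total = ", total)
      total
    },
    error = function(e) {
      # Log the reason, then re-raise so a non-interactive
      # Rscript run ends with a non-zero exit status.
      log_msg("ERROR", conditionMessage(e))
      stop(e)
    }
  )
}

total <- run(data.frame(amount = c(10, 20, 12.5)))
```

Re-raising the error instead of swallowing it matters in the cloud: a scheduler or CI runner can typically only tell that a run failed from the exit status of `Rscript`.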
Links in the presentation
- Ordina
- Contact me, or go to this page for vacancies at Ordina
- https://en.wikipedia.org/wiki/Shadow_IT
- Why-R 005: Development pipeline for R in production - Lorenzo Braschi
- this page (the one you are reading right now) on GitHub
Virtual machines
- My personal pros and cons of running locally on a VM or using one of the other services
- Azure (Microsoft cloud) virtual machine, this one has several tools already installed
- AWS (Amazon cloud) virtual machines (Elastic Compute Cloud; EC2)
- GCP (Google cloud) virtual machines (Compute Engine; CE)
Running scripts from version control
Docker resources
- docker tutorial by ropensci
- ThinkR: how to develop inside a Docker container for easy collaboration
- Docker for the useR - talk by Noam Ross 2018
- {containerit} package (not on CRAN) for automatically creating a Docker container
- an example of dockerizing a script by me
- {liftr} for rmarkdown docker containers
- youtube video about containers (Scott Hanselman)
- Colin Fay's Docker R reproducibility post (January 2019)
Using gitlab and github
- this repo has a GitLab script, a Heroku example and a GitHub example to schedule a script
- this repo makes use of a Docker registry, first building an image and later retrieving it
Mentioned packages
Contacting me
- @RoelMHogervorst
- mastodon.technology/@rmhogervorst
- github RMHogervorst
- gitlab RMHogervorst
- blog rmhogervorst.nl (also syndicated on r-bloggers)