Workflow for running R and RStudio Server on an AWS EC2 instance
Prerequisites:
a. Create an AWS account
b. Set IAM permissions to allow Amazon EC2 access
c. Install and configure AWS Command Line Interface (CLI)
d. Create and configure an Amazon Virtual Private Cloud (Amazon VPC)
e. Create an Amazon EC2 key pair
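Step e can also be done from the AWS CLI. The following is a minimal sketch, not a required part of the workflow: `create_key_pair` is a hypothetical helper name, and the key pair name passed to it is an assumption. Wrapping the command in a function means nothing runs until the function is called.

```shell
# Hypothetical helper (not part of the AWS CLI): create an EC2 key pair
# and save the private key as <name>.pem in the current directory.
# $1 = key pair name (an assumed example would be my-key-pair)
create_key_pair() {
  key_name="$1"
  # KeyMaterial is the PEM-encoded private key returned by EC2
  aws ec2 create-key-pair \
    --key-name "$key_name" \
    --query 'KeyMaterial' \
    --output text > "${key_name}.pem" || return 1
  # ssh rejects private keys that other users can read
  chmod 400 "${key_name}.pem"
}
```

For example, `create_key_pair my-key-pair` would write my-key-pair.pem, which is the file later passed to `ssh -i`.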
A security group acts as a virtual firewall for an EC2 instance to control incoming and outgoing traffic. Security groups can be created using the Amazon VPC console or using the AWS CLI.
Example security group:
Security group name: RStudio-security-group
Description: Allow SSH, HTTP, RStudio
VPC: <vpc ID>
Inbound rule 1 - SSH, Anywhere-IPv4, port 22
Inbound rule 2 - HTTP, Anywhere-IPv4, port 80
Inbound rule 3 - Custom TCP, Anywhere-IPv4, port 8787 (RStudio)
Outbound rule 1 - All traffic
*Record security group ID
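The example security group above could be created from the AWS CLI roughly as follows. This is a sketch under stated assumptions: `create_rstudio_security_group` is a hypothetical helper name, the VPC ID is supplied by the caller, and 0.0.0.0/0 is used as the CIDR equivalent of Anywhere-IPv4. A new security group in a VPC allows all outbound traffic by default, so no outbound rule is added.

```shell
# Hypothetical helper: create the example security group and open
# ports 22 (SSH), 80 (HTTP), and 8787 (RStudio).
# $1 = VPC ID, $2 = source CIDR (defaults to 0.0.0.0/0, i.e. Anywhere-IPv4)
create_rstudio_security_group() {
  vpc_id="$1"
  cidr="${2:-0.0.0.0/0}"
  sg_id=$(aws ec2 create-security-group \
    --group-name RStudio-security-group \
    --description "Allow SSH, HTTP, RStudio" \
    --vpc-id "$vpc_id" \
    --query 'GroupId' --output text) || return 1
  for port in 22 80 8787; do
    aws ec2 authorize-security-group-ingress \
      --group-id "$sg_id" --protocol tcp --port "$port" --cidr "$cidr" \
      >/dev/null || return 1
  done
  echo "$sg_id"   # the security group ID to record
}
```

Passing a narrower CIDR as the second argument (for example, a single address ending in /32) limits all three ports to that source instead of the whole internet.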
Amazon EC2 provides a wide selection of instance types optimized for different uses. General purpose instances provide a balance of compute, memory, and networking resources.
An Amazon Machine Image (AMI) is a template that supplies the operating system and base configuration for an EC2 instance.
Free tier eligible instance type:
t2.micro - 1 vCPU, 1 GiB RAM
Free tier eligible AMI:
Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-023fc89db93991d87 (64-bit (x86))
AMI IDs are specific to a region, so look up the ID for your own region in the AMI catalog.
$ aws ec2 run-instances --image-id ami-023fc89db93991d87 --count 1 --instance-type t2.micro --key-name <key pair name> --security-group-ids <security group ID> --subnet-id <subnet ID> --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=RStudio}]' #name the instance 'RStudio'
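Parameters:
--image-id: AMI ID, found under 'AMI Catalog' in the EC2 console
--key-name: key pair name, found under 'Key Pairs' in the EC2 console
--security-group-ids: security group ID, found under 'Security Groups' in the EC2 console
--subnet-id: subnet ID, found under 'Subnets' in the VPC console
--tag-specifications: tags applied at launch; here the Name tag sets the instance name to 'RStudio'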
$ aws ec2 describe-instances --filters "Name=tag:Name,Values=RStudio"
*Record instance ID
*Record PublicDnsName
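The full describe-instances output is long, so the two values to record can be extracted with the CLI's `--query` option. A minimal sketch, assuming the instance was tagged Name=RStudio as above; `instance_id_and_dns` is a hypothetical helper name.

```shell
# Hypothetical helper: print the instance ID and public DNS name
# (tab-separated) of running instances with the given Name tag.
# $1 = value of the Name tag, e.g. RStudio
instance_id_and_dns() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=$1" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[InstanceId,PublicDnsName]' \
    --output text
}
```

Calling `instance_id_and_dns RStudio` prints one line per matching instance. The extra instance-state-name filter leaves out terminated instances that still carry the same tag.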
$ ssh -i </path/to/my-key-pair.pem> ubuntu@<my-instance-public-dns-name>
Default usernames for different AMIs are listed here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connection-prereqs.html
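An SSH attempt made right after launch can fail because the instance is still booting or because the key file permissions are too open. The sketch below handles both before connecting. `connect_to_instance` is a hypothetical helper name, and the `ubuntu` username assumes the Ubuntu AMI used in this workflow.

```shell
# Hypothetical helper: wait for the instance to pass its status checks,
# then open an SSH session as the default Ubuntu user.
# $1 = path to the .pem key, $2 = instance ID, $3 = public DNS name
connect_to_instance() {
  key_file="$1"
  instance_id="$2"
  host="$3"
  # ssh rejects private keys that other users can read
  chmod 400 "$key_file" || return 1
  # Blocks until the EC2 status checks report ok
  aws ec2 wait instance-status-ok --instance-ids "$instance_id" || return 1
  ssh -i "$key_file" "ubuntu@$host"
}
```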
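APT is Ubuntu's tool for getting, installing, deleting, querying, and managing software packages (yum is its counterpart on Red Hat based distributions and is not needed here). Refresh the package index and list the installed packages:
$ sudo apt update
$ apt list --installed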
Ubuntu's default repositories contain an outdated version of R, so add the CRAN repository before installing R. The most recent version of RStudio Server is listed here: https://www.rstudio.com/products/rstudio/download-server/ and the wget command in the RStudio Server installation step should be updated accordingly.
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
$ sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/'
$ sudo apt update
r-base provides the R language and its core packages. r-base-dev is an Ubuntu package that provides the headers and build tools needed to compile R packages and other software that depends on R.
$ sudo apt -y install r-base r-base-dev
R packages to install: devtools, tidyverse, sparklyr, RMariaDB
System dependencies:
$ sudo apt -y install build-essential libcurl4-openssl-dev libssl-dev libxml2-dev libmariadbclient-dev
R packages:
$ sudo R -e "install.packages('RCurl', repos='http://cran.rstudio.com')"
$ sudo R -e "install.packages('devtools', repos='http://cran.rstudio.com')"
$ sudo R -e "install.packages('tidyverse', repos='http://cran.rstudio.com')"
$ sudo R -e "install.packages('sparklyr', repos='http://cran.rstudio.com')"
$ sudo R -e "install.packages('RMariaDB', repos='http://cran.rstudio.com')"
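install.packages() reports a failed installation as a warning, so the R commands above can exit successfully even when a package did not build (for example, because a system library is missing). A quick check that each package is actually available might look like the sketch below; `check_r_packages` is a hypothetical helper name.

```shell
# Hypothetical helper: report whether each named R package is installed.
# Returns non-zero if any package is missing.
check_r_packages() {
  missing=0
  for pkg in "$@"; do
    # requireNamespace() returns FALSE instead of raising an error
    if Rscript -e "if (!requireNamespace('$pkg', quietly = TRUE)) quit(status = 1)" >/dev/null 2>&1; then
      echo "ok: $pkg"
    else
      echo "MISSING: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

For this workflow the call would be `check_r_packages RCurl devtools tidyverse RMariaDB`, plus any other packages installed along the way.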
GDebi installs local Debian (.deb) packages and resolves their dependencies automatically.
$ sudo apt install gdebi-core
$ wget https://download2.rstudio.org/server/bionic/amd64/rstudio-server-2022.02.3-492-amd64.deb
$ sudo gdebi -n rstudio-server-2022.02.3-492-amd64.deb
$ sudo rm rstudio-server-2022.02.3-492-amd64.deb
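Add a user account for logging in to RStudio:
$ sudo adduser rstudio
Password: rstudio
The username and password are both set to 'rstudio' for ease of use. Because port 8787 is open to Anywhere-IPv4, choose a stronger password on any instance that stays running.
Give the rstudio user sudo privileges:
$ sudo usermod -aG sudo rstudio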
Install Java, register it with R, and make the site library writable so that packages can be installed from within RStudio:
$ sudo apt -y install default-jdk
$ sudo R CMD javareconf
$ sudo chmod 777 -R /usr/local/lib/R/site-library
$ sudo rstudio-server restart
Open a web browser and enter the instance's Public DNS (IPv4) name followed by the RStudio port (8787) as the URL:
http://<Public DNS (IPv4)>:8787
*Log in with the credentials (rstudio) created earlier in the workflow.
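If the login page does not load, it helps to confirm from a terminal that the server answers on port 8787 before checking the security group rules. A minimal sketch; `rstudio_url` and `rstudio_reachable` are hypothetical helper names, and the plain http scheme assumes the default RStudio Server setup from this workflow.

```shell
# Hypothetical helper: build the RStudio Server URL from the public DNS name.
rstudio_url() {
  printf 'http://%s:8787\n' "$1"
}

# Hypothetical helper: succeed if anything answers HTTP on port 8787.
rstudio_reachable() {
  # -s hides progress output, -o discards the body,
  # --max-time bounds the wait when the port is blocked
  curl -s -o /dev/null --max-time 10 "$(rstudio_url "$1")"
}
```

A timeout from `rstudio_reachable <Public DNS (IPv4)>` usually points to the inbound rule for port 8787, while a refused connection suggests that rstudio-server is not running.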
To save the installed programs and settings, an AMI can be created from the instance using the EC2 console or the AWS CLI.
Example AMI:
Name: RStudio-Ubuntu-Server-18.04-LTS-(HVM)-SSD-Volume
Description: RStudio Ubuntu Server 18.04 LTS (HVM), SSD Volume
Delete on termination: Disable (the EBS volume will not be deleted when the EC2 instance is terminated)
*Record AMI ID for future use
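With the AWS CLI, the example AMI could be created roughly as follows. This is a sketch under stated assumptions: `create_rstudio_ami` is a hypothetical helper name, and the root device name /dev/sda1 is assumed for the Ubuntu AMI and should be checked against the instance's actual root device.

```shell
# Hypothetical helper: create the example AMI from a configured instance
# and print the new AMI ID.
# $1 = instance ID, $2 = root device name (assumed default: /dev/sda1)
create_rstudio_ami() {
  instance_id="$1"
  root_device="${2:-/dev/sda1}"
  aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "RStudio-Ubuntu-Server-18.04-LTS-(HVM)-SSD-Volume" \
    --description "RStudio Ubuntu Server 18.04 LTS (HVM), SSD Volume" \
    --block-device-mappings "DeviceName=${root_device},Ebs={DeleteOnTermination=false}" \
    --query 'ImageId' --output text
}
```

By default, create-image reboots the instance before taking the snapshot, so any open SSH or RStudio session will be interrupted.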