Operationalization using Microsoft R Server on single node machines and Spark clusters

Prerequisites

Please bring a wireless-enabled laptop.
Download the free Postman app for API development.
Make sure your machine has an ssh client with port-forwarding capability. On Mac or Linux, simply run the ssh command in a terminal window. On Windows, download plink.exe from here. Alternatively, see this page for details on the Windows shell options.
Provision a Linux CentOS Data Science VM (DSVM) in the Azure Portal following these instructions.
- Make sure to provision the Standard DS12_V2 type.
- IMPORTANT: For the VM user name, please use remoteuser!

Connecting to the Data Science Virtual Machine on Microsoft Azure

We will provide Azure Data Science Virtual Machines (running Spark 2.0.2) for attendees to use during the tutorial. You will use your laptop to connect to your allocated virtual machine.

Connect to your DSVM
- Linux, Mac, or Windows Linux Shell: connect from the command line using ssh. Replace XXX with the public IP address of your Data Science Virtual Machine (e.g. remoteuser@13.64.107.209).
```
ssh -L localhost:8787:localhost:8787 remoteuser@XXX
```
- Windows: connect with plink.exe by running the following commands in a Windows command prompt window. Replace XXX with the public IP address of your Data Science Virtual Machine (e.g. remoteuser@13.64.107.209).
```
cd directory-containing-plink.exe
.\plink.exe -L localhost:8787:localhost:8787 remoteuser@XXX
```
See this page for details on the Windows shell options. These commands create an SSH tunnel that forwards localhost:8787 on the client machine to port 8787 on the VM. This is the port on the VM where RStudio Server listens.
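As a minimal sketch of how the tunnel command is put together, the helper below prints the ssh command for a given VM address. The function name `dsvm_tunnel_cmd` is hypothetical and not part of the course material; the `-L local_host:local_port:remote_host:remote_port` form and the remoteuser account come from the commands above.

```shell
# Hypothetical helper (not part of the course material): print the ssh
# command that forwards a local port to the same port on the VM.
#   $1 = public IP address of the DSVM
#   $2 = port to forward (defaults to 8787, the RStudio Server port)
dsvm_tunnel_cmd() {
  vm_ip="$1"
  port="${2:-8787}"
  # -L local_host:local_port:remote_host:remote_port
  printf 'ssh -L localhost:%s:localhost:%s remoteuser@%s\n' "$port" "$port" "$vm_ip"
}

# Example with the sample address used in this document.
dsvm_tunnel_cmd 13.64.107.209
```

The helper only prints the command so that you can review it before running it yourself.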
Once you are connected, become the root user on the VM. In the SSH session, use the following command:
```
sudo su -
```
Download the course material from the Git repository using the following command:
```
git clone https://github.com/vapaunic/mlads2017s-mrsdeploy.git
```

Change the permissions on the custom script file and run the script. Use the following commands:

```
cd mlads2017s-mrsdeploy
chmod +x DSVM_Customization_Script.sh
dos2unix ./DSVM_Customization_Script.sh
./DSVM_Customization_Script.sh
```
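The dos2unix step converts Windows (CRLF) line endings to Unix (LF) line endings, which a shell script needs in order to run cleanly on Linux. As a rough sketch, assuming you want to check whether a file still contains carriage returns, something like the following could be used. The helper name `has_crlf` is hypothetical and not part of the course material.

```shell
# Hypothetical helper (not part of the course material): succeed if the
# given file contains at least one carriage return character.
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

script=DSVM_Customization_Script.sh
if [ -f "$script" ] && has_crlf "$script"; then
  echo "$script still has Windows line endings; run dos2unix on it first"
fi
```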

After connecting with the commands above, you can access RStudio Server by opening a web browser and entering the following URL. You will be prompted to sign in with your credentials.
```
http://localhost:8787/ 
```
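If the page does not load, it can help to check from a second terminal on your laptop whether anything answers on the forwarded port. The following is a minimal sketch, assuming curl is installed on the client machine; the helper name `rstudio_reachable` is hypothetical.

```shell
# Hypothetical helper (not part of the course material): succeed if an HTTP
# server answers on the given local port (defaults to 8787).
rstudio_reachable() {
  port="${1:-8787}"
  # -s: silent, -f: treat HTTP errors (>= 400) as failure,
  # -o /dev/null: discard the response body, --max-time: give up after 5 s
  curl -s -f -o /dev/null --max-time 5 "http://localhost:${port}/"
}

if rstudio_reachable; then
  echo "Something is answering at http://localhost:8787/"
else
  echo "Nothing answered on localhost:8787; check that the SSH session is still open"
fi
```

A successful check only shows that the port responds; you still need to sign in through the browser.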

Platforms & services for hands-on exercises or demos

Azure Linux DSVM (Data Science Virtual Machine)

Information on Linux DSVM: https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.linux-data-science-vm

The Linux DSVM has Spark (2.0.2) installed, along with YARN for job management and HDFS. You can therefore use the DSVM to run regular R code as well as code that runs on Spark (e.g. using the SparkR package). You will use the DSVM as a single-node Spark machine for the hands-on exercises. We will provision these machines and assign them to you at the beginning of the tutorial.
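As a quick sanity check after logging in, you could confirm that the command-line tools for these components are available. This is a sketch only: the helper name `check_tools` is hypothetical, and it assumes the tools are exposed on your PATH under their usual command names (`R`, `spark-submit`, `yarn`, `hdfs`), which may not hold for every user account on the VM.

```shell
# Hypothetical helper (not part of the course material): report whether each
# named command can be found on PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: not found on PATH"
    fi
  done
}

# Usual command names for R, Spark, YARN, and HDFS (assumed, see above).
check_tools R spark-submit yarn hdfs
```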
