Running your R code on a Cloud service

Let’s say you’ve built a model in R that is larger than you can conveniently run locally, and you want to take advantage of Azure’s resources simply to run it on a larger machine. This post explains how to provision and run an Azure virtual machine (VM) for this, using the mrsdeploy library that comes installed with Microsoft’s R Server. We will work specifically with the Ubuntu Linux version of the VM, so you’ll need to be familiar with working with superuser privileges at the command line in Linux and, of course, familiar with R.

The fundamental architecture consists of your local machine acting as the client and a server machine that you create in the Cloud. You’ll set up a service on the remote machine, the one in the Cloud. Once you do this, you needn’t interact directly with the remote machine; instead you issue commands to it and see the results returned at the client. This is only one approach; there are many ways to do this in Azure, depending on your choice of language, reliance on coding, capabilities of the service, and the complexity and scale of the task. A data scientist typically works interactively first, exploring data on an individual machine, and then puts the resulting model into production at scale; in this example, in the Cloud. The purpose of this post is to clarify the deployment process or, as it is called in a mouthful, operationalization. In short, a VM running the mrsdeploy library in R Server lets you operationalize your code with little effort and at modest expense.

Alternatively, instead of setting up a service with R Server, one could, inadvisably, just provision a bare virtual machine and log in to it as one would any remote machine, with the manual encumbrance of having to work with multiple machines, load application software, and move data and code back and forth. But that’s what we avoid. The point of the Cloud is to make working with large data and compute as much like working on your local computer as possible.

Deploying Microsoft R Server (MRS) on an Azure VM

Azure Marketplace offers a Linux VM (Ubuntu version 16.04) preconfigured with R Server 2016. Additionally, the Linux VM with R Server comes with mrsdeploy, a new R package for establishing a remote session in a console application and for publishing and managing a web service written in R. To use R Server’s deployment and operationalization features, you need to configure R Server for operationalization after installation, so that it acts as a deployment server and hosts analytic web services.

Alternatively, the Marketplace offers other Azure platforms for operationalization using R Server, with other operating systems and platforms, including HDInsight, Microsoft’s Hadoop offering. Or, equivalently, one could use the Data Science VM available in the Marketplace, since it has a copy of R Server installed. Configuration of these platforms is similar to the example covered in this post.

Provisioning an R Server VM, as referenced in the documentation, takes a few steps that are detailed here: configuring the VM and setting up the server account to authorize remote access. To set up the server you’ll use the system account you created as a user of the Linux machine. The server account is used for client interaction with R Server and should not be confused with the Linux system account. This is a major difference from the Windows version of the R Server VM, which uses Active Directory services for authentication.

Provisioning a machine from the Marketplace

You will want to install an Ubuntu Marketplace VM with R Server preinstalled. The best way to find it on portal.azure.com is to search for “r server”:

r server in the Marketplace

Select the Ubuntu version. Do a conventional deployment; let’s say you name yours mymrs. Take note of the mymrs-ip public address and the mymrs-nsg network security group resources created for it, since you will want to customize them.

Log in to the VM using the system account you set up in the Portal, and add these aliases: one for the path to the R executable for MRS (aka Revo64), and one for the mrsdeploy menu-driven administration tool.

alias rserver='/usr/local/bin/Revo64-9.0'
alias radmin='sudo /usr/local/bin/dotnet \
/usr/lib64/microsoft-deployr/9.0.1/Microsoft.DeployR.Utils.AdminUtil/Microsoft.DeployR.Utils.AdminUtil.dll'

The following are a set of steps to bring up on the VM a combined web-compute server (a “one-box” server) that can be accessed remotely.

1. Check if you can run Microsoft R Server (MRS).

Just use the alias for MRS

$ rserver
[Note a line in the banner saying
"Loading Microsoft R Server packages, ..."]

Here’s a simple test that the MRS libraries (the “rx” functions) are preloaded and run:

> rxSummary(formula = ~., data = iris)

2. Set up the MRS server for mrsdeploy

mrsdeploy operationalization runs two kinds of service: the web node and one or more compute nodes. In the simplest configuration, the one described here, both “nodes” are services running on the same VM. Alternatively, by putting these on separate machines, larger loads can be handled with one web node and one or more compute nodes.

Use the alias you created for the admin tool.

$ radmin

This utility brings up a menu

*************************************
Administration Utility (v9.0.1)
*************************************

1. Configure R Server for Operationalization
2. Set a local admin password
3. Stop and start services
4. Change service ports
5. Encrypt credentials
6. Run diagnostic tests
7. Evaluate capacity
8. Exit

Web node endpoint: **http://localhost:12800/**

Please enter an option:
1

Set the admin password:
*************

Confirm this password:
*************

Configuration for Operationalization:

A. One-box (web + compute nodes)
B. Web node
C. Compute node
D. Reset machine to default install state
E. Return to main menu

Please enter an option:
A

Success! Web node running (PID: 4172)

Success! Compute node running (PID: 4172)

At this point the setup should be complete. Running diagnostics with the admin tool can check that it is.

Run Diagnostic Tests: A. Test Configuration

Please enter an option:
6

Preparing to run diagnostics...
***********************
DIAGNOSTIC RESULTS:
***********************
Overall Health: pass

Web Node Details:
Logs: /usr/lib64/microsoft-deployr/9.0.1/Microsoft.DeployR.Server.WebAPI/logs
Available compute nodes: 1

Compute Node Details:
Health of 'http://localhost:12805/': pass
Logs: /usr/lib64/microsoft-deployr/9.0.1/Microsoft.DeployR.Server.BackEnd/logs


Authentication Details:
A local admin account was found. No other form of authentication is configured.

Database Details:
Health: pass
Type: sqlite

Code Execution Test: PASS Code: ‘y <- cumprod(c(1500, 1+(rnorm(n=25,mean=.05, sd = 1.4)/100)))’

Yes, it even tests that the MRS interpreter runs! If the web node or the compute service has stopped, the following test will complain loudly. Note the useful paths to the log directories for failure details. Services can be stopped and started from selection 3 in the top-level menu.

Run Diagnostic Tests: B. Raw Server Status

**********************
SERVICE STATE (raw):
**********************

Please authenticate...

Username:
admin

Password:
*************
Server:
Health: pass
Details:
    logPath: /usr/lib64/microsoft-deployr/9.0.1/Microsoft.DeployR.Server.WebAPI/logs
backends:
    Health: pass
    http://localhost:12805/:
    Health: pass
    Details:
        maxPoolSize: 80
        activeShellCount: 1
        currentPoolSize: 5
        logPath: /usr/lib64/microsoft-deployr/9.0.1/Microsoft.DeployR.Server.BackEnd/logs
database:
    Health: pass
    Details:
    type: sqlite
    name: main
    state: Open

3. Verify that the MRS server is running from the server’s Linux prompt

The R Server web services can also be checked by looking at the machine’s open ports, without going into the admin tool. This command reveals the ports the Linux machine is listening on:

$ netstat -tupln

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:29130         0.0.0.0:*               LISTEN      42527/mdsd
tcp        0      0 127.0.0.1:29131         0.0.0.0:*               LISTEN      2001/mdsd
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1265/sshd
tcp        0      0 0.0.0.0:9054            0.0.0.0:*               LISTEN      55348/Rserve
tcp        0      0 0.0.0.0:9055            0.0.0.0:*               LISTEN      55348/Rserve
tcp6       0      0 :::12805                :::*                    LISTEN      55327/dotnet
tcp6       0      0 :::22                   :::*                    LISTEN      1265/sshd
tcp6       0      0 :::12800                :::*                    LISTEN      55285/dotnet
udp        0      0 0.0.0.0:68              0.0.0.0:*                           1064/dhclient

We can see that port 12800 is active for the web service. 12805 is the compute server, running here on the same machine as the web service.
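The same check can be sketched from inside R, which is handy when you are already at the R prompt. This is a minimal sketch using only base R; the helper name port_is_open is made up for illustration, and the port numbers are simply the ones the admin tool reported above. It only tests that something accepts a TCP connection on each port, not that the service behind it is healthy.

```r
# Return TRUE if a TCP connection to host:port can be opened within `timeout` seconds.
# port_is_open is a hypothetical helper, not part of mrsdeploy.
port_is_open <- function(host, port, timeout = 2) {
  tryCatch({
    con <- suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    )
    close(con)
    TRUE
  }, error = function(e) FALSE)
}

# Ports taken from the admin tool output in this post: web node and compute node.
ports <- c(web = 12800, compute = 12805)
status <- vapply(ports, function(p) port_is_open("localhost", p), logical(1))
print(status)
```

On the server itself, both entries should come back TRUE once the one-box configuration is running.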

The next thing to do is check that you can connect to the service from R Server running locally, with mrsdeploy loaded.

4. Check that the MRS server is running by logging in from the server itself.

Do this by running a remote mrsdeploy session from the server as localhost. This is the way one would “run MRS as R Client,” even though the full set of MRS features is available. Running MRS as both a client and a server on the same machine is possible, but I see no purpose in it other than to test that the web service is accessible. The sequence of steps is:

    $ rserver
    [ MRS banner...]

    > endpoint <- "localhost:12800"   # The forum shows this format for logins.
    > library(mrsdeploy)
    > remoteLogin(endpoint)
    Username: admin
    Password: *************           # The password you set in the admin tool. 

    [...]

    REMOTE> 

If authentication is failing, you can look at the tail of the system log file for the error, like this

$ cd /usr/lib64/microsoft-deployr/9.0.1/Microsoft.DeployR.Server.WebAPI/logs
$ sudo tail $(ls -t1 | head -1)   # Look at the end of the most recent logfile
... "Message":"The username doesn't belong to the admin user",...

Then, to end the remote session, the command is exit

    REMOTE> exit

5. Finish VM Configuration for remote access

Two more steps are needed before you can use the server over the network. You should set a public DNS name (that is, a domain name) for the VM, since its public IP address is dynamic and may change when the machine is restarted. And as a matter of security, the Azure firewall (the “network security group” resource) needs to be configured.

Go back to portal.azure.com and find these resources associated with the VM:

  • Public DNS address
  • Open incoming service ports

Public IP

To set the public DNS name, go to the portal’s VM overview pane and click on the public IP item, for instance “mymrs-ip”, and continue until you get to the configuration blade. This takes you to the mymrs-ip component, where you can change the DNS label.

Network Security Group

Until you open the service port in the network security group, a remote mrsdeploy login attempt will fail with the message

Error: Couldn't connect to server

since only port 22, for ssh, is allowed by default by the VM’s network security group. To configure remote access you’ll need to open the port the admin tool reported as the web endpoint, typically 12800. The inbound security rules blade is buried in the VM choices -> Network Interfaces -> Network Security Group -> Inbound Security Rules. Choose “Add” to create a custom inbound rule for TCP port 12800. The result looks like this:

Inbound Security Rules

Now the server is ready for use!

6. Check that the MRS server is running from another machine

You’ll need a local copy of MRS to do this. Copies are available from a few sources, including a “client side only” copy called, naturally, R Client, which is licensed for free. R Client gives you all the remoting capabilities of R Server and the same custom learning algorithms, but unlike R Server, it is limited to datasets that fit in memory.

The sources of R Server are several:

  • MSDN subscription downloads include R Server for different platforms.
  • R Client is also a free download on MSDN.
  • Microsoft SQL Server comes with R Server as an option. You can install R Server “standalone” with the SQL Server installer in addition to installing it as part of SQL Server.
  • If you have installed R Tools for Visual Studio (RTVS), the R Tools menu has an item to install R Client.
  • Of course, any VM that comes with R Server will work too. Notably, the Data Science VM, which hosts an exhaustive collection of data science tools, includes a copy of R Server.

To remotely login from your local machine, the MRS commands are the same as before, except use the domain name of the server from your local client:

> endpoint <- "mymrs.southcentralus.cloudapp.azure.com:12800"
> library(mrsdeploy)
> remoteLogin(endpoint)

If, as shown, you do not include the admin account name and password as arguments to remoteLogin, the command will bring up a modal dialog asking for them. Be advised that this dialog may be hidden behind other windows, so you may have to look for it.
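To avoid the dialog altogether, the credentials can be passed as arguments. The sketch below is one way to do that without typing the password into a script. It assumes the password has been placed in an environment variable, here given the made-up name MRS_ADMIN_PASSWORD, and it uses a small hypothetical helper, make_endpoint, to build the address. The http:// prefix follows the form used in the mrsdeploy documentation examples; the shorter host:port form shown above is what was used for this post. The host name is a placeholder for your own VM’s DNS name.

```r
# Build the web node address from a host name and port.
# make_endpoint is a hypothetical helper, not part of mrsdeploy.
make_endpoint <- function(host, port = 12800) {
  sprintf("http://%s:%d", host, as.integer(port))
}

# Placeholder host: substitute the DNS name shown for your VM in the portal.
endpoint <- make_endpoint("mymrs.southcentralus.cloudapp.azure.com")

# Assumption: the admin password was exported beforehand under this made-up name.
admin_password <- Sys.getenv("MRS_ADMIN_PASSWORD")

# Only attempt the login when mrsdeploy is installed and a password is available.
if (requireNamespace("mrsdeploy", quietly = TRUE) && nzchar(admin_password)) {
  mrsdeploy::remoteLogin(endpoint,
                         username = "admin",
                         password = admin_password)
}
```

Keeping the password out of the script also keeps it out of version control and out of your R history file.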


The server will kindly return a banner with the differences between your client and the server MRS environments. Here’s what a proper remote session returns on initiation:

Diff report between local and remote R sessions...

Warning! R version mismatch
local: R version 3.3.2 (2016-10-31)
remote: R version 3.2.3 (2015-12-10)

These R packages installed on the local machine are not on the remote R instance:

   Missing Packages
1        checkpoint
2  CompatibilityAPI
3              curl
...
23            RUnit

The versions of these installed R packages differ:

     Package   Local  Remote
1       base   3.3.2   3.2.3
...
23     utils   3.3.2   3.2.3


Your REMOTE R session is now active.
Commands:
        - pause() to switch to local session & leave remote session on hold.
        - resume() to return to remote session.
        - exit to leave (and terminate) remote session.

Once at the REMOTE> prompt you can explore the remote interpreter environment. These handy R functions let you explore the remote environment further:

Sys.getenv()    # will show the machine's OS environment variables on the server.
Sys.info()      # returns a named character vector describing the machine and user

Environment differences: Adding custom packages to the server

The comparative listing of packages shown when you log in to the remote server should alert you to the need to accommodate the differences between the local and remote environments. Different R versions generate this warning:

Warning! R version mismatch

Different versions will limit which packages are available for both versions.

Compatible but missing packages can be installed on the server. To be able to install packages when available packages differ, the remote session will need permission to write to one of the directories identified by .libPaths() on the remote server. This is not granted by default. If you feel comfortable with letting the remote user make modifications to the server, you could grant this permission by making this directory writable by everyone

 $ sudo chmod a+w /usr/local/lib/R/site-library/

Then, to install a package, for example glmnet, into this directory, use

REMOTE> install.packages("glmnet", lib="/usr/local/lib/R/site-library")
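Before changing permissions on the server, it may be worth checking which library directories the session can already write to. A minimal sketch in base R, meant to be run at the REMOTE> prompt (it also runs locally, where it reports on your local library paths):

```r
# List the library directories R searches, and whether this session can write to each.
lib_dirs <- .libPaths()

# file.access() returns 0 where the requested access (mode = 2: write) is permitted.
writable <- unname(file.access(lib_dirs, mode = 2) == 0)

print(data.frame(path = lib_dirs, writable = writable))
```

If one of the listed directories is already writable, install.packages can be pointed at it through its lib argument without loosening permissions on the site library.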

These installations will persist from one remote session to another, and the “missing packages” warning at login will be updated correctly. Strangely, though, IntelliSense for package names always refers to the local list of packages, so it will suggest packages that are unavailable on the remote server.

Running a batch R job on the server

Congratulations! Now you can run large R jobs on a VM in the cloud!

There are various ways to take advantage of the VM in addition to running interactively at the REMOTE> prompt. A simple case is to use the remote server to run large, time-consuming jobs. For instance, this iteration, which computes a regression’s leave-one-out r-squared values:

rsqr <- c()
system.time(
  for (k in 1:nrow(mtcars)) {
    rsqr[k] <- summary(lm(mpg ~ . , data=mtcars[-k,]))$r.squared
  })
print(summary(rsqr))

The same computation can be run remotely:

remoteExecute("rsqr <- c()\
system.time(\
for (k in 1:nrow(mtcars)) {\
    rsqr[k] <- summary(lm(mpg ~ . , data=mtcars[-k,]))$r.squared\
})")

We’ll need to recall the results separately, since only the last value in the remote expression output is printed:

remoteExecute("summary(rsqr)")

For larger chunks of code, you can put them in script files and execute a file remotely using mrsdeploy::remoteScript("myscript.R"), which is simply a wrapper around mrsdeploy::remoteExecute("myscript.R", script=TRUE), where myscript.R is found in your local working directory.
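As a rough sketch of that workflow, the code below writes the leave-one-out computation to myscript.R in the working directory, checks locally that it parses, and defines a helper that runs it remotely and copies the result back. The helper name run_remotely is made up for illustration, and it assumes a remote session has already been opened with remoteLogin. The script uses vapply in place of the loop shown earlier; the computed values are the same.

```r
# Write the leave-one-out r-squared computation to a script file.
script_lines <- c(
  "rsqr <- vapply(seq_len(nrow(mtcars)), function(k) {",
  "  summary(lm(mpg ~ ., data = mtcars[-k, ]))$r.squared",
  "}, numeric(1))",
  "summary(rsqr)"
)
writeLines(script_lines, "myscript.R")

# Catch syntax errors locally before sending the script anywhere.
invisible(parse("myscript.R"))

# Hypothetical helper: assumes an active remote session (see remoteLogin above).
run_remotely <- function(script = "myscript.R") {
  mrsdeploy::remoteScript(script)       # execute the local file in the remote session
  mrsdeploy::getRemoteObject("rsqr")    # copy rsqr from the remote session to the local one
}
```

Fetching rsqr with getRemoteObject is an alternative to issuing a second remoteExecute call just to print a summary, since the values then live in your local workspace.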

Note that the mrsdeploy library is not needed in the script running remotely. Indeed, the VM with preinstalled Microsoft R Server 2016 (version 9.0.1) for Linux (Ubuntu version 16.04) runs R version 3.2.3, which does not include the mrsdeploy library. So both library(mrsdeploy) and install.packages("mrsdeploy") will generate an error in the remote session. If you’ve included these statements so that your script works locally, be sure to remove them before you execute the script remotely, or the script will fail! If you want to use the same script in both places, a simple workaround is to skip the library call when the script runs in the remote session:

if ( Sys.info()["user"] != "rserve2" ) {
  library(mrsdeploy)
}  
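A variation that does not depend on the remote account name is to test whether the package is installed at all. This is only a sketch, and it assumes that a missing package is the only reason to skip the call; has_mrsdeploy is just an illustrative variable name.

```r
# Load mrsdeploy only where it is installed; do nothing elsewhere.
has_mrsdeploy <- requireNamespace("mrsdeploy", quietly = TRUE)

if (has_mrsdeploy) {
  library(mrsdeploy)
}
```

Unlike the user-name test, this keeps working if the service account on the server is ever renamed.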

The ability of mrsdeploy to execute a script remotely is just the tip of the iceberg. It also enables moving files and variables back and forth between local and remote, and most importantly, configuring R functions as production web services. This set of deployment features merits another entire blog posting.

For more information

For details about different configuration options see Configuring R Server Operationalization. Libraries as required in the Operationalization instructions are already configured on the VM.

To see what you can do with a remote session, have a look here. And for a general overview, see this.

Go to the R Server documentation for the full API reference.