In one project I have been recently working with, I need to execute R scripts in Azure Batch. The computer nodes of the Azure Batch pool were provisioned with Data Science Virtual Machines which already include common R packages. However, some packages required for the R scripts, such as tidyr and rAzureBatch, are missing and need to be installed.
My first attempt to install the package through running install.packages(“tidyselect”) was failed. The error message revealed that I didn’t have write permission to install package. It actually makes sense that Azure Batch doesn’t allow the jobs submitted from the consumers to change the global environment of computer nodes. Therefore, the R packages can only be installed somewhere on the computer nodes where the submitted jobs have write permission.
After a bit research, it turns out that Azure Batch organised the submitted jobs in this structure.
Azure Batch exposes a portion of the file system on a computer note for the submitted job to access. One file directory will be created for each job and each task associated with the job. Within each task directory, a working directory, wd, will be created which provides read/write access to the task. As the task has full permission to create, update and delete content within this directory, here should be the place where we can install the R packages to. Azure Batch provides an environment variable AZ_BATCH_TASK_WORKING_DIR to specify the path of the directory of the current task.
With the knowledge of the task working directory, we can install the R package with the specified installation target directory.
When attaching the installed package, we need to explicitly specify the directory as well: