Title: pR: Automatic, transparent runtime parallelization of the R scripting language

Abstract

Scripting languages such as R and Matlab are widely used by scientists for data processing. As the amount of data and the complexity of analysis tasks both grow, sequential data processing using these tools often becomes the bottleneck in scientific workflows. We describe pR, a runtime framework for automatic and transparent parallelization of the popular R language used in statistical computing.

Recognizing R's interpreted nature and the usage patterns of computation-intensive R code, pR adopts several novel techniques:
(1) runtime whole-program dependence analysis and code transformation, assisted by evaluation results; (2) a selective parallelization scheme that parallelizes only the expensive parts of the program, namely loops and function calls; and (3) a master-worker scheduling and execution engine that "outsources" only expensive tasks to the workers. The framework uses MPI for inter-processor communication and requires no modification to either the source code or the underlying R implementation. Experimental results demonstrate that pR exploits both task and data parallelism in a fully transparent manner and, overall, achieves better performance and scalability than an existing parallel R package that requires code modification.
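The interplay of the three techniques can be illustrated with a small, simplified sketch. This is not pR's implementation. It is written in Python rather than R, it gives each statement hand-declared read and write sets instead of deriving them through runtime analysis, it marks "expensive" statements by hand, and it uses a local thread pool as a stand-in for pR's MPI workers. The names Stmt, depends_on, execute, and slow_sum are hypothetical. The sketch shows the core scheduling idea. The master runs cheap statements itself, outsources expensive ones, and blocks only when a later statement has a flow, anti, or output dependence on a statement that is still in flight.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Stmt:
    """Hypothetical stand-in for one top-level statement of an R script."""
    target: str              # the single variable this statement assigns
    reads: Tuple[str, ...]   # variables it reads (declared by hand, not analyzed)
    fn: Callable[..., Any]   # computes the new value from the read values
    expensive: bool = False  # True for loops / function calls worth outsourcing


def depends_on(later: Stmt, earlier: Stmt) -> bool:
    """True if `later` must wait for `earlier` (flow, anti, or output dependence)."""
    flow = earlier.target in later.reads     # read-after-write
    anti = later.target in earlier.reads     # write-after-read
    output = later.target == earlier.target  # write-after-write
    return flow or anti or output


def execute(stmts: List[Stmt], env: Dict[str, Any], max_workers: int = 4) -> Dict[str, Any]:
    """Run statements in program order, outsourcing expensive ones to a worker pool."""
    env = dict(env)
    in_flight: Dict[int, Tuple[Stmt, Future]] = {}  # statement index -> (stmt, future)

    def collect(i: int) -> None:
        # Block on an outsourced statement and merge its result into the master's env.
        stmt, fut = in_flight.pop(i)
        env[stmt.target] = fut.result()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i, stmt in enumerate(stmts):
            # Wait only for in-flight statements that this one actually depends on.
            for j in sorted(in_flight):
                if depends_on(stmt, in_flight[j][0]):
                    collect(j)
            args = [env[name] for name in stmt.reads]
            if stmt.expensive:
                in_flight[i] = (stmt, pool.submit(stmt.fn, *args))  # outsource to a worker
            else:
                env[stmt.target] = stmt.fn(*args)                   # run on the master
        # Merge any outsourced results still outstanding at the end of the program.
        for j in sorted(in_flight):
            collect(j)
    return env


def slow_sum(n: int) -> int:
    """Deliberately 'expensive' work: the sum of squares of 0 .. n-1."""
    return sum(k * k for k in range(n))


program = [
    Stmt("n", (), lambda: 10_000),
    Stmt("a", ("n",), slow_sum, expensive=True),
    Stmt("b", ("n",), lambda n: slow_sum(n // 2), expensive=True),  # independent of a
    Stmt("c", ("a", "b"), lambda a, b: a + b),                       # flow-dependent on a and b
]
result = execute(program, {})
print(result["c"])
```

Because b does not depend on a, both statements are in flight at the same time, and only c forces the master to collect their results. With Python threads, CPU-bound pure-Python work is serialized by the global interpreter lock, so this sketch demonstrates the dependence-driven scheduling logic rather than real speedup; a process pool or MPI would be needed for that.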