Title: pR: Automatic, transparent runtime parallelization of the R scripting language
Abstract
Scripting languages such as R and Matlab are widely used by scientists for
data processing. As the amount of data and the complexity of analysis tasks both
grow, sequential data processing using these tools often becomes the bottleneck
in scientific workflows. We describe pR, a runtime framework for automatic and
transparent parallelization of the popular R language used in statistical
computing.
Recognizing R's interpreted nature and the usage patterns of computation-intensive
R code, pR adopts several novel techniques:
(1) runtime whole-program dependence analysis and code transformation assisted
by evaluation results, (2) a selective parallelization scheme that only
parallelizes the expensive parts of the program, namely loops and function
calls, and (3) a master-worker scheduling and execution engine that only "outsources"
expensive tasks to the workers. Our framework uses MPI for inter-processor
communication and does not require any modification to either the source code or
the underlying R implementation. Experimental results demonstrate that pR can
exploit both task and data parallelism in a fully transparent manner and
overall achieves better performance and scalability than an existing
parallel R package that requires code modification.
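The abstract does not spell out pR's algorithms. As a rough illustration of the ideas behind techniques (1) to (3), the following Python sketch groups a script's statements into levels of mutually independent statements using read/write conflicts, then lets a master "outsource" only the statements marked expensive to a worker pool while running cheap ones itself. This is not pR's implementation: all names (`Stmt`, `depends`, `schedule_levels`, `execute`) are hypothetical, Python threads stand in for MPI workers, and each statement carries hand-written read/write sets instead of sets derived by analyzing R code at runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class Stmt:
    """One top-level statement of a hypothetical script."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    expensive: bool  # assumption: loops / heavy function calls are flagged by hand
    run: Callable[[Dict[str, object]], Dict[str, object]]  # returns its writes


def depends(later: Stmt, earlier: Stmt) -> bool:
    """True if `later` must wait for `earlier` (RAW, WAR or WAW conflict)."""
    return bool(
        earlier.writes & later.reads      # read-after-write
        or earlier.reads & later.writes   # write-after-read
        or earlier.writes & later.writes  # write-after-write
    )


def schedule_levels(stmts: List[Stmt]) -> List[List[Stmt]]:
    """Group statements into levels; statements within one level are independent."""
    level: List[int] = []
    for j, s in enumerate(stmts):
        preds = [level[i] for i in range(j) if depends(s, stmts[i])]
        level.append(max(preds) + 1 if preds else 0)
    levels: List[List[Stmt]] = [[] for _ in range(max(level, default=-1) + 1)]
    for s, lv in zip(stmts, level):
        levels[lv].append(s)
    return levels


def execute(stmts: List[Stmt], workers: int = 4) -> Dict[str, object]:
    """Master loop: outsource expensive statements, run cheap ones locally."""
    env: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for group in schedule_levels(stmts):
            # Every statement in a level sees the same snapshot of env.
            futures = [pool.submit(s.run, dict(env)) for s in group if s.expensive]
            local = [s.run(dict(env)) for s in group if not s.expensive]
            for result in local + [f.result() for f in futures]:
                env.update(result)
    return env


def slow_sum(n: int) -> int:
    """Stand-in for an expensive computation: sum of squares below n."""
    return sum(i * i for i in range(n))


# Two independent expensive statements followed by a cheap one that uses both.
prog = [
    Stmt("a", frozenset(), frozenset({"a"}), True, lambda e: {"a": slow_sum(10)}),
    Stmt("b", frozenset(), frozenset({"b"}), True, lambda e: {"b": slow_sum(20)}),
    Stmt("c", frozenset({"a", "b"}), frozenset({"c"}), False,
         lambda e: {"c": e["a"] + e["b"]}),
]

print([[s.name for s in g] for g in schedule_levels(prog)])  # [['a', 'b'], ['c']]
print(execute(prog)["c"])  # 2755
```

Here `a` and `b` share no variables, so they land in the same level and are dispatched to workers concurrently, while `c` reads both and is deferred to the next level and run on the master. Treating write-after-read and write-after-write conflicts as dependences is conservative; a real system could rename variables to remove some of them.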