Title: pR: Automatic, transparent runtime parallelization of the R scripting language
Abstract
Scripting languages such as R and Matlab are widely used by scientists for 
data processing. As the amount of data and the complexity of analysis tasks both 
grow, sequential data processing using these tools often becomes the bottleneck 
in scientific workflows. We describe pR, a runtime framework for automatic and 
transparent parallelization of the popular R language used in statistical 
computing.
Recognizing R's interpreted nature and the usage patterns of 
computation-intensive R code, pR adopts several novel techniques:
(1) runtime whole-program dependence analysis and code transformation assisted 
by evaluation results, (2) a selective parallelizing scheme that only 
parallelizes the expensive parts of the program, namely loops and function 
calls, and (3) a master-worker scheduling and execution engine that only "outsources" 
expensive tasks to the workers. Our framework uses MPI for inter-processor 
communication and does not require any modification to either the source code or 
the underlying R implementation. Experimental results demonstrate that pR can 
exploit both task and data parallelism in a fully transparent manner and, 
overall, achieves better performance and scalability than an existing 
parallel R package that requires code modification.
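
As a rough illustration of techniques (2) and (3), the sketch below shows a master that walks a script's tasks in program order, runs cheap tasks itself, and hands only the expensive ones to workers, waiting on a result only when a later task depends on it. It is not pR's implementation: it is written in Python rather than R, a thread pool stands in for MPI worker processes, and the task format (the "reads", "writes", and "expensive" fields) and all function names are hypothetical and declared by hand instead of being derived by runtime dependence analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def run_script(tasks, n_workers=2):
    """Run tasks in program order, outsourcing only the expensive ones."""
    env = {}        # the master's variable environment
    pending = {}    # variable name -> Future whose result dict contains it
    placement = []  # (task name, "worker" or "master"), in program order

    def resolve(names):
        # Block only on outstanding results that are actually needed now.
        for name in names:
            fut = pending.pop(name, None)
            if fut is not None:
                env[name] = fut.result()[name]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for task in tasks:
            # Respect read-after-write and write-after-write dependences.
            resolve(task["reads"] | task["writes"])
            inputs = {v: env[v] for v in task["reads"]}  # snapshot of inputs
            if task["expensive"]:
                fut = pool.submit(task["fn"], inputs)
                for v in task["writes"]:
                    pending[v] = fut
                placement.append((task["name"], "worker"))
            else:
                env.update(task["fn"](inputs))
                placement.append((task["name"], "master"))
        resolve(list(pending))  # collect any results still outstanding
    return env, placement


def square_sum(inputs):
    return {"a": sum(i * i for i in range(inputs["n"]))}


def cube_sum(inputs):
    return {"b": sum(i ** 3 for i in range(inputs["n"]))}


# Hypothetical script: two independent loops followed by a cheap combine step.
EXAMPLE_TASKS = [
    {"name": "init", "reads": set(), "writes": {"n"}, "expensive": False,
     "fn": lambda _: {"n": 1000}},
    {"name": "loop_a", "reads": {"n"}, "writes": {"a"}, "expensive": True,
     "fn": square_sum},
    {"name": "loop_b", "reads": {"n"}, "writes": {"b"}, "expensive": True,
     "fn": cube_sum},
    {"name": "combine", "reads": {"a", "b"}, "writes": {"total"},
     "expensive": False, "fn": lambda d: {"total": d["a"] + d["b"]}},
]

if __name__ == "__main__":
    env, placement = run_script(EXAMPLE_TASKS)
    print(placement)
    print(env["total"])
```

In this sketch the two loops do not depend on each other, so both are in flight before the master blocks at the combine step, which reads their results. Python threads do not run CPU-bound work in parallel, so the pool here only models the scheduling logic, not the speedup.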