MapReduce has emerged as a model of choice for supporting modern data-intensive applications and is a key enabler for cloud computing. Setting up and operating a large MapReduce cluster entails careful evaluation of various design choices and run-time parameters to achieve high efficiency, yet this design space has not been explored in detail. In this talk, I will discuss a simulation approach to systematically understanding the performance of MapReduce setups. I will present MRPerf, a toolkit that captures such aspects of MapReduce setups as node, rack, and network configurations; disk parameters and performance; data layout; and application I/O characteristics, and uses this information to predict expected application performance. I will also discuss the challenges of obtaining realistic traces to drive our simulations, and present the tips and tricks we have used. The overall goal is a tool for optimizing existing MapReduce setups as well as designing new ones.
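For readers unfamiliar with the model, the MapReduce paradigm that MRPerf simulates can be sketched in a few lines. The following is an illustrative word-count example of the map/shuffle/reduce structure, not code from MRPerf or Hadoop:

```python
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cat sat", "the cat ran"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 2
```

In a real deployment, the map and reduce tasks run in parallel across the cluster, and the shuffle moves data over the network; it is precisely these distributed costs (disk, network, data placement) that a simulator like MRPerf models.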