Title: A Survey of Fault Tolerance in Message Passing Systems

Abstract: The Message Passing Interface (MPI) is the de facto message-passing standard for parallel applications running on large-scale distributed systems. However, the MPI standard does not address fault tolerance. Recently, various MPI implementations have addressed fault tolerance at several levels, using different mechanisms. This survey analyzes the architecture and fault tolerance mechanisms of several MPI implementations, including FT-MPI, LAM/MPI, MPICH-V, and Open MPI.