
CSC548 Project (Fall 2006)

Project Title:
Enhanced Proactive Fault Tolerance System for HPC with Xen Virtualization


The earlier work “Proactive Fault Tolerance for HPC with Xen Virtualization” provides a fault tolerance solution for HPC clusters by automatically migrating the OS image from an ‘unhealthy’ node to a ‘healthy’ one. The health of a node is determined by parameters such as CPU temperature and fan speed.

A daemon, the Proactive Fault Tolerance daemon (PFTd), continuously monitors the health of the node by reading the hardware sensors through OpenIPMI. When it detects deteriorating health, PFTd migrates the whole OS image to a spare node, where execution continues. Xen virtualization is used to migrate the image from one node to another in a ‘live’ fashion with little overhead.
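The monitoring decision described above can be sketched roughly as follows. This is a minimal illustration, not PFTd's actual code: the `SensorReading` type, the threshold values, and the `read_sensors` callback are all hypothetical stand-ins (the real daemon reads sensors through OpenIPMI, which is not modelled here). The sketch only builds the `xm migrate --live` command line and does not run it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SensorReading:
    """Hypothetical snapshot of the two sensors named in the text."""
    cpu_temp_c: float
    fan_rpm: float


# Assumed thresholds, chosen only for illustration.
MAX_CPU_TEMP_C = 70.0
MIN_FAN_RPM = 2000.0


def is_unhealthy(reading: SensorReading,
                 max_temp: float = MAX_CPU_TEMP_C,
                 min_fan: float = MIN_FAN_RPM) -> bool:
    """A node counts as unhealthy if it runs too hot or a fan is too slow."""
    return reading.cpu_temp_c > max_temp or reading.fan_rpm < min_fan


def migration_command(domain: str, spare_host: str) -> List[str]:
    """Build the Xen live-migration command for the guest domain."""
    return ["xm", "migrate", "--live", domain, spare_host]


def check_once(read_sensors: Callable[[], SensorReading],
               domain: str, spare_host: str) -> Optional[List[str]]:
    """One monitoring step: return a migration command, or None if healthy."""
    if is_unhealthy(read_sensors()):
        return migration_command(domain, spare_host)
    return None
```

A real daemon would call something like `check_once` periodically and hand the returned command to a process launcher; the polling interval and the choice of spare node are left out because the text does not specify them.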

Problem Statement:

- Improve the currently implemented FT system by adding functionality to proactively pre-deploy parts of the OS image to the spare node, cutting down the cost of migrating the VM.

- Instrument the Xen tools to gather more details, such as the pages sent in each iteration and the dirtying rate, and perform a sensitivity study of the benchmarks with respect to the exact time at which migration is initiated.
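To illustrate why pages sent per iteration and the dirtying rate are the quantities of interest, here is a simplified model of iterative pre-copy migration. It is only a sketch under strong assumptions: a constant dirtying rate and bandwidth (both in pages per second), and hypothetical `max_rounds` and `stop_threshold` parameters. It is not taken from the Xen tools.

```python
from typing import List, Tuple


def precopy_rounds(total_pages: int, dirty_rate: int, bandwidth: int,
                   max_rounds: int = 30,
                   stop_threshold: int = 50) -> Tuple[List[int], int]:
    """Model pre-copy rounds; return (pages sent per round, pages left
    for the final stop-and-copy phase).

    dirty_rate and bandwidth are in pages per second (assumed constant).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    rounds: List[int] = []
    to_send = total_pages  # the first round sends the whole image
    for _ in range(max_rounds):
        rounds.append(to_send)
        # Pages dirtied while this round was in flight:
        # dirty_rate * (to_send / bandwidth), capped at the image size.
        to_send = min(total_pages, to_send * dirty_rate // bandwidth)
        if to_send <= stop_threshold:
            break  # few enough dirty pages to pause the VM and finish
    return rounds, to_send
```

In this model the rounds shrink geometrically when the dirtying rate is below the bandwidth, and never converge when it is at or above it. That is one reason the moment at which migration starts can matter for a benchmark whose dirtying rate varies over time.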

Proposal:         Report.pdf        (added Oct 26 '06)
Update1:         Report2.pdf      (added Nov 07 '06)
Final report:     final.pdf            (added Oct 30 '06)