Software techniques to reduce instruction duplication for soft errors

 

Project Members:

Muhammad Mutahir Latif (mmlatif@ncsu.edu)

Ravi Ramaseshan (rramase@ncsu.edu)

 

Project Website: http://www4.ncsu.edu/~rramase/RED714.htm

Introduction

It is widely understood that most downtime is caused by programming errors and administration time. However, recent work indicates that a growing share of downtime may stem from transient hardware errors caused by external factors such as cosmic rays. The move to denser semiconductor technologies operating at lower voltages is expected to increase the rate of transient errors. While error-correcting codes have reduced soft error rates (SERs) in memories, no such quick fix exists for logic, and all current solutions incur extra cost and a performance penalty.

Project Statement

By modifying certain programming constructs, we can introduce transient fault checking without having to increase the size of the sphere of replication, as suggested in previous software-only fault detection techniques. We will explore further techniques to reduce instruction duplication for soft error detection and compare their performance with that of contemporary systems.

Project Goals

Task 1

 

The techniques discussed in EDDI impose a very serious performance overhead. Because every operation is duplicated in software and every variable or memory location has a redundant copy, performance is severely hampered. In this project we propose to find programming constructs in C that, when remodeled in a certain manner, require fewer duplicate instructions and variables.
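To make the source of this overhead concrete, the following is a minimal source-level sketch of EDDI-style duplication. It is an illustration only: EDDI operates on instructions inside the compiler rather than on C source, and the names `sum_plain`, `sum_duplicated` and `soft_error_detected` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error handler; a real system might retry or roll back. */
static void soft_error_detected(void) {
    fprintf(stderr, "soft error detected\n");
    abort();
}

/* Unprotected version: one counter, one accumulator. */
static int sum_plain(const int *a, int n) {
    assert(n >= 0);
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Duplicated version: every variable has a shadow copy, every operation
 * is performed twice, and the copies are compared before they are used. */
static int sum_duplicated(const int *a, int n) {
    assert(n >= 0);
    /* volatile keeps the compiler from merging the redundant copies */
    volatile int s = 0, s_dup = 0;
    volatile int i = 0, i_dup = 0;
    while (i < n) {
        if (i != i_dup)
            soft_error_detected();
        s += a[i];
        s_dup += a[i_dup];
        i++;
        i_dup++;
    }
    if (s != s_dup || i != i_dup)
        soft_error_detected();
    return s;
}
```

Even in this small loop the protected version roughly doubles the arithmetic and adds a comparison per iteration, which is the cost the project aims to reduce.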

 

Take, for example, a standard for loop in C. It can be replaced by an equivalent loop whose counter i is advanced by a shift operation rather than an increment.

 

By using the shift operation and keeping i as a variable in which only a single bit is set, we can remove the duplication of i that EDDI requires; after each iteration we check that exactly one bit is set in i.
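The loop transformation described above might look like the following sketch. The original listings are not reproduced here, so this is our reconstruction under stated assumptions: a 32-bit counter, at most 31 iterations, and a hypothetical `flip_at`/`flip_mask` hook that simulates an upset for demonstration purposes.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if exactly one bit of x is set, 0 otherwise. */
static int is_one_hot(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* Conventional form:  for (i = 0; i < n; i++) { ... }
 * One-hot form:       i starts at bit 0 and is shifted left once per
 *                     iteration until it reaches bit n.
 * Returns the number of iterations executed, or -1 if the counter was
 * found corrupted. Pass flip_mask = 0 to run without a simulated fault. */
static int run_one_hot_loop(unsigned n, unsigned flip_at, uint32_t flip_mask) {
    assert(n <= 31);                 /* 1u << n must fit in 32 bits */
    const uint32_t limit = 1u << n;
    int iterations = 0;
    for (uint32_t i = 1u; i != limit; i <<= 1) {
        if ((unsigned)iterations == flip_at)
            i ^= flip_mask;          /* simulated upset (hypothetical hook) */
        if (!is_one_hot(i))
            return -1;               /* fault detected without a shadow copy of i */
        iterations++;
        /* loop body would go here */
    }
    return iterations;
}
```

A single bit flip in a one-hot value always leaves either zero or two bits set, so the check catches any single-bit upset of i. The limit and the loop body are not protected by this check and would still need separate treatment.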

Task 2

 

We would then modify the Open IMPACT compiler to incorporate the techniques identified in Task 1. We would evaluate the performance of programs generated by the modified Open IMPACT compiler against previously known software fault detection techniques, using the framework developed in Task 3.

Task 3

 

Once we have collected as many programming models as possible and implemented them in Open IMPACT, we will measure the overhead of the scheme. We would develop a framework that performs an accurate timing analysis and determines whether modifying a model introduces any additional overhead. The framework would also be capable of generating random single event upsets (SEUs) at any memory location used by the program, so that we can check whether the modification handles any transient fault.
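The SEU generation part of such a framework could start from something as simple as the sketch below. It is an assumption on our part, not the framework itself: the helper names `flip_bit` and `inject_random_seu` are hypothetical, and `rand()` is used only as a placeholder random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flips one bit of the buffer at the given byte and bit position. */
static void flip_bit(void *buf, size_t byte_index, unsigned bit_index) {
    assert(bit_index < 8);
    ((unsigned char *)buf)[byte_index] ^= (unsigned char)(1u << bit_index);
}

/* Flips one pseudo-randomly chosen bit in buf[0..len) and returns the
 * position of the flipped bit, counted in bits from the start of buf. */
static size_t inject_random_seu(void *buf, size_t len) {
    assert(buf != NULL && len > 0);
    size_t byte_index = (size_t)rand() % len;
    unsigned bit_index = (unsigned)rand() % 8u;
    flip_bit(buf, byte_index, bit_index);
    return byte_index * 8u + bit_index;
}
```

Calling `inject_random_seu(&var, sizeof var)` between two statements of the program under test models an upset in `var`; seeding with `srand` makes a fault campaign repeatable.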

 

While simulating the operations, we will assume that some hardware support is available to speed up the error checking mechanism. For example, for the for loop above we can assume a single instruction that checks whether a memory location has a certain number of bits set. Our framework would provide an abstraction of this hardware support and account for the time saved by its availability.
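One way to sketch this abstraction is a check routine that computes the result in software but charges a configurable cycle cost, depending on whether the assumed instruction is modeled as present. The `check_model` structure, its field names and the cost values supplied by the caller are all hypothetical placeholders, not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cost model for the bit-count check; all costs are supplied by the user. */
struct check_model {
    int hw_available;       /* nonzero if the assumed instruction is modeled */
    unsigned hw_cycles;     /* assumed cost of the hardware check */
    unsigned sw_cycles;     /* assumed cost of the software fallback */
    unsigned long charged;  /* total cycles charged so far */
};

/* Portable software bit count (clears the lowest set bit each step). */
static unsigned popcount32(uint32_t x) {
    unsigned count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* Returns 1 if exactly `expected` bits of `word` are set, and charges the
 * modeled cost of the check to the cost model. */
static int check_bits_set(struct check_model *m, uint32_t word, unsigned expected) {
    assert(m != NULL && expected <= 32);
    m->charged += m->hw_available ? m->hw_cycles : m->sw_cycles;
    return popcount32(word) == expected;
}
```

Running the same program once with `hw_available` set and once without, then comparing `charged`, gives a first estimate of the time saved by the assumed instruction.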

 

Status Reports:

 

  1. Progress Report 1
  2. Progress Report 2
  3. Project Report

 

References:

 

·  "Error Detection by Duplicated Instructions in Super-Scalar Processors" (EDDI), N. Oh, P. Shirvani and E. McCluskey, in IEEE Transactions on Reliability, Sep. 2001.

·  "Control Flow Checking by Software Signatures", N. Oh, P. Shirvani and E. McCluskey, in IEEE Transactions on Reliability, Sep. 2001.

·  "SWIFT: Software Implemented Fault Tolerance", George A. Reis, Jonathan Chang, Neil Vachharajani, Ram Rangan and David I. August, in Proceedings of the Third International Symposium on Code Generation and Optimization (CGO), March 2005.

·  "Compiler-guided register reliability improvement against soft errors", J. Yan and W. Zhang, in EMSOFT'05.

·  "Compiler-directed Instruction Duplication for Soft Error Detection", J. S. Hu, F. Li, V. Degalahal, M. Kandemir, V. Narayanan and M. J. Irwin, in DATE'05.