Exploiting Hyper-threading and MPI

CSC591c Course Project

Nikola Vouk (nvouk@ncsu.edu)

Frank Castaneda (fjcastan@ncsu.edu)

April 10, 2003

 

Goals:

 

The goal of this project is to exploit the hyper-threading architecture of the Intel Xeon processor, together with the experimental Linux 2.5 kernel, to run the communication overhead of a work unit in parallel with the main work thread, which does the actual computation. The idea is that the processor's functional units are shared among the threads, so the two functions can run in parallel. The key is to find a main work unit, such as a render, a compression, or a numerical computation, that can share the processor with the send/receive overhead of node-to-node communication.

 

 

Software Setup:

 

Red Hat Linux with the latest 2.5 kernel, which supports hyper-threading.

PAPI for performance-counter-based optimization

MPI for message passing that bypasses the kernel

 

 

 

ASCI Benchmark Experiment

As part of the project, we have to install an ASCI benchmark on the class cluster. We have decided to install sPPM.

 

Hardware Setup:

The target machines are IBM X232 servers with dual 2.0 GHz Intel Xeon MP processors.

 


Experiment Setup:

 

A master node that sends data to a hyper-threaded slave node.

 

The hyper-threaded node runs a long-running application, such as a render, that can pipeline its I/O calls with communication to the master node. The I/O should take long enough that a noticeable delay would occur if it were done sequentially (Amdahl's Law).
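To make the "long enough" condition concrete, here is a minimal sketch of an idealized timing model. It assumes communication can be hidden completely behind computation and ignores contention for the shared execution units, so it gives an upper bound, not a prediction. The function names are hypothetical and are not part of the project code.

```c
#include <assert.h>

/* Sequential case: the node computes, then communicates. */
double sequential_time(double compute_s, double comm_s) {
    assert(compute_s >= 0.0 && comm_s >= 0.0);
    return compute_s + comm_s;
}

/* Idealized overlapped case (assumption: perfect overlap, no contention):
 * the run takes as long as the longer of the two activities. */
double overlapped_time(double compute_s, double comm_s) {
    assert(compute_s >= 0.0 && comm_s >= 0.0);
    return compute_s > comm_s ? compute_s : comm_s;
}

/* Upper bound on the speedup gained by overlapping the two activities. */
double overlap_speedup(double compute_s, double comm_s) {
    double overlapped = overlapped_time(compute_s, comm_s);
    if (overlapped == 0.0) {
        return 1.0; /* nothing to do, so nothing to gain */
    }
    return sequential_time(compute_s, comm_s) / overlapped;
}
```

Under this model, 8 seconds of computation with 2 seconds of communication gives a bound of 1.25, while an even 5/5 split gives the maximum of 2.0. When communication is a small fraction of the run, the bound approaches 1, which is why the I/O must be substantial for any benefit to be visible.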

 

The second hyper-thread will spin on a global variable, which keeps it resident and ready to run, and will perform I/O on behalf of the main thread when called.
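The spin-wait scheme described above might look roughly like the following sketch, which uses POSIX threads and C11 atomics. Everything here is an assumption made for illustration: the state names, `post_io`, `run_demo`, and in particular `do_io`, which only counts bytes and stands in for the real send to the master node.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

enum { IO_IDLE = 0, IO_REQUEST = 1, IO_QUIT = 2 };

static atomic_int io_state = IO_IDLE;  /* the global the helper spins on */
static const char *io_buf;             /* published before IO_REQUEST is set */
static size_t io_len;
static atomic_size_t bytes_sent = 0;

/* Hypothetical stand-in for the real communication call. */
static void do_io(const char *buf, size_t len) {
    (void)buf;
    atomic_fetch_add(&bytes_sent, len);
}

/* Helper thread: busy-waits on io_state and services requests. */
static void *io_helper(void *arg) {
    (void)arg;
    for (;;) {
        int s = atomic_load_explicit(&io_state, memory_order_acquire);
        if (s == IO_REQUEST) {
            do_io(io_buf, io_len);
            atomic_store_explicit(&io_state, IO_IDLE, memory_order_release);
        } else if (s == IO_QUIT) {
            return NULL;
        }
    }
}

/* Main thread: hand a buffer to the helper; blocks only if the
 * previous request is still in flight. */
static void post_io(const char *buf, size_t len) {
    while (atomic_load_explicit(&io_state, memory_order_acquire) != IO_IDLE) {
    }
    io_buf = buf;
    io_len = len;
    atomic_store_explicit(&io_state, IO_REQUEST, memory_order_release);
}

/* Runs `chunks` rounds of placeholder work, posting one 64-byte buffer
 * per round, and returns the number of bytes the helper handled. */
size_t run_demo(int chunks) {
    static const char chunk[64] = {0};
    pthread_t helper;
    volatile double acc = 0.0;

    assert(chunks >= 0);
    atomic_store(&bytes_sent, 0);
    atomic_store(&io_state, IO_IDLE);
    if (pthread_create(&helper, NULL, io_helper, NULL) != 0) {
        return 0;
    }
    for (int i = 0; i < chunks; i++) {
        for (int k = 0; k < 1000; k++) {
            acc += k * 0.5;            /* placeholder for the real work */
        }
        post_io(chunk, sizeof chunk);
    }
    /* Let the last request drain, then tell the helper to exit. */
    while (atomic_load_explicit(&io_state, memory_order_acquire) != IO_IDLE) {
    }
    atomic_store_explicit(&io_state, IO_QUIT, memory_order_release);
    pthread_join(helper, NULL);
    return atomic_load(&bytes_sent);
}
```

Calling `run_demo(10)` should report 640 bytes handled by the helper. The acquire/release pairing on `io_state` is what makes the plain writes to `io_buf` and `io_len` visible to the helper. On a hyper-threaded processor a busy-wait loop competes with the main thread for execution resources, so a real implementation would likely place a PAUSE instruction inside the spin loops.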

 

The main thread will do some form of calculation work, probably drawn from the SPEC suite of benchmarks. These benchmarks allow us to target specific aspects of the processor, such as particular ALUs.

Results

We will test the system in both hyper-threaded and non-hyper-threaded environments. We expect to see a major improvement when communication overhead is high. Hyper-threading splits the processor's resources, including buffer access, between the two threads during processing. Over short runs, the overhead of the scheme will outweigh the benefit, but for large, long-running renders or compressions, pipelining the sending and receiving of data should yield higher performance. Our application will attempt to use the SIMD units to maximize performance. The limitation will show up in code whose threads use the same internal buffers. [1]

 

Questions and Concerns:

 

Update 4/24/2003

The latest update on the project is available here.

Final Report

The final report of the project is available here.

Website

 

http://www4.ncsu.edu/~nvouk/exploitinghyper.html

 

References:

  1. The IA-32 Intel Architecture Software Developer's Manual, Volume 1: Basic Architecture (Order Number 245470).
    ftp://download.intel.com/design/Pentium4/manuals/24547011.pdf
  2. The IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference (Order Number 245471).
    ftp://download.intel.com/design/Pentium4/manuals/24547111.pdf
  3. The IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide (Order Number 245472).
    ftp://download.intel.com/design/Pentium4/manuals/24547210.pdf
  4. D. Tullsen, S. Eggers, J. Emer, H. Levy, J. Lo, and R. Stamm, "Exploiting choice: Instruction fetch and issue on an implementable simultaneous multithreading processor," 23rd Annual International Symposium on Computer Architecture, May 1996.
    http://citeseer.nj.nec.com/cache/papers/cs/7286/http:zSzzSzwww.csrd.uiuc.eduzSz~ece412zSzpaperszSztullsen_ISCA96.pdf/tullsen96exploiting.pdf
  5. Download of performance libs
    http://www.intel.com/software/products/global/eval.htm#perflib
  6. Pentium optimized libraries
    http://www.intel.com/software/products/ipp/ipp30/index.htm
  7. Detailed Article on Hyper-threading in the Pentium Xeon
    http://developer.intel.com/technology/itj/2002/volume06issue01/art01_hyper/p01_abstract.htm
  8. Intel Processor Programming Manuals
    http://developer.intel.com/design/Pentium4/manuals/
  9. Pentium 4 and the G4e: Architectural Comparison
    http://arstechnica.com/cpu/01q2/p4andg4e/p4andg4e-6.html
  10. IBM hyper-threading document
    https://mail.gininet.com/Redirect/www-106.ibm.com/developerworks/linux/library/l-htl/
  11. SPEC CPU2000 (CINT2000) Test Suite
    http://www.specbench.org/osg/cpu2000/CINT2000/