OS Lab/Cluster (Retired)

The OS lab and cluster provide unique high-end computing resources at the Department of Computer Science at North Carolina State University, valued above $80,000 at the time of installation. The cluster provides the means to execute high-end scientific codes in a dedicated mode. It provides the software tools and hardware backbone for high-efficiency parallel programming in a production-like environment. The installation supports industry-standard MPI programming for message passing over a low-latency, high-bandwidth Myrinet interconnect and shared-memory programming with OpenMP using the Intel compiler suite, and it relies on both a local and a shared file system for its operations.

We have successfully used both the cluster and the lab in various classes (CSC 501, CSC 591C and CSC 591E). These educational efforts are unique as most departments do not have such resources locally but have to resort to external supercomputing centers instead.

We have also utilized these facilities for a number of research projects focusing on advanced parallel computation, such as CAREER, SPAN and FRPD. Most significantly, these facilities differ from those of other universities in that we can modify the operating system kernel during experimentation. Hence, the versatility of these facilities, combined with their applicability to both research and teaching, makes the cluster a unique educational tool.


OS Cluster Setup

Cluster nodes:
  • 16 + 1 spare Tyan S2466N
  • with dual Athlon XP 1900 (MP ready),
    64KB L1 I+D split caches, 2-way associative, 64B/line
    256KB L2 shared cache, 16-way associative, 64B/line
  • on-board 3c905 FastEther,
  • 512 MB RAM,
  • 60 GB IBM Deskstar IDE drive,
  • CD-ROM,
  • 2nd 3c905 PCI FastEther (for bonding),
  • Myrinet M3F-PCI64B-2 PCI card
  • (in mini-towers, not 1U or 2U enclosures, for historic reasons)
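
The cache parameters listed above determine how an address is split into offset and index bits. The following is a minimal sketch of that arithmetic, not part of the original setup notes. It assumes that the 64KB L1 figure applies to each of the split instruction and data caches; the function name cache_geometry is made up for illustration.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size_bytes, ways and line_bytes are powers of two.
    """
    lines = size_bytes // line_bytes          # total number of cache lines
    sets = lines // ways                      # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1 # log2(line size)
    index_bits = sets.bit_length() - 1        # log2(number of sets)
    return sets, offset_bits, index_bits


KB = 1024

# Assumption: 64KB per L1 cache (instruction and data each), 2-way, 64B lines.
l1 = cache_geometry(64 * KB, ways=2, line_bytes=64)

# 256KB L2, 16-way, 64B lines, as listed above.
l2 = cache_geometry(256 * KB, ways=16, line_bytes=64)

print("L1 (sets, offset bits, index bits):", l1)
print("L2 (sets, offset bits, index bits):", l2)
```

Under these assumptions, the set counts are useful when sizing working sets or choosing array strides for cache experiments on the cluster nodes.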

Cluster networking:

  • M3F-SW16M Myrinet 16 port switch, 24 port 3Com (see Setup Notes)
  • FastEther switch, 16 port FastEther Switch (for bonding)
  • FastEther switch, 24 port FastEther + 2 port GigaEther Switch (uplink)

Front View of Cluster


Lab nodes:

  • 16 + 1 spare MSI K7N266
  • with single Athlon XP 1800,
  • on-board FastEther,
  • 256 MB RAM,
  • 40 GB WD IDE / IBM Deskstar IDE,
  • CD-ROM / DVD-ROM.
  • Requires Nvidia NForce chipset drivers; also see the Nvidia Linux Forum.
  • HP 2200 duplex printer
  • Networking: FastEther switch, 24 port FastEther Switch (uplink)

General issues:

  • Modifications to the cluster configuration need to be documented! Documentation should be concise and in ASCII, something like this.
  • Questions? Ask Frank Mueller, Sarah William or Gary Stelling. Gary can assign IPs for nodes if you give him a MAC address. Cluster nodes are os00-16, lab nodes are os20-35.
  • Have a look at Oscar (see link below). It seems like the most promising/mature cluster approach.
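
The node naming scheme mentioned above (os00-16 for the cluster, os20-35 for the lab) is easy to generate in scripts. The following is a minimal sketch under the assumption that names are the prefix "os" followed by a zero-padded two-digit number; the helper node_names and the one-host-per-line listing are illustrative only and do not correspond to any particular tool's host file format.

```python
def node_names(first, last, prefix="os"):
    """Return host names prefix+NN for NN in first..last (inclusive)."""
    return [f"{prefix}{i:02d}" for i in range(first, last + 1)]


# Ranges taken from the note above; the zero-padded format is an assumption.
cluster_nodes = node_names(0, 16)   # os00 .. os16
lab_nodes = node_names(20, 35)      # os20 .. os35

# Hypothetical one-host-per-line listing, e.g. for a shell loop or a host file.
host_listing = "\n".join(cluster_nodes)
print(host_listing)
```

A list like this can be fed to whatever launcher or administration script is in use; check the expected host file syntax of that tool before relying on it.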

Back View of Cluster


Lab setup:
  • Check out this memory tweak and another one for Windows. Is there a way to make it work on Linux as well (without changing the BIOS)?
  • We use ext2 and NFS right now. Disk server support: SGI's XFS and NFS. Beyond that, snapshot facilities would be nice, something like SnapFS or a more recent variant of it, but I am not sure how good these attempts are, so we may want to skip snapshots for now. After server support, a distributed file system (like AFS) would be the next step.
  • We boot locally right now. It would be better to have PXE remote booting with DHCP (DHCP may not work due to subnet constraints right now) and DNS; see the Diskless HOWTO and PXE boot images.
  • Serial line console: select the proper options from the BIOS "advanced" menu to redirect the console to COM1 (even after initial booting). COM2 serial does not work due to a BIOS bug.
  • We have Samba for Windows in the OS lab, something similar to this here: Samba and my smb.conf
  • We use Myricom's GM+MPICH, see Myrinet setup and BIP alternate / replacement for Myricom's software (send e-mail to BIP authors to obtain a password -- make sure BIP works with Linux 2.4.x kernels, I am not sure about that!).
  • We compile with GCC or the Intel C/C++/Fortran compiler with OpenMP support for Linux. Both work in conjunction with MPICH.
  • We still need to do Linpack benchmarking; maybe we can make the Top 500 (from 1998 or so, at best :-).
  • We could support Ethernet bonding.
  • We mirror disk images using partimage in conjunction with Timor's rescue CD (customized version). Insert your CD, wait for the image to be saved or restored, reboot. It's that simple, it is based on free tools, and it includes NTFS write support (partimage).
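
For the Linpack item above, a rough theoretical peak gives an upper bound to compare measured results against. The following is a back-of-the-envelope sketch, not a benchmark. It assumes 16 active nodes (excluding the spare), a 1.6 GHz clock for the Athlon XP 1900, and 2 double-precision floating-point operations per cycle per CPU; the last two values are assumptions, not figures from these notes, and the function name peak_gflops is made up for illustration.

```python
def peak_gflops(nodes, cpus_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: every CPU retiring flops_per_cycle each cycle."""
    return nodes * cpus_per_node * clock_ghz * flops_per_cycle


# Assumptions (not from the notes): 1.6 GHz clock, 2 flops/cycle per CPU.
peak = peak_gflops(nodes=16, cpus_per_node=2, clock_ghz=1.6, flops_per_cycle=2)
print(f"Theoretical peak: {peak:.1f} GFLOPS")
```

A measured Linpack result will be lower than this peak, since it also depends on the interconnect, memory bandwidth and the BLAS library used.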

Mug Shot

Switches


General information: