ARC: A Root Cluster for Research into Scalable Computer Systems

Official Announcement of the ARC Cluster (local copy)
NCSU write-up on the ARC Cluster (local copy)
TechNewsDaily story (local copy)


Main system funded in part by NSF through CRI grant #0958311.

Cooling door equipment and installation funded by NCSU CSC; GPUs funded in part by a grant from NCSU ETF funds and by NVIDIA and HP donations.


Old ARC Cluster

Looking for the documentation of the old ARC Cluster?

Hardware Status


READ THIS BEFORE YOU RUN ANYTHING: Running Jobs (via Slurm)

(1) Notice: Store large data files in the BeeGFS file system, not in your home directory. The home directory is for programs and small data files, which together should not exceed 10GB.

(2) Once logged into ARC, immediately obtain access to a compute node (interactively) or schedule batch jobs as shown below. Do not execute any other commands on the login node!
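The two notices above can also be checked from your own job script. The following Python sketch is only an illustration, not an ARC-provided tool. It assumes that running inside a Slurm allocation is indicated by the SLURM_JOB_ID environment variable (which srun and sbatch set), and it reads the 10GB guideline as 10 GiB. The names HOME_LIMIT_BYTES, inside_slurm_job, tree_size_bytes and preflight_warnings are hypothetical.

```python
import os

# Assumption: the 10GB home-directory guideline is read as 10 GiB (10 * 1024**3 bytes).
HOME_LIMIT_BYTES = 10 * 1024**3


def inside_slurm_job(env=None):
    """True if SLURM_JOB_ID is set, which Slurm does for srun/sbatch jobs."""
    env = os.environ if env is None else env
    return "SLURM_JOB_ID" in env


def tree_size_bytes(root):
    """Sum the sizes of regular files below root without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def preflight_warnings(env=None, home=None):
    """Return warnings for the two notices (an empty list means all good)."""
    if not inside_slurm_job(env):
        # Stop here: scanning the home directory is itself work on the login node.
        return ["not inside a Slurm job: obtain a compute node with srun or sbatch first"]
    home = os.path.expanduser("~") if home is None else home
    used = tree_size_bytes(home)
    if used > HOME_LIMIT_BYTES:
        return [f"home directory holds {used / 1024**3:.1f} GiB: move large data to BeeGFS"]
    return []
```

Call preflight_warnings() at the top of a job script and print whatever it returns. On the command line, du -sh ~ (run on a compute node) gives a similar home-directory total.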


Hardware

1728 cores on 108 compute nodes integrated by Advanced HPC. All machines are 2-way SMPs (except for the single-socket AMD Rome/Milan machines) with either AMD or Intel processors (see below) and a total of 16 physical cores per node.
Nodes:







Networking, Power and Cooling:

Pictures


System Status


Software

All software is 64 bit unless marked otherwise.

Obtaining an Account


Accessing the Cluster


Using OpenMP (via gcc/g++/gfortran)


Running CUDA Programs (Versions 8.0, 10.0, 11.1)


Running MPI Programs with MVAPICH2 and gcc/g++/gfortran (Default)


Running MPI Programs with Open MPI and gcc/g++/gfortran (Alternative)


Using the PGI compilers (V16.7 for CUDA 8.0 capable GPU nodes, V19.10 for all others)

(includes OpenMP and CUDA support via pragmas, even for Fortran)


Dynamic Voltage and Frequency Scaling (DVFS)


Power monitoring

Sets of three compute nodes share a power meter; in such a set, the lowest-numbered node has the meter attached (either on the serial port or via USB). In addition, two individual compute nodes (with different GPUs) have their own power meters. See this power wiring diagram to identify which nodes belong to a set; the diagram also indicates whether a meter uses serial or USB for a given node. We recommend explicitly requesting a reservation for all nodes in a monitored set (see the srun commands with the host name option). Monitoring at 1Hz is accomplished with the following software tools (on the respective nodes where meters are attached):
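Because readings arrive at 1Hz, the energy of a run can be approximated as the sum of the power samples times one second (E ≈ sum of P_i × 1 s). The Python sketch below assumes you have already parsed the meter tool's log into a list of watt values; the log format of the ARC tools is not described here, and the function names are hypothetical.

```python
# Assumption: samples_w holds one power reading in watts per second, already
# parsed from the meter tool's output (the real log format is not shown here).
SAMPLE_PERIOD_S = 1.0  # monitoring runs at 1 Hz


def energy_joules(samples_w, period_s=SAMPLE_PERIOD_S):
    """Approximate energy as the sum of power samples times the sampling period."""
    return sum(samples_w) * period_s


def average_power_w(samples_w):
    """Mean power over the measured interval."""
    if not samples_w:
        raise ValueError("no samples")
    return sum(samples_w) / len(samples_w)
```

A shared meter reports the combined draw of all three nodes in its set, so the result covers the whole set. This is presumably why reserving every node of a monitored set is recommended: otherwise another user's job on a neighboring node ends up in your measurement.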

Virtualization with LXD (optionally with X11, VirtualBox, Docker inside)

Container virtualization support is realized via LXD. Please use CentOS images where possible: they take much less space than any other images, since only the differences from the host image need to be stored in the container. Also, do NOT deploy LXD on nodes c[0-19], as they host BeeGFS. LXD/Docker has been known to lock up nodes, and if this happens on nodes c[0-19], it would affect other users on other nodes, as the BeeGFS file system would no longer be operational. Finally, stop and delete images before you release a node reserved by srun!
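A setup script can refuse to proceed on a BeeGFS host before it touches LXD. The Python guard below is a sketch that assumes node host names have the form c<number>, possibly zero-padded and possibly followed by a domain; is_beegfs_node and check_lxd_allowed are hypothetical names.

```python
import re
import socket

# Nodes c[0-19] host BeeGFS; LXD must not be deployed there.
BEEGFS_NODES = range(0, 20)


def is_beegfs_node(hostname):
    """True if hostname (e.g. 'c7' or 'c007.some.domain') is one of c0..c19."""
    short = hostname.split(".")[0]
    m = re.fullmatch(r"c(\d+)", short)
    return m is not None and int(m.group(1)) in BEEGFS_NODES


def check_lxd_allowed(hostname=None):
    """Raise if the (current) host is a BeeGFS node; otherwise return its name."""
    host = socket.gethostname() if hostname is None else hostname
    if is_beegfs_node(host):
        raise RuntimeError(f"{host} hosts BeeGFS: do not deploy LXD here")
    return host
```

Calling check_lxd_allowed() as the first step of an image-creation script makes the script stop with an error on c0 through c19 instead of starting a container there.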

Notice: Images are installed locally on the node you are running on. If you need identical images on multiple nodes, write a script that creates the image from scratch and run it on each node. You cannot simply copy images, as they reside in a protected directory.

X11 inside LXD:

VirtualBox inside LXD (requires X11, see above):

Docker inside LXD:


BeeGFS


PVFS2 is being retired; please use BeeGFS instead.


PAPI


likwid V5.2.0


Hadoop Map-Reduce and Spark

Simple setup of multi-node Hadoop map-reduce with HDFS (see also the free AWS setup as an alternative, and the original single-node and cluster setup). However, follow the instructions below for ARC. Other components, e.g., YARN, can be added to the setup below as well (not covered). We'll set up a Hadoop instance with nodes cXXX and cYYY (optionally more), so you need to have obtained at least 2 nodes with srun. To get rid of ssh errors, you can add a secondary node server and other optional services; this is not required.
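The per-cluster part of such a setup is mostly two small files: core-site.xml, which tells every node where HDFS lives, and the list of worker hosts. The Python sketch below generates both for the nodes you obtained (scontrol show hostnames "$SLURM_JOB_NODELIST" expands your allocation into one host name per line). Its assumptions are not taken from the ARC instructions: the first node runs the NameNode, HDFS listens on port 9000, every node (including the first) is listed as a worker, and the list goes into etc/hadoop/workers (Hadoop 3; Hadoop 2 calls this file slaves). The function names are hypothetical.

```python
from xml.sax.saxutils import escape


def core_site_xml(namenode, port=9000):
    """core-site.xml pointing fs.defaultFS at the NameNode host (port is an assumption)."""
    uri = escape(f"hdfs://{namenode}:{port}")
    return (
        '<?xml version="1.0"?>\n'
        "<configuration>\n"
        "  <property>\n"
        "    <name>fs.defaultFS</name>\n"
        f"    <value>{uri}</value>\n"
        "  </property>\n"
        "</configuration>\n"
    )


def workers_file(nodes):
    """One worker host name per line, as expected in etc/hadoop/workers."""
    return "\n".join(nodes) + "\n"
```

For the two-node case above, core_site_xml("cXXX") and workers_file(["cXXX", "cYYY"]) produce the file contents, which must be identical on all participating nodes.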

You can also run Spark on top of Hadoop as follows; Spark will then also default to the HDFS file system:

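export SPARK_DIST_CLASSPATH=$(hadoop classpath) # let Spark use Hadoop's jars (and thus HDFS)
export SPARK_HOME=/usr/local/spark
export CLASSPATH="$CLASSPATH:$SPARK_HOME/lib/*"
export PATH="$PATH:$SPARK_HOME/bin" # puts run-example on the PATH
run-example SparkPi 10 # estimate pi using 10 partitions ("slices")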
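SparkPi estimates pi by Monte Carlo sampling: it draws random points in the square [-1, 1] x [-1, 1] in each partition, counts those falling inside the unit circle, sums the counts (the reduce step), and returns 4 × hits / samples. The argument 10 above is the number of partitions. The plain-Python sketch below mirrors that structure without Spark so the arithmetic is easy to follow; it is not Spark's code, and estimate_pi is a hypothetical name.

```python
import random


def estimate_pi(slices=10, per_slice=100_000, seed=None):
    """Monte Carlo estimate of pi, split into `slices` partitions like SparkPi."""
    rng = random.Random(seed)
    n = slices * per_slice

    def hits_in_partition(count):
        # "map" step: count random points that land inside the unit circle
        inside = 0
        for _ in range(count):
            x = rng.uniform(-1.0, 1.0)
            y = rng.uniform(-1.0, 1.0)
            if x * x + y * y <= 1.0:
                inside += 1
        return inside

    total = sum(hits_in_partition(per_slice) for _ in range(slices))  # "reduce" step
    return 4.0 * total / n
```

In Spark, the partitions are processed in parallel by executors on the allocated nodes; here they simply run one after another.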

TensorFlow (2.4)


PyTorch


Other Packages

A number of packages have been installed. Please check their location (via: rpm -ql pkg-name) and their documentation (see URLs) in this PDF if you need them. (Notice: only the mvapich2/openmpi/gnu variants are installed.) Typically, you can access them via:
  module avail # show which modules are available
  module load X
  export | grep X # show what has been defined
  gcc/mpicc -I${X_INC} -L${X_LIB} -lx # for a library
  ./X # for a tool/program, may be some variant of 'X' depending on toolkit
  module switch X Y # for mutually exclusive modules if X is already loaded
  module unload X
  module info # learn how to use modules
Current list of available modules:
-------------------- /opt/ohpc/pub/moduledeps/gnu-mvapich2 ---------------------
   adios/1.10.0    mpiP/3.4.1              petsc/3.7.0        scorep/3.0
   boost/1.61.0    mumps/5.0.2             phdf5/1.8.17       sionlib/1.7.0
   fftw/3.3.4      netcdf/4.4.1            scalapack/2.0.2    superlu_dist/4.2
   hypre/2.10.1    netcdf-cxx/4.2.1        scalasca/2.3.1     tau/2.26
   imb/4.1         netcdf-fortran/4.4.4    scipy/0.18.0       trilinos/12.6.4

------------------------- /opt/ohpc/pub/moduledeps/gnu -------------------------
   R_base/3.3.1    metis/5.1.0         ocr/1.0.1          pdtoolkit/3.22
   gsl/2.2.1       mvapich2/2.2 (L)    openblas/0.2.19    superlu/5.2.1
   hdf5/1.8.17     numpy/1.11.1        openmpi/1.10.4

------------------------- /opt/ohpc/admin/modulefiles --------------------------
   spack/0.8.17

-------------------------- /opt/ohpc/pub/modulefiles ---------------------------
   EasyBuild/2.9.0        java                  pgi-llvm
   autotools       (L)    ohpc           (L)    pgi-nollvm
   cuda            (L)    openmpi3/3.1.4        prun/1.1
   gnu/5.4.0       (L)    papi/5.4.3            prun/1.3        (L,D)
   gnu8/8.3.0             pgi/19.10             valgrind/3.11.0
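The compile line in the instructions above relies on the X_INC and X_LIB variables that a loaded module defines. When a build is driven from Python, the same pattern can be wrapped as below. This is a sketch: module_flags is a hypothetical helper, and the mapping from module name to variable prefix (upper case, dashes replaced by underscores, e.g. fftw to FFTW_INC/FFTW_LIB) is an assumption; verify it with export | grep after module load.

```python
import os


def module_flags(name, lib, env=None):
    """Build -I/-L/-l flags from the <NAME>_INC and <NAME>_LIB variables
    that `module load <name>` is assumed to define."""
    env = os.environ if env is None else env
    key = name.upper().replace("-", "_")  # assumed naming convention
    try:
        inc, libdir = env[f"{key}_INC"], env[f"{key}_LIB"]
    except KeyError as missing:
        raise RuntimeError(f"{missing} not set: run 'module load {name}' first") from None
    return [f"-I{inc}", f"-L{libdir}", f"-l{lib}"]
```

For example, after module load fftw, subprocess.run(["mpicc", "prog.c", *module_flags("fftw", "fftw3")], check=True) would compile a hypothetical prog.c against FFTW 3.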


Advanced topics (pending)

For all other topics, access is restricted. Request a root password. Also, read this documentation, which is only accessible from selected NCSU labs.

This applies to:


Known Problems

Consult the FAQ. If this does not help, then please report your problem.

References:

Additional references: