Weekly outline

  • Course Information

    Room: EB2 3211

    Fridays 11a-12p

  • August 23

    No meeting.

  • August 30

    No meeting.

  • September 6

    Hui Guan

    Convolutional Neural Networks (CNNs) are widely used for Deep Learning tasks. CNN pruning is an important method for adapting a large CNN model trained on general datasets to a more specialized task or a smaller device. The key challenge is deciding which filters to remove so as to maximize the quality of the pruned network while satisfying the constraints; the process is time-consuming due to the enormous configuration space and the slowness of CNN training. The problem has drawn many efforts from the machine learning community, most of which try to reduce the set of network configurations to explore. This work tackles the problem from a distinctly programming-systems perspective, speeding up the evaluation of the remaining configurations through computation reuse via a compiler-based framework. We empirically uncover the existence of composability in the training of a collection of pruned CNN models and point out the opportunities for computation reuse. We then propose composability-based CNN pruning and design a compression-based algorithm to efficiently identify the set of CNN layers to pre-train so as to maximize their reuse benefits in CNN pruning. We further develop a compiler-based framework named Wootz, which, for an arbitrary CNN, automatically generates code that builds a Teacher-Student scheme to materialize composability-based pruning. Experiments show that network pruning enabled by Wootz shortens the state-of-the-art pruning process by up to 186X while producing significantly improved pruning results.
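To make the filter-selection problem above concrete, here is a minimal sketch of the common magnitude-based heuristic: rank filters by L1 norm and keep the largest. This is an illustrative baseline, not the Wootz algorithm itself; all names and data are made up.

```python
from typing import List

def l1_norm(filt: List[List[float]]) -> float:
    """Sum of absolute weights of one convolutional filter."""
    return sum(abs(w) for row in filt for w in row)

def prune_filters(filters: List[List[List[float]]], keep_ratio: float) -> List[int]:
    """Return indices of the filters to keep, ranked by L1 norm.

    A larger norm is taken as a proxy for filter importance; the
    lowest-norm filters become the pruning candidates.
    """
    k = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    return sorted(ranked[:k])

# Four toy 2x2 filters; pruning to 50% keeps the two largest by L1 norm.
filters = [
    [[0.1, -0.1], [0.0, 0.2]],   # norm 0.4
    [[1.0, -2.0], [0.5, 0.5]],   # norm 4.0
    [[0.0, 0.0], [0.1, 0.0]],    # norm 0.1
    [[0.9, 0.9], [-0.9, 0.9]],   # norm 3.6
]
print(prune_filters(filters, 0.5))  # -> [1, 3]
```

The hard part the abstract targets is not this ranking but evaluating the many candidate configurations it produces, each of which normally requires retraining.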

  • September 13

    Evaluating Burst Buffer Placement in HPC Systems

    Harsh Khetawat

    Burst buffers (BBs) are increasingly exploited in contemporary supercomputers to bridge the performance gap between compute and storage systems. The design of BBs, particularly the placement of these devices and the underlying network topology, impacts both performance and cost. As the cost of other components such as memory and accelerators increases, it is becoming more important that HPC centers provision BBs tailored to their workloads.

    This work contributes a provisioning system to provide accurate, multi-tenant simulations that model realistic application and storage workloads from HPC systems. The framework helps HPC centers rapidly model their workloads against multiple network and BB configurations. In experiments with our framework, we compare representative Oak Ridge Leadership Computing Facility (OLCF) I/O workloads against multiple BB designs. We analyze the impact of these designs on latency, I/O phase lengths, contention for network and storage devices, and the choice of network topology.

  • September 19

    This is on Thursday, 19 September. Same room: EB2 3211.

    Performance characterization of a DRAM-NVM hybrid memory architecture for HPC applications using Intel Optane DC Persistent Memory Modules

    Onkar Patil

    Non-volatile, byte-addressable memory (NVM) has been introduced by Intel in the form of NVDIMMs named Intel® Optane™ DC PMM. This memory module can persist the data stored in it without the need for power. Because its access latency and memory bandwidth differ from those of DRAM, which has been the predominant byte-addressable main memory technology, it expands the memory hierarchy into a hybrid memory system. The Optane DC memory modules have up to 8x the capacity of DDR4 DRAM modules, which can expand the byte-addressable space up to 6 TB per node. Many applications can now scale up their problem size on such a memory system. We evaluate the capabilities of this DRAM-NVM hybrid memory system and its impact on High Performance Computing (HPC) applications. We characterize Optane DC in comparison to DDR4 DRAM with a STREAM-like custom benchmark and measure the performance of HPC mini-apps such as VPIC, SNAP, LULESH, and AMG under different configurations of Optane DC PMMs. We find that Optane-only executions are slower in execution time than DRAM-only and Memory-mode executions, by as little as 2 to 16% for VPIC and by as much as 6x for LULESH.
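The STREAM-like benchmark mentioned above boils down to simple bandwidth kernels. Below is a toy Python version of the STREAM triad kernel, purely illustrative: absolute numbers here are dominated by interpreter overhead, but the same kernel written in C over large arrays is the standard way to compare DRAM and NVM bandwidth.

```python
import array
import time

def stream_triad(n: int, scalar: float = 3.0) -> float:
    """STREAM-style triad a[i] = b[i] + scalar * c[i]; returns MB/s moved."""
    b = array.array('d', (1.0 for _ in range(n)))
    c = array.array('d', (2.0 for _ in range(n)))
    a = array.array('d', bytes(8 * n))  # n zero-initialized doubles
    t0 = time.perf_counter()
    for i in range(n):
        a[i] = b[i] + scalar * c[i]
    elapsed = time.perf_counter() - t0
    bytes_moved = 3 * 8 * n  # one load each of b and c, one store of a, 8 B per element
    return bytes_moved / elapsed / 1e6

rate_mb_s = stream_triad(100_000)
```

Running the kernel with arrays placed in DRAM versus in Optane DC (e.g., via a NUMA-aware allocator) is what exposes the bandwidth gap the abstract quantifies.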

  • September 27

    BARRIERFINDER: Recognizing Ad Hoc Barriers

    Presenter: Tao Wang

    Ad hoc synchronizations are pervasive in multi-threaded programs. Due to their diversity and complexity, understanding the synchronization relationships enforced by ad hoc synchronizations is challenging but crucial to multi-threaded program development and maintenance. Existing techniques can partially detect primitive ad hoc synchronizations, but they cannot recognize complete implementations or infer the enforced synchronization relationships. In this paper, we propose a framework to automatically identify complex ad hoc synchronizations in full and infer their synchronization relationships for barriers. We instantiate the framework with a tool called BARRIERFINDER, which features various techniques, including program slicing and bounded symbolic execution, to efficiently explore the interleaving space of ad hoc synchronizations within multi-threaded programs and collect their traces. BARRIERFINDER then uses these traces to recognize ad hoc barriers. Our evaluation shows that BARRIERFINDER is both effective and efficient in recognizing ad hoc barriers automatically.
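For readers unfamiliar with the term: an ad hoc barrier is a hand-rolled synchronization built from ordinary shared-variable reads and writes rather than a library primitive. A minimal illustrative sketch (not taken from the paper) of the counter-and-spin pattern such tools must recognize:

```python
import threading

N = 4
arrived = 0
lock = threading.Lock()
results = []

def worker(tid: int) -> None:
    global arrived
    # Phase 1: each thread announces its arrival.
    with lock:
        arrived += 1
    # Ad hoc barrier: spin until every thread has arrived. No barrier
    # primitive is used; the synchronization is encoded in plain reads
    # of a shared counter, which is what makes such barriers hard for
    # analysis tools to recognize.
    while True:
        with lock:
            if arrived == N:
                break
    # Phase 2 runs only after all threads have passed the barrier.
    results.append(tid)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A production version would use `threading.Barrier`; real-world code often spins on bare flags instead, which is exactly the diversity the paper addresses.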

  • October 4

    Performance Bugs in Database-Backed Web Applications: A Characteristic Study with a Database-Access Perspective

    Shudi Shao

    Database-backed web applications are prone to performance bugs related to database accesses. To develop an efficient database-backed web application, developers need a good knowledge of the performance implications of various database accesses and of the interactions between the application and the backend database. However, such knowledge is often not clear to developers during application development. Lacking it results in performance bugs related to database accesses, and these bugs can have significant performance impact, as database accesses are usually resource-bound, involving network, I/O, or CPU-intensive computations. Even worse, these bugs cannot be optimized away by existing compiler or database optimizations.

    In this paper, we present a characteristic study of performance bugs in database-backed web applications, conducted to provide a deep understanding of the root causes, fix strategies, triggering conditions, and performance impact of the studied bugs; we focus specifically on the database-access perspective while characterizing them. Such understanding is the first step for developers to better address performance bugs related to database accesses, and it can also guide researchers and tool vendors in developing effective tool support. Based on the study results, we further discuss and provide actionable suggestions that can lead to practical techniques for tackling performance bugs related to database accesses in database-backed web applications.
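A classic instance of the kind of database-access performance bug such studies examine is the N+1 query pattern. A self-contained illustrative sketch with SQLite (the schema and data are made up; the study itself covers real web frameworks):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post(id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

def titles_n_plus_one():
    """Anti-pattern: one query per author (an ORM loop often hides this)."""
    out = []
    authors = conn.execute("SELECT id, name FROM author").fetchall()
    for aid, name in authors:  # N extra round trips to the database
        for (title,) in conn.execute(
                "SELECT title FROM post WHERE author_id = ?", (aid,)):
            out.append((name, title))
    return out

def titles_join():
    """Fix: fetch the same data with a single JOIN query."""
    return conn.execute(
        "SELECT a.name, p.title FROM author a "
        "JOIN post p ON p.author_id = a.id ORDER BY p.id").fetchall()

assert sorted(titles_n_plus_one()) == sorted(titles_join())
```

Both functions return the same rows, but the first issues N+1 queries whose cost grows with the data; no compiler or database optimizer can merge them, which is why such bugs must be fixed at the application level.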

  • October 11

    Fall Break

  • October 22

    This is on Tuesday (it replaces the usual seminar on Friday, 18 October).

    Squeezing Software Performance via Eliminating Wasteful Operations

    Xu Liu

    Tuesday, October 22, 2019, 09:30 AM, EB2 3211

    Abstract: Inefficiencies abound in complex, layered software. A variety of inefficiencies show up as wasteful memory operations, such as redundant or useless memory loads and stores. Aliasing, limited optimization scopes, and insensitivity to input and execution contexts act as severe deterrents to static program analysis. Microscopic observation of whole executions at instruction- and operand-level granularity breaks down abstractions and helps recognize redundancies that masquerade in complex programs. In this talk, I will describe various wasteful memory operations, which exist pervasively in modern software packages and expose great potential for optimization. I will discuss the design of a fine-grained instrumentation-based profiling framework that identifies wasteful operations in their contexts, which guides nontrivial performance improvement. Furthermore, I will show our recent improvement to the profiling framework by abandoning instrumentation, which reduces the runtime overhead from 10x to 3% on average. I will show how our approach works for native binaries and various managed languages such as Java, yielding new performance insights for optimization.
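To make the notion of a wasteful memory operation concrete, here is a toy detector of redundant loads over an abstract memory trace. This is only illustrative: the tools described in the talk operate on real binary instruction streams, not Python tuples.

```python
def redundant_loads(trace):
    """Count loads whose value cannot have changed since the last load.

    `trace` is a list of ("load"|"store", address) pairs, a stand-in for
    an instruction- and operand-level execution trace. A load is
    redundant if the same address was loaded before with no intervening
    store to it.
    """
    clean = set()      # addresses whose last access was a load
    redundant = 0
    for op, addr in trace:
        if op == "load":
            if addr in clean:
                redundant += 1
            clean.add(addr)
        else:  # a store invalidates the previously loaded value
            clean.discard(addr)
    return redundant

trace = [("load", 0x10), ("load", 0x10),   # second load is redundant
         ("store", 0x10), ("load", 0x10),  # store killed it: not redundant
         ("load", 0x20), ("load", 0x20)]   # redundant again
print(redundant_loads(trace))  # -> 2
```

The interesting engineering problem, as the abstract notes, is gathering such traces (or equivalent evidence) cheaply enough, which is what the move from 10x instrumentation overhead to 3% addresses.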

    Short Bio: Xu Liu is an assistant professor in the Department of Computer Science at the College of William & Mary. He obtained his Ph.D. from Rice University in 2014 and joined William & Mary the same year. Prof. Liu works on building performance tools to pinpoint and optimize inefficiencies in HPC code bases. He has developed several open-source profiling tools, which are used worldwide at universities, DOE national laboratories, and companies. Prof. Liu has published a number of papers in high-quality venues. His papers received Best Paper Awards at SC'15, PPoPP'18, and PPoPP'19, an ASPLOS'17 Highlights selection, and a Distinguished Paper Award at ICSE'19. His ASPLOS'18 paper was selected as an ACM SIGPLAN Research Highlight in 2019 and nominated for CACM Research Highlights. Prof. Liu is the recipient of the 2019 IEEE TCHPC Early Career Researchers Award for Excellence in High Performance Computing. He has served on the program committees of conferences such as SC, PPoPP, IPDPS, CGO, HPCA, and ASPLOS.

    Host: Frank Mueller, CSC

  • October 25

    Data-Driven Software Maintenance

    Speaker: Na Meng, Virginia Tech

    Software is widely used in almost every domain. When software applications contain defects, these software bugs can trigger security problems, cause financial loss, or even jeopardize human health. However, maintaining software to remove all those errors is usually challenging, because resolving a software issue requires developers to spend a great deal of time and effort comprehending programs so that they can apply program changes consistently, completely, and correctly. When developers have insufficient domain knowledge or misunderstand the program logic, they may fail to fix the bug, or their bug fixes may actually introduce new bugs.

    In this talk, I will present our recent research that intends to bridge the gap between program complexity and developers' programming capabilities. There are two parts to my talk. In the first part, I will introduce our empirical studies on developers' secure coding practices. By crawling and analyzing developers' technical discussions on the StackOverflow website, we identified various programming challenges that developers face when they build security functionalities. We also showed security vulnerabilities due to developers' API misuses. Furthermore, we examined the reliability of security suggestions on StackOverflow and revealed a worrisome reality in the software development industry. In the second part, I will present our recent tool that recommends code refactorings to developers. All our empirical studies and techniques have the potential to help developers (1) better understand program complexity and the complexity of software maintenance, and (2) improve program maintenance as well as software quality.

  • November 4

    This is on Monday at 4 PM.

    A Personal History of Computing

    Fred Brooks

    Monday, November 04, 2019, 04:00 PM
    Location: EB2 3211, NCSU Centennial Campus

    Abstract: Born a half-generation after the computer pioneers, I knew most of them. This talk will sketch an early history of computers, emphasizing the personalities rather than the technology, and the parts I know from personal experience rather than uniform coverage.

    Short Bio: NC native and Duke alumnus Fred Brooks is Kenan Professor, Emeritus, in the Department of Computer Science at UNC-Chapel Hill, which he founded in 1964 and chaired for twenty years. Prior to coming to UNC, Dr. Brooks worked for nine years at IBM. He was an architect of the IBM Stretch supercomputer and the Harvest cryptanalytic engine. He then served as Corporate Project Manager for the IBM System/360 mainframes, overseeing the development of the System/360 computer family hardware and then the Operating System/360 software. His most important technical decision was to change IBM's byte size from 6 to 8 bits, enabling lower-case characters.

    At UNC, Dr. Brooks has conducted research in computer architecture, software engineering, and interactive 3-D computer graphics ("virtual reality"). His best-known books are The Mythical Man-Month: Essays on Software Engineering (1975, 1995), Computer Architecture: Concepts and Evolution (with G.A. Blaauw, 1997), and The Design of Design (2010). Dr. Brooks has received the U.S. National Medal of Technology and the A.M. Turing Award of the ACM.

    Fred has cultivated an active Christian presence in the UNC community. Since 1965, he has advised Focus, the graduate chapter of InterVarsity Christian Fellowship at UNC. He chairs the Board of the NC Study Center ("Battle House").

  • November 7

    This is on Thursday at 9:30 AM.

    High Performance Tensor Methods for Applications and Architectures

    Jiajia Li

    Pacific Northwest National Laboratory

    Thursday, November 07, 2019, 09:30 AM

    Location: EB2 3211, NCSU Centennial Campus

    Abstract: In this talk I will present novel high-performance algorithmic techniques and data structures for building a scalable sparse tensor library and a benchmark suite on multicore CPUs and graphics co-processors (GPUs). A tensor can be regarded as a multiway array, generalizing matrices to more than two dimensions. When used to represent multifactor data, tensor methods can help analysts discover latent structure; this capability has found numerous applications in data modeling and mining in domains such as healthcare analytics, social network analytics, computer vision, signal processing, and neuroscience, to name a few. In addition, sparse tensor algebra has proven useful in further applications, such as quantum chemistry and deep learning. This talk will cover my recently proposed performance-efficient and space-saving sparse tensor format (named "HiCOO"), based on which a sparse tensor library (named "HiParTi") and a sparse tensor benchmark suite (named "PASTA") are built. Future directions for tensors and their influence on applications and computer architectures will be illustrated along with recent trends.
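To make the sparse-tensor setting concrete, here is an illustrative sketch of a tensor-times-vector kernel over a plain COO (coordinate) representation. HiCOO's hierarchical compression of the coordinates is omitted; the data and names here are made up.

```python
from collections import defaultdict

# A 3-way sparse tensor in COO form: (i, j, k) -> value.
coo = {
    (0, 0, 1): 2.0,
    (0, 1, 0): 3.0,
    (1, 2, 1): 4.0,
}

def ttv_mode2(tensor, v):
    """Tensor-times-vector along the third mode: Y[i, j] = sum_k X[i, j, k] * v[k]."""
    out = defaultdict(float)
    for (i, j, k), val in tensor.items():
        out[(i, j)] += val * v[k]
    return dict(out)

print(ttv_mode2(coo, [10.0, 1.0]))
# -> {(0, 0): 2.0, (0, 1): 30.0, (1, 2): 4.0}
```

Formats like HiCOO exist because kernels of this shape are memory-bound: storing full (i, j, k) coordinates per nonzero wastes bandwidth, so compressing the indices directly speeds up the computation.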

    Short Bio: Jiajia Li is a research scientist in the High Performance Computing group at Pacific Northwest National Laboratory (PNNL). She received her Ph.D. from the Georgia Institute of Technology in 2018. Her current research emphasizes optimizing tensor methods, especially for sparse data from diverse applications, by utilizing various parallel architectures. She received the Best Student Paper Award at SC'18, was a Best Paper Finalist at PPoPP'19, and was named "A Rising Star in Computational and Data Sciences". She has served on the technical program committees of conferences such as PPoPP, SC, ICS, IPDPS, ICPP, HiPC, and Euro-Par. She previously received a Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, China, and a B.S. in Computational Mathematics from Dalian University of Technology, China. Please check her website for more information.

  • November 14

    This is on Thursday at 9:30 AM.

    Learning with Distributed Systems: Adversary-Resilience

    Seminar Date: November 14, 2019

    Seminar Time: 9:30 AM (talk begins)

    Seminar Place: Room 3211, EB2, NCSU Centennial Campus

    Lili Su

    Abstract: In this talk, I will discuss how to secure Federated Learning (FL) against adversarial faults.

    FL is a new distributed learning paradigm proposed by Google. Its goal is to enable the cloud (i.e., the learner) to train a model without collecting the training data from users' mobile devices. Compared with traditional learning, FL suffers from serious security issues, and several practical constraints call for new security strategies. Toward quantitative and systematic insights into the impact of those security issues, we formulated and studied the problem of Byzantine-resilient Federated Learning. We proposed two robust learning rules that secure gradient descent against Byzantine faults. The estimation error achieved under our more recently proposed rule is order-optimal in the minimax sense.
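One standard family of Byzantine-resilient aggregation rules (which may differ in detail from the two rules in the talk) replaces the coordinate-wise mean of worker gradients with the coordinate-wise median, which a minority of arbitrarily corrupted workers cannot drag away. A minimal illustrative sketch:

```python
from statistics import median

def aggregate_median(gradients):
    """Coordinate-wise median of worker gradients.

    Unlike the mean, the median in each coordinate is unaffected by a
    minority of workers sending arbitrarily corrupted values.
    """
    dim = len(gradients[0])
    return [median(g[d] for g in gradients) for d in range(dim)]

honest = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 1.9]]
byzantine = [[1e9, -1e9]]                # one faulty worker sends garbage
agg = aggregate_median(honest + byzantine)
print(agg)  # -> [1.1, 1.9]
```

With a plain mean, the single Byzantine gradient would dominate every coordinate; the median stays inside the range of the honest workers' values.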

    Short Bio: Lili Su is a postdoc in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT, hosted by Professor Nancy Lynch. She received a Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2017, supervised by Professor Nitin H. Vaidya. Her research intersects distributed systems, learning, security, and brain computing. She was a finalist (one of three nominees) for the Best Student Paper Award at DISC 2016, and she received the Best Student Paper Award at SSS 2015. She received UIUC's Sundaram Seshu International Student Fellowship for 2016 and was invited to participate in Rising Stars in EECS (2018). She has served on the TPCs of several conferences, including ICDCS and ICDCN.


  • November 22

    Title: Vulnerability Exploit Detection Over Aggregated Container Data

    Nowadays, Docker containers are widely adopted in industry for deploying applications in many Information Technology (IT) contexts. However, the short lifespan of containers running dynamic workloads makes detecting security exploits a difficult task. In this paper, we present a method of training exploit detection models using data aggregated over multiple containers. Our results using an autoencoder-based model show advantages in using aggregated container data rather than single-container data in terms of detection and false positive rates. In addition, our experiments show that the system can gather data from similar containerized applications and detect exploits in real time, which is applicable to real-world scenarios.

    Yuhang Lin
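The detection idea, flagging samples whose reconstruction error under a model trained on benign data is too large, can be sketched without a neural network. Below, a feature-wise mean stands in for the paper's autoencoder; all data and names are illustrative.

```python
from statistics import mean

def recon_error(sample, center):
    """Squared distance between a sample and its 'reconstruction'."""
    return sum((a - b) ** 2 for a, b in zip(sample, center))

def fit(profiles):
    """'Train' on benign container profiles: per-feature mean plus a threshold.

    A stdlib stand-in for an autoencoder: every sample is reconstructed
    as the feature-wise mean of the benign training data, and the
    threshold is the largest reconstruction error seen in training.
    """
    center = [mean(col) for col in zip(*profiles)]
    threshold = max(recon_error(p, center) for p in profiles)
    return center, threshold

def is_exploit(sample, center, threshold):
    return recon_error(sample, center) > threshold

# Toy feature profiles (e.g., syscall rates) aggregated from benign containers.
benign = [[10.0, 2.0], [11.0, 2.2], [9.0, 1.8], [10.5, 2.1]]
center, threshold = fit(benign)
print(is_exploit([10.2, 2.0], center, threshold))  # -> False
print(is_exploit([40.0, 9.0], center, threshold))  # -> True
```

Aggregating training profiles across similar containers, as the paper proposes, gives the model enough benign data even when each individual container is short-lived.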

  • November 29