• Aug 28 Qi Zhao: Parallelism-centric what-if and differential analyses

    Qi will present a paper from PLDI 2019. Paper link: https://dl.acm.org/doi/10.1145/3314221.3314621

    Abstract: This paper proposes TaskProf2, a parallelism profiler and an adviser for task parallel programs. As a parallelism profiler, TaskProf2 pinpoints regions with serialization bottlenecks, scheduling overheads, and secondary effects of execution. As an adviser, TaskProf2 identifies regions that matter in improving parallelism. To accomplish these objectives, it uses a performance model that captures series-parallel relationships between various dynamic execution fragments of tasks and includes fine-grained measurement of computation in those fragments. Using this performance model, TaskProf2’s what-if analyses identify regions that improve the parallelism of the program while considering tasking overheads. Its differential analyses perform fine-grained differencing of an oracle and the observed performance model to identify static regions experiencing secondary effects. We have used TaskProf2 to identify regions with serialization bottlenecks and secondary effects in many applications.
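    The work/span performance model at the heart of this style of profiler can be sketched in a few lines. The fragment tree, node kinds, and numbers below are illustrative assumptions, not TaskProf2's actual data structures: parallelism is the ratio of total work to the span (critical-path length) of the series-parallel tree, and a what-if query amounts to editing the tree and recomputing.

```python
# Illustrative sketch (not TaskProf2's implementation): a series-parallel
# performance model where parallelism = total work / span (critical path).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                  # "leaf", "series", or "parallel"
    work: float = 0.0          # measured computation in a leaf fragment
    children: List["Node"] = field(default_factory=list)

def work(n: Node) -> float:
    if n.kind == "leaf":
        return n.work
    return sum(work(c) for c in n.children)       # work adds up either way

def span(n: Node) -> float:
    if n.kind == "leaf":
        return n.work
    if n.kind == "series":                        # series fragments run one after another
        return sum(span(c) for c in n.children)
    return max(span(c) for c in n.children)       # parallel fragments overlap

def parallelism(n: Node) -> float:
    return work(n) / span(n)

# A region with 100 units of serial work followed by two 50-unit parallel tasks:
model = Node("series", children=[
    Node("leaf", work=100),
    Node("parallel", children=[Node("leaf", work=50), Node("leaf", work=50)]),
])
print(parallelism(model))   # 200 work / 150 span ≈ 1.33
```

    A what-if analysis in this picture is simply replacing a serial subtree with a parallel one and recomputing the ratio, which is why the model must capture series-parallel relationships rather than flat timings.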

  • Sept 18 Zifan Nan: HISyn: Human Learning-Inspired Natural Language Programming

    Natural Language (NL) programming automatically synthesizes code from inputs expressed in natural language and has recently attracted growing interest. Recent solutions, however, are data-driven in nature and all require many labeled training examples. This paper proposes an NLU-driven approach, a new approach inspired by how humans learn programming. It centers around Natural Language Understanding and draws on a novel graph-based mapping algorithm, foregoing the need for large numbers of labeled examples. The resulting NL programming framework, HISyn, uses no training examples yet achieves synthesis accuracy comparable to data-driven methods trained on hundreds of training examples. HISyn meanwhile demonstrates advantages in interpretability, error diagnosis support, and cross-domain extensibility.

    This is a practice talk for FSE 2020.
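    To make the graph-based mapping idea concrete, here is a hypothetical miniature: anchor APIs are matched from query keywords, then connected through a small API dependency graph by shortest-path search. All API names, keywords, and edges below are invented for illustration and are not from HISyn.

```python
# Hypothetical miniature of graph-based NL-to-API mapping: keywords anchor
# APIs, and a BFS over the dependency graph links the anchors into a pipeline.
from collections import deque

# Toy API dependency graph (an edge means "output of A can feed B").
api_graph = {
    "load_table":  ["filter_rows", "select_cols"],
    "filter_rows": ["select_cols", "sort_by"],
    "select_cols": ["sort_by"],
    "sort_by":     [],
}
keyword_to_api = {"read": "load_table", "where": "filter_rows", "sort": "sort_by"}

def connect(src: str, dst: str) -> list:
    """Shortest path in the API graph linking two anchor APIs."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in api_graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return []

# "read the table where ... then sort" -> anchors: load_table, filter_rows, sort_by
anchors = [keyword_to_api[w] for w in ["read", "where", "sort"]]
pipeline = connect(anchors[0], anchors[1])[:-1] + connect(anchors[1], anchors[2])
print(pipeline)   # ['load_table', 'filter_rows', 'sort_by']
```

    The point of such a structure-driven search is that no labeled NL-to-code pairs are needed: the API graph itself supplies the connections, which is what allows a training-free framework.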

  • Sept 25 Yuhang Lin: CDL: Classified Distributed Learning for Detecting Security Attacks in Containerized Applications, and Jingzhu He: HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems

    CDL: Classified Distributed Learning for Detecting Security Attacks in Containerized Applications
    Abstract: Containers have been widely adopted in production computing environments for their efficiency and low isolation overhead. However, recent studies have shown that containerized applications are prone to various security attacks. Moreover, containerized applications are often highly dynamic and short-lived, which further exacerbates the problem. In this paper, we present CDL, a classified distributed learning framework that achieves efficient security attack detection for containerized applications. CDL integrates online application classification and anomaly detection to overcome the lack of sufficient training data for dynamic short-lived containers while considering the diverse normal behaviors of different applications. We have implemented a prototype of CDL and evaluated it over 33 real-world vulnerability attacks in 24 commonly used server applications. Our experimental results show that CDL reduces the false positive rate from over 12% to 0.24% compared to the traditional anomaly detection scheme without aggregated training data. Compared to distributed learning without application classification, CDL improves the detection rate from catching 20 out of 33 attacks to 31 out of 33 attacks before those attacks compromise the server systems. CDL is lightweight and can complete application classification and anomaly detection within a few milliseconds.
    This is a practice talk for ACSAC 2020.
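    The classified-learning idea can be sketched as follows. The feature vectors, application profiles, and thresholds below are hypothetical stand-ins; the point is that a trace is first attributed to an application class and then scored against that class's own baseline, rather than against one model shared by all containers.

```python
# Illustrative sketch of classified learning: classify the container's trace
# by application first, then run anomaly detection against that application's
# own baseline (hypothetical per-app centroids and radii).
import math

def classify(vec, centroids):
    """Assign a trace to the nearest known application profile."""
    return min(centroids, key=lambda app: math.dist(vec, centroids[app]))

def is_anomalous(vec, centroids, radius):
    app = classify(vec, centroids)
    return app, math.dist(vec, centroids[app]) > radius[app]

# Hypothetical baselines, e.g. normalized [read, write, execve] frequencies.
centroids = {"nginx": [0.8, 0.1, 0.0], "mysql": [0.4, 0.5, 0.0]}
radius    = {"nginx": 0.2, "mysql": 0.2}

print(is_anomalous([0.75, 0.12, 0.01], centroids, radius))  # ('nginx', False)
print(is_anomalous([0.70, 0.10, 0.40], centroids, radius))  # ('nginx', True)
```

    A single shared baseline would have to be wide enough to cover both nginx-like and mysql-like behavior, which is exactly the source of the high false-positive rate that per-class models avoid.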

    HangFix: Automatically Fixing Software Hang Bugs for Production Cloud Systems
    Abstract: Software hang bugs are notoriously difficult to debug and often cause serious service outages in cloud systems. In this paper, we present HangFix, a software hang bug fixing framework that can automatically fix a hang bug that is triggered and detected in production cloud environments. HangFix first leverages stack trace analysis to localize the hang function and then performs root cause pattern matching to classify hang bugs into different types based on likely root causes. Next, HangFix generates effective code patches based on the identified root cause patterns. We have implemented a prototype of HangFix and evaluated the system on 42 real-world software hang bugs in 10 commonly used cloud server applications. Our results show that HangFix can successfully fix 40 out of 42 hang bugs in seconds.
    This is a practice talk for SOCC 2020.
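    The pipeline shape described above (localize the hang function, match a likely root-cause pattern, emit a patch) can be sketched as below. The stack trace, patterns, and suggested fixes are all hypothetical illustrations, not HangFix's actual pattern catalog.

```python
# Illustrative sketch of a localize -> classify -> patch pipeline. The code
# idioms, root causes, and fixes below are invented for illustration.
import re

# Hypothetical root-cause patterns keyed by code idioms in the hang function.
PATTERNS = [
    (r"while\s*\(.*\)\s*\{\s*\}", "busy-wait loop",  "add timeout/backoff"),
    (r"\.read\(\)",               "unbounded read",  "use read with timeout"),
    (r"\.lock\(\)",               "missing unlock",  "wrap in try/finally"),
]

def hang_function(stack_trace: list) -> str:
    # The topmost application frame is the likely hang location.
    return stack_trace[0]

def match_pattern(func_body: str):
    for regex, cause, fix in PATTERNS:
        if re.search(regex, func_body):
            return cause, fix
    return "unknown", "escalate to developer"

trace = ["Fetcher.poll", "Consumer.run", "Thread.run"]
body = "while (!done) { }"
print(hang_function(trace), match_pattern(body))
# Fetcher.poll ('busy-wait loop', 'add timeout/backoff')
```

    Classifying by root-cause pattern before patching is what lets one generated patch template cover many concrete bugs of the same type.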

  • Oct 2 Cong Liu (UT Dallas): Towards Timing-Predictable & Robust Autonomy in Autonomous Embedded Systems
    Abstract: Due to the recent advances in machine learning techniques and embedded systems hardware, autonomy has become a reachable goal for many embedded systems domains. Unfortunately, it is not straightforward to achieve autonomy in many safety-critical embedded systems that require predictable timing correctness, one of the most important tenets in certification required for such systems. An example is the autonomous driving system, where timeliness of computations is an essential requirement of correctness due to the interaction with the physical world. In this talk, I will give an overview of our research contributions on addressing several algorithmic and system-level challenges towards ensuring timing-predictable autonomy, and a specific illustration of our methodology on ensuring timing predictability while simultaneously optimizing power and accuracy in a DNN-driven autonomous embedded system.

    Bio: Cong Liu is currently an associate professor in the Department of Computer Science at the University of Texas at Dallas. His research focuses on autonomous embedded systems and real-time systems. He has received multiple best paper awards from top-tier venues including RTAS'18, RTSS'09, RTSS'17, and INFOCOM'17. He is a recipient of the NSF CAREER Award in 2017.
  • Oct 9 Fogo Tunde-Onadele: Self-Patch: Beyond Patch Tuesday for Containerized Applications
    Abstract: Containers have become increasingly popular in distributed computing environments. However, recent studies have shown that containerized applications are susceptible to various security attacks. Traditional periodically scheduled software update approaches not only become ineffective in dynamic container environments but also impose high overhead on containers. In this paper, we present Self-Patch, a new self-triggering patching framework for applications running inside containers. Self-Patch combines lightweight runtime attack detection and dynamic targeted patching to achieve more efficient and effective security protection for containerized applications. We evaluated our schemes over 31 real-world vulnerability attacks in 23 commonly used server applications. Results show that Self-Patch can accurately detect and classify 81% of attacks and reduce patching overhead by up to 84%.
  • Oct 16 Onkar Patil: Symbiotic HW Cache and SW DTLB Prefetching for DRAM/NVM Hybrid Memory
    Abstract: The introduction of NVDIMM memory devices has encouraged the use of DRAM/NVM-based hybrid memory systems to increase the memory-per-core ratio in compute nodes and obtain possible energy and cost benefits. However, Non-Volatile Memory (NVM) is slower than DRAM in terms of read/write latency, and this difference in performance adversely affects memory-bound applications. Traditionally, data prefetching at the hardware level has been used to increase the number of cache hits and mitigate the performance degradation. However, software (SW) prefetching has not been used effectively to reduce the effects of high memory access latencies, and the current cache hierarchy and hardware (HW) prefetching are not optimized for a hybrid memory system.

    We hypothesize that HW and SW prefetching can complement each other in placing data in caches and the Data Translation Look-aside Buffer (DTLB) prior to their references, and that by doing so adaptively, the highly varying access latencies in a DRAM/NVM hybrid memory system can be taken into account. This work contributes an adaptive SW prefetch method based on the characterization of read/write/unroll prefetch distances for NVM and DRAM. Prefetch performance is characterized via custom benchmarks based on STREAM2 specifications in a multicore MPI runtime environment and compared to the performance of the standard SW prefetch pass in GCC. Furthermore, the effects of HW prefetching on kernels executing on a hybrid memory system are evaluated. Experimental results indicate that SW prefetching targeted to populate the DTLB yields up to 26% performance improvement when used symbiotically with HW prefetching, as opposed to HW prefetching alone. Based on our findings, changes to GCC's prefetch-loop-arrays compiler pass are proposed to take advantage of DTLB prefetching in a hybrid memory system for kernels that are frequently used in HPC applications.

    This is a practice talk for IEEE MASCOTS 2020.
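    The central tunable in software prefetching is the prefetch distance: how many loop iterations ahead to issue a prefetch so the data arrives before the loop reaches it. A back-of-the-envelope version of that characterization, with assumed (not measured) latencies, looks like this:

```python
# Back-of-the-envelope prefetch-distance calculation. All latencies below are
# illustrative assumptions, not measurements from the paper: the idea is to
# issue a prefetch far enough ahead that the access latency is fully hidden.
import math

def prefetch_distance(mem_latency_cycles: float, cycles_per_iter: float) -> int:
    """Iterations ahead to prefetch so the memory latency is hidden."""
    return math.ceil(mem_latency_cycles / cycles_per_iter)

dram_latency, nvm_latency = 200, 600   # assumed read latencies in cycles
loop_cost = 25                         # assumed cycles per loop iteration

print(prefetch_distance(dram_latency, loop_cost))  # 8 iterations for DRAM
print(prefetch_distance(nvm_latency, loop_cost))   # 24 iterations for NVM
```

    The different distances for the two tiers are why read/write/unroll prefetch distances must be characterized separately for DRAM and NVM, as the abstract describes.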
  • Oct 23 Dong Li (UC Merced): Is Big Memory Useful for HPC Applications? A Case Study with Molecular Dynamics Simulation

    Abstract: The big memory platform is emerging, evidenced by Intel Optane DC persistent memory-based systems providing up to 9 TB of memory per machine and Amazon EC2 high-memory instances providing up to 24 TB of memory per machine. However, the impact of these big memory platforms on high performance computing (HPC) applications is largely unknown. Is the big memory platform useful for HPC applications? On the one hand, the big memory platform enables scientific simulations with larger problem scales because of its large memory capacity; on the other hand, we observe that in production supercomputers, 90% of jobs utilize less than 15% of the node memory capacity, and for 90% of the time, memory utilization is less than 35%. Many computation-intensive HPC applications cannot benefit from the big memory system. In this talk, we discuss the challenges and opportunities that the big memory platform brings to HPC applications. We use molecular dynamics (MD) simulation, a computation-intensive application, as a case study. We introduce a memoization framework (named MD-PM) that trades large memory capacity for high computation capability. Evaluating with nine realistic MD simulation problems on Optane DC PM, we show that MD-PM consistently outperforms LAMMPS, a state-of-the-art MD simulation package, with an average speedup of 22.96x. The big memory system has great potential to accelerate HPC applications.
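    The memory-for-computation trade behind a memoization framework can be illustrated in miniature. The kernel, configurations, and cache policy below are stand-ins, not MD-PM's design: repeated inputs cost a table lookup instead of recomputation, and a big-memory machine lets that table grow very large.

```python
# Illustrative memoization sketch: cache the result of an expensive
# per-configuration kernel so repeated configurations cost a lookup instead
# of recomputation (the kernel below is a stand-in, not MD-PM's).
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)        # on a big-memory machine, the cache can be huge
def pairwise_energy(config: tuple) -> int:
    global calls
    calls += 1                  # count real (non-cached) evaluations
    # Stand-in for an expensive force/energy computation.
    return sum(x * x for x in config)

# Atom neighborhoods recur across timesteps, so configurations repeat.
timesteps = [(1, 2, 3), (4, 5, 6), (1, 2, 3), (1, 2, 3), (4, 5, 6)]
results = [pairwise_energy(c) for c in timesteps]
print(results, calls)   # [14, 77, 14, 14, 77] 2
```

    Here five timestep evaluations trigger only two real computations; the rest are served from memory, which is the capacity-for-compute trade the abstract describes.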

    Bio: Dong Li is an associate professor of EECS at the University of California, Merced. Previously, he was a research scientist at the Oak Ridge National Laboratory (ORNL), studying computer architecture and programming models for next-generation supercomputer systems. Dong earned his PhD in computer science from Virginia Tech. His research focuses on high performance computing (HPC) and maintains a strong relevance to computer systems. The core theme of his research is how to enable scalable and efficient execution of scientific applications on increasingly complex large-scale parallel systems. Dong received a CAREER Award from the U.S. National Science Foundation in 2016 and an ORNL/CSMD Distinguished Contributor Award in 2013. His paper at SC'14 was nominated for the best student paper award. He is also the lead PI for the NVIDIA CUDA Research Center at UC Merced. He is a review board member of IEEE Transactions on Parallel and Distributed Systems (TPDS).

  • Oct 30 3:30pm Tongping Liu (UMass): Evidence-Based Error Detection and Diagnosis
    Abstract: In-production software often contains latent bugs that escaped the software development and testing phase. Based on a recent report, software failures cost the economy $1.7 trillion in losses in 2017 alone. This talk will discuss several projects that aim to detect and diagnose errors in in-production software.

    iReplayer proposes an in-situ record-and-replay system for multithreaded applications that can identically reproduce the original execution with low recording overhead. iReplayer unlocks numerous possibilities for failure diagnosis, online error remediation, and security applications. This talk will present three examples: two automatic tools for detecting buffer overflows and use-after-free bugs, and one interactive debugging tool that is integrated with GDB.

    Watcher is built on top of iReplayer, aiming to automatically diagnose a wide range of program failures with explicit symptoms, such as segmentation faults, assertion failures, aborts, and divide-by-zero errors. Watcher traces the transitivity of evidence within identical re-executions. It employs watchpoints to obtain the data flow and breakpoints to capture the control flow of executions separately, without requiring any manual effort. Different from existing work, Watcher can identify the complete fault propagation chain via its iterative hybrid diagnosis.
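    The record-and-replay idea underlying iReplayer can be sketched minimally: log each nondeterministic event during the original run, then feed the log back so a re-execution is identical and can be instrumented for diagnosis. The "program" and event source below are toy stand-ins; the real system records at the level of synchronization and system events in multithreaded code.

```python
# Minimal record-and-replay sketch (illustrative only): record sources of
# nondeterminism during the original run, then replay from the log so the
# re-execution is bit-identical to the original.
import random

def run(workload, rng_source):
    """The 'program': its outcome depends on nondeterministic choices."""
    return [item * rng_source() for item in workload]

def record(workload):
    log = []
    def recording_rng():
        v = random.randint(1, 10)   # a nondeterministic event ...
        log.append(v)               # ... captured in the log
        return v
    return run(workload, recording_rng), log

def replay(workload, log):
    it = iter(log)
    return run(workload, lambda: next(it))   # consume the log instead

original, log = record([1, 2, 3])
print(replay([1, 2, 3], log) == original)   # True: identical re-execution
```

    Because the replay is identical, heavyweight diagnosis (watchpoints, breakpoints, evidence tracing) can be deferred to re-executions instead of slowing down the original run.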

    Bio: Tongping Liu is an Assistant Professor in the Department of Electrical and Computer Engineering at the University of Massachusetts Amherst. His research spans runtime systems, operating systems, programming languages, compilers, and distributed systems. His primary research goal is to practically improve the security, reliability, and performance of parallel and distributed software. He received the 2015 Google Faculty Research Award for his work on improving the performance of multithreaded programs. More information can be found at https://people.umass.edu/tongping/index.html.