PhD Proposal by Prithayan Barua
Title: Static Compiler Analysis for Balancing Performance, Portability and Productivity for Heterogeneous Architectures
School of Computer Science
Georgia Institute of Technology
Date: Tuesday, April 28, 2020
Time: 2:00 pm - 3:30 pm
Dr. Vivek Sarkar (advisor), School of Computer Science, Georgia Institute of Technology
Dr. Hyesoon Kim, School of Computer Science, Georgia Institute of Technology
Dr. Tom Conte, School of Computer Science, Georgia Institute of Technology
Dr. Rich Vuduc, School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Santosh Pande, School of Computer Science, Georgia Institute of Technology
The need for heterogeneous architectures continues to increase as we approach the end of Moore’s Law, accompanied by an increasing diversity of performance-sensitive applications. With this “extreme heterogeneity”, each platform can offer a set of unique microarchitectural features best suited to a particular class of applications. At the software level, this has made the programmer’s job even harder. Different microarchitectures may call for different programming abstractions, and various kinds of programming models have consequently been developed to span the performance-productivity space. Furthermore, porting legacy applications to new hardware requires non-trivial effort.
There are two common approaches to addressing this problem. First, expert (“ninja”) programmers can develop custom libraries for new hardware platforms. However, the growing cross product of hardware types and application domains makes this approach less practical as we look to the future. The second approach is to exploit compiler technologies to optimize programs for different architectures, which also includes the development of compilers for domain-specific languages. Our thesis research aligns with this second approach.
In this proposal, we explore different static program analysis techniques to address performance, portability, and debugging requirements for heterogeneous architectures. We develop compiler optimizations for GPUs, FPGAs, and modern CPUs. Our first performance-related work targets the problem of porting and optimizing CPU applications for GPUs. We observed that traditional applications designed for latency-optimized, out-of-order pipelined CPUs do not efficiently exploit the throughput-optimized, in-order pipelined GPU architecture. This work includes the automatic selection of thread-coarsening transformations to improve the memory bandwidth utilization of a given kernel.
We also introduce new optimizing loop transformations for spatial architectures, developed for a domain-specific compiler (based on Halide) that generates OpenCL code for FPGAs. With respect to portability, we designed a code transformation that inserts the memory transfers between a host and a device required by an OpenMP program, or optimizes a given set of memory transfers. We also explore the problem of selecting unroll factors for the traditional loop unroll-and-jam transformation on modern CPUs.
Finally, for debugging, we introduce a new tool, based on static analysis, that developers can use to detect incorrect usage of OpenMP memory-mapping clauses.