PhD Proposal by Tong Zhou

Title: A Tensor Operator Compiler for Python

Date: Wednesday, May 10, 2023

Time: 1:30pm - 3:00pm EST

Location: 3402 Klaus, Virtual: https://gatech.zoom.us/my/tzhou80?pwd=b3EzUGRiWGY5Y2VkamZtdVJ2WTh5UT09

Tong Zhou

Ph.D. Computer Science

School of Computer Science

Georgia Institute of Technology

Committee:

Dr. Vivek Sarkar (Advisor) - School of Computer Science, Georgia Institute of Technology

Dr. Jun Shirako - School of Computer Science, Georgia Institute of Technology

Dr. Santosh Pande - School of Computer Science, Georgia Institute of Technology

Dr. Richard Vuduc - School of Computational Science, Georgia Institute of Technology

Abstract:

Kernels in machine learning, data science and scientific computing applications are often expressed as a sequence of matrix operations. These programs typically consist of a combination of element-wise (point-wise) operators such as addition and multiplication; reduction operators, such as summation; and, matrix multiplications (matmuls). Moreover, dimension broadcasting is often required, e.g., when a 2D matrix is added to a 1D vector. A common approach to implementing such kernels is to create a separate hand-optimized library implementation for each operator. Such hand-coded implementations are often backed by vendor libraries, such as cuBLAS and Intel MKL to achieve high performance on targeted hardware. One downside of this approach is that when calling a library function for each operator, intermediate results must be stored and subsequently loaded by the next operator. This overhead becomes increasingly significant for operators with low arithmetic intensity, such as point-wise reduction operators, and tall and skinny matrices, which also exhibit lower arithmetic intensities.

To enhance overall performance in these cases, a series of operators can be computed in a fused fashion (known as operator fusion, a concept akin to loop fusion) so as to reduce or eliminate loads and stores of intermediate results. However, such fused multi-operator kernels must be either created manually by performance experts, or generated automatically by a compiler. While it is possible for performance experts to manually create fused kernels for common patterns for specific hardware platforms, it becomes impractical for them to anticipate and cover all desired combinations of operator sequences and target platforms. Therefore, using a compiler-based approach to automatically generate fused kernels is a more viable and cost-effective approach. However, past compiler based efforts have either only focused on a few hand-picked application-specific combinations of operators or were too general to generate performance competitive fused kernels.

In this work, we advance the state-of-the-art of compiler-based operator fusion in several aspects. 1) We introduce an intermediate presentation: tiled index tree on which several transformations can be easily applied such as fusion, interchanging and tiling. 2) We broaden the operator fusion scope to fuse arbitrary element-wise, reduction and matmul operators based on automatic tiled index tree transformations. 3) We leverage operator-level information and propose dependence-breaking transformations to fuse otherwise unfusable operator patterns, so as to enable single-pass incremental fused computation of special patterns. 4) We extend the fusion scope to also fuse sparse operators, identify four common types of redundancies that can appear when fusing sparse and dense operators and introduce techniques to eliminate them. 5) We deal with the presence of explicit loops, and propose techniques such as loop invariant code motion, memory reusing to further improve the performance. 6) We implement the techniques in a prototype Python compiler framework -- Intrepydd and perform evaluation on both CPU and GPU using a set of representative tensor operator sequences/programs on representative input shapes. As of now, we have some preliminary implementations and results. In the proposed work, we plan to finish the implementation and evaluation.

Media

No media selected

Summary

A Tensor Operator Compiler for Python

Details

Wednesday

May 10 2023

01:30pm - 03:03pm

Location: 3402 Klaus,

In campus calendar: No

Sidebar Content

No sidebar content

Groups

Graduate Studies

Status

Workflow Status:Published
Created By:Tatianna Richardson
Created:05/04/2023
Modified By:Tatianna Richardson
Modified:05/04/2023

Mercury (Hg)