PhD Proposal by Zhongzhi Yu
Title: Improving Large-Scale Foundation Models Via Attention-Aware Techniques
Date: Thursday, August 1st, 2024
Time: 2:00 pm - 3:00 pm ET
Location: Virtually via Zoom (https://gatech.zoom.us/j/9960405372?pwd=bzhIbVdWRkxweW9naUh0aUt4ci9WZz09)
PhD Student:
Zhongzhi Yu, School of Computer Science, Georgia Institute of Technology
Committee Members:
Dr. Yingyan (Celine) Lin (Advisor) – School of Computer Science, Georgia Institute of Technology
Dr. Chao Zhang – School of Computational Science and Engineering, Georgia Institute of Technology
Dr. Judy Hoffman – School of Interactive Computing, Georgia Institute of Technology
Dr. Pavlo Molchanov – Nvidia Corporation
Abstract:
Foundation models, a family of large-scale transformer models, have shown impressive performance across a diverse range of applications, from natural language processing to computer vision. The key enabler of their success is the attention module, which controls how these models extract relationships among input tokens. Despite its importance, however, our understanding of the attention module's role during inference and fine-tuning remains limited, leading to challenges such as potentially suboptimal model performance and a lack of interpretability.
My thesis research focuses on understanding the potentially suboptimal attention distributions generated by foundation models and developing attention-aware techniques to improve their performance. The primary insight from my research is that certain high-attention tokens can negatively affect foundation model performance during both fine-tuning and inference. Building on this insight, my research presents state-of-the-art solutions for improving foundation models, including an attention-aware data augmentation technique that increases the data efficiency of fine-tuning and an attention calibration technique that improves inference accuracy.
Status
- Workflow Status: Published
- Created By: Tatianna Richardson
- Created: 07/24/2024
- Modified By: Tatianna Richardson
- Modified: 07/24/2024