PhD Defense by Zhongzhi Yu

Title: Enhancing Foundation Models with Self-Guided Techniques: From Attention to Adapters to Agents

Date: Thursday, April 10th, 2025
Time: 10:00 AM – 11:30 AM (Eastern Time)
Location: Klaus 1212, Klaus Advanced Computing Building
Zoom Link: https://gatech.zoom.us/j/9960405372

Zhongzhi Yu
Ph.D. Student
School of Computer Science
Georgia Institute of Technology

Committee:
Dr. Yingyan (Celine) Lin (Advisor, School of Computer Science, Georgia Tech)
Dr. Chao Zhang (School of Computational Science & Engineering, Georgia Tech)
Dr. Haoxing (Mark) Ren (Nvidia Corporation)
Dr. Pavlo Molchanov (Nvidia Corporation)
Dr. Zsolt Kira (School of Interactive Computing, Georgia Tech)

Abstract:

Foundation models, large-scale transformers pretrained on massive datasets, have achieved remarkable performance across a wide range of applications. However, the growing demand to deploy foundation models in real-world settings with diverse resource and capability requirements exposes three critical challenges hindering their broader adoption: (1) the accuracy-efficiency trade-off, where improving accuracy through scaling incurs prohibitive computational costs; (2) inefficient adaptation strategies that demand heavy supervision and resources, limiting use in resource-constrained environments; and (3) limited capabilities in handling complex tasks, such as automated hardware code generation and multi-agent collaboration.

This thesis addresses these challenges by leveraging our insight that foundation models encode rich representations, which, if effectively extracted, can enable self-guided optimization. Specifically, we introduce a set of techniques across three complementary levels, each targeting one of the aforementioned challenges: (1) At the attention level, addressing the accuracy-efficiency trade-off, we introduce the Attention Calibration Technique (ACT), which refines suboptimal attention distributions to improve performance without training, and SpotVLM, which reduces visual token redundancy in video-language models through attention-based selection. (2) At the adapter level, targeting adaptation efficiency, we present Master-ASR, which enables dynamic selection and composition of adapters to support efficient model adaptation. (3) At the agent level, targeting complex tasks that require knowledge retrieval and reasoning, we propose Instant-RAG, a retrieval-augmented generation system that hides retrieval overhead within the standard generation workflow to enable efficient knowledge access, and Spec2RTL-Agent, which addresses the challenging task of directly generating Register Transfer Level (RTL) code from specification documents by coordinating multiple foundation models to achieve advanced reasoning capabilities. Together, these techniques form a comprehensive framework for self-guided optimization that addresses key challenges limiting the broader deployment of foundation models, enabling more accessible and capable models in real-world scenarios.
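To make the attention-level ideas more concrete, below is a minimal, illustrative sketch of attention-based visual token selection in the spirit of SpotVLM. The function name, the head-averaging step, the top-k criterion, and the keep ratio are assumptions chosen for illustration, not the exact procedure developed in the thesis.

import torch

def select_visual_tokens(attn_weights, visual_tokens, keep_ratio=0.25):
    # attn_weights: (num_heads, num_query_tokens, num_visual_tokens) attention
    #   from query (e.g., text) tokens to visual tokens at one layer.
    # visual_tokens: (num_visual_tokens, hidden_dim) visual token embeddings.
    # keep_ratio: fraction of visual tokens to retain.

    # Score each visual token by its total received attention, averaged over heads.
    scores = attn_weights.mean(dim=0).sum(dim=0)  # shape: (num_visual_tokens,)

    # Keep the top-k highest-scoring visual tokens, preserving their original order.
    k = max(1, int(keep_ratio * visual_tokens.shape[0]))
    keep_idx = torch.topk(scores, k).indices.sort().values

    return visual_tokens[keep_idx]

# Example with random tensors standing in for real model activations:
attn = torch.rand(8, 32, 576)    # 8 heads, 32 text tokens, 576 visual tokens
vis = torch.randn(576, 1024)     # 576 visual tokens, hidden size 1024
pruned = select_visual_tokens(attn, vis, keep_ratio=0.25)
print(pruned.shape)              # torch.Size([144, 1024])

The design intuition illustrated here is that visual tokens receiving little attention from the query tokens are likely redundant and can be dropped before subsequent computation, reducing token count at a small cost in information.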

 
