
PhD Defense by Fatih ILHAN


Dear faculty members and fellow students,

 

You are cordially invited to my PhD thesis defense on Thursday, Nov 13th. 

 

Title: Resource-Adaptive Efficiency Optimizations for Large Vision and Language Models

 

Date: Thursday, Nov 13th

 

Time: 12:30 PM - 2:30 PM

 

Location: Klaus Advanced Computing Building (KACB), Room 3402

Zoom Link: https://gatech.zoom.us/my/prof.lingliu.personal.zoom?omn=95005081753

 

Fatih ILHAN

Computer Science PhD Student

School of Computer Science
Georgia Institute of Technology

 

Committee:

 

  1. Dr. Ling Liu (Advisor), School of Computer Science, Georgia Tech
  2. Dr. Greg Eisenhauer, School of Computer Science, Georgia Tech
  3. Dr. Yingyan Celine Lin, School of Computer Science, Georgia Tech
  4. Dr. Calton Pu, School of Computer Science, Georgia Tech
  5. Dr. Kishore Ramachandran, School of Computer Science, Georgia Tech
  6. Dr. Lakshmish Ramaswamy, School of Computing, University of Georgia
  7. Dr. Gong Su, IBM T.J. Watson Research Center

 

Abstract:

 

Deploying large-scale vision and language models requires careful resource management, given their increasingly significant computational demands. Enabling resource-aware fine-tuning and inference of large pre-trained models under limited memory, compute, power, and network capacity is essential, particularly in distributed environments with heterogeneous edge clients. This dissertation research investigates resource-adaptive frameworks and optimization methodologies for efficient fine-tuning and inference that perform reliably across diverse contexts.

 

On inference efficiency, this dissertation research makes original contributions along two trajectories. First, it introduces EENet, an early-exit scheduling framework for fast inference on the edge. EENet augments multi-exit neural networks with an early-exit inference scheduler, which speeds up inference by taking early exits while maintaining inference quality; the scheduler optimizes the early-exit policy based on the available latency budget and the user queries. The EENet framework further extends to distributed and hierarchical computing environments by partitioning the model along its early exits. Second, the dissertation develops complementary inference optimization methods that improve the test-time efficiency of large vision and language models on long-context tasks. The main novelty of these methods is layer-wise, attention-aware compaction and de-compaction of intermediate representations, which reduces their size while maintaining high inference performance.
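For intuition, here is a minimal sketch of the generic confidence-threshold early-exit pattern that multi-exit networks rely on. The model, thresholds, and exit rule below are hypothetical stand-ins, not EENet's actual learned scheduling policy.

```python
# Illustrative early-exit inference sketch (hypothetical toy model and
# thresholds; EENet's learned, budget-aware scheduler is more involved).
import torch
import torch.nn as nn

class MultiExitMLP(nn.Module):
    """A toy multi-exit network: each block gets its own classifier head."""
    def __init__(self, dim=32, num_classes=10, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
            for _ in range(num_blocks)
        )
        self.exits = nn.ModuleList(
            nn.Linear(dim, num_classes) for _ in range(num_blocks)
        )

    @torch.no_grad()
    def forward(self, x, thresholds=(0.9, 0.8, 0.0)):
        # Exit as soon as a head's max softmax probability clears its
        # threshold; the final threshold of 0.0 guarantees an answer.
        for block, head, tau in zip(self.blocks, self.exits, thresholds):
            x = block(x)
            probs = head(x).softmax(dim=-1)
            conf, pred = probs.max(dim=-1)
            if conf.item() >= tau:
                return pred.item(), conf.item()
        return pred.item(), conf.item()  # unreachable: last tau is 0.0

model = MultiExitMLP().eval()
pred, conf = model(torch.randn(1, 32))
print(f"predicted class {pred} with confidence {conf:.2f}")
```

Lowering the thresholds trades accuracy for latency; tuning that trade-off against the available latency budget is the role the abstract ascribes to the EENet scheduler.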

 

On fine-tuning efficiency, this dissertation research makes three unique contributions. First, it introduces ScaleFL, a resource-adaptive federated learning framework that addresses the resource heterogeneity of edge clients through two-dimensional model partitioning with self-distillation. Second, it introduces FedHFT, an efficient and personalized federated fine-tuning framework that uses a mixture of masked adapters over client clusters, enabling fine-tuning over privacy-sensitive or mission-critical non-IID data locally hosted by distributed, resource-constrained clients. Third, it develops RECAP, a memory-efficient pruning framework for fine-tuning large vision and language transformers. RECAP iteratively cycles through pruning, fine-tuning, and updating stages to explore different sub-networks while employing the CPU and GPU in tandem to maximize resource utilization and minimize the GPU memory footprint.
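As a rough, self-contained illustration of the kind of prune/fine-tune/update cycling described above, the sketch below runs a generic iterative magnitude-pruning loop on a toy layer. The layer size, sparsity schedule, and synthetic data are hypothetical, and RECAP's CPU/GPU orchestration and sub-network exploration are not modeled here.

```python
# Generic iterative magnitude-pruning loop (a simplified stand-in for a
# prune/fine-tune/update cycle; all names, sizes, and data are hypothetical).
import torch
import torch.nn as nn

def magnitude_mask(weight, sparsity):
    """Keep the largest-magnitude fraction (1 - sparsity) of weights."""
    k = int(weight.numel() * (1.0 - sparsity))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()

model = nn.Linear(64, 64)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(128, 64), torch.randn(128, 64)  # synthetic data

for sparsity in (0.25, 0.5, 0.75):           # pruning stage, progressively
    mask = magnitude_mask(model.weight.data, sparsity)
    for _ in range(50):                       # brief fine-tuning stage
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        model.weight.grad *= mask             # freeze pruned weights
        opt.step()
        model.weight.data *= mask             # updating stage: re-apply mask
    print(f"sparsity {sparsity:.2f}, loss {loss.item():.4f}")
```

Each pass recomputes the mask from the current weights, so previously pruned connections can in principle differ from one cycle to the next, which is the sense in which such a loop explores different sub-networks.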
