PhD Defense by Hairong Wang
Title: Knowledge-Informed Weakly-Supervised Deep Learning Models for Cancer Applications
Date: June 6, 2025 (Friday)
Time: 1:00 pm – 3:00 pm ET
Location: Groseclose 403
Zoom link: https://gatech.zoom.us/j/94243279953
Hairong Wang
Operations Research PhD Student
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
Committee
Dr. Jing Li (Advisor), H. Milton Stewart School of Industrial and Systems Engineering
Dr. Jianjun Shi, H. Milton Stewart School of Industrial and Systems Engineering
Dr. Xiaochen Xian, H. Milton Stewart School of Industrial and Systems Engineering
Dr. Kristin Swanson, Mayo Clinic
Dr. Shuai Huang, Department of Industrial and Systems Engineering at the University of Washington
Abstract
In recent decades, deep learning (DL) has emerged as a promising tool for analyzing complex patterns in large datasets. The computational power and versatility of DL have enabled in-depth analysis of medical imaging, clinical, and molecular data, significantly enhancing diagnosis, prognosis, and treatment planning in healthcare. However, healthcare data acquisition faces an intrinsic bottleneck: it is limited by the invasiveness or high expense of sample collection, the need for highly specialized experts to create accurate labels, the rarity of some diseases in the population, and the difficulty of patient recruitment. One approach to addressing these challenges is to enhance sample efficiency and integrate biomedical knowledge into data-driven models. This approach has shown considerable potential for improving the accuracy, robustness, and interpretability of model outcomes, representing a significant advancement in applying DL to cancer. To address sample efficiency and biomedical knowledge integration, this thesis focuses on developing knowledge-informed, image-based DL methodologies that are more efficient and generalizable, and on seeking practical solutions for real-world cancer applications.
Chapter 2 proposes BioNet, a biologically informed neural network designed to predict the regional distributions of two primary tissue-specific gene modules in recurrent glioblastoma using medical imaging data. BioNet introduces a novel integration of multiple forms of implicit, qualitative biological domain knowledge into the deep learning pipeline to enable pixel-level prediction of gene module distributions. To overcome the challenge posed by the limited availability of image-localized biopsy datasets, BioNet generates virtual biopsy data for pretraining, which supports the learning of more generalizable features. Inspired by the biological relationships between gene modules, BioNet employs a hierarchical architecture and incorporates a knowledge attention loss that combines data-driven and knowledge-driven components to penalize biologically inconsistent predictions.
Chapter 3 presents KIMAS, a knowledge-inspired self-supervised masked autoencoder for sparse supervision, designed to address the critical challenge of accurate, spatially dense prediction of genetic alterations from non-invasive imaging in glioblastoma. This challenge arises primarily from the extreme sparsity of biopsy-labeled data and high inter-patient heterogeneity. KIMAS integrates a knowledge-inspired self-supervised pretraining task that reconstructs a synthetic healthy version of the tumoral region of interest, enabling the encoder to learn patient-specific features while excluding pathological variation. It further incorporates a contextualized representation learning approach based on Vision Transformers (ViTs), which captures local features with global context, and a knowledge-informed fine-tuning strategy that combines auxiliary dense-label pretraining with regularization based on organ structural symmetry and prediction map smoothness.
Chapter 4 introduces SmoothSegNet, a novel deep learning framework designed to improve imaging-based liver tumor segmentation by addressing key challenges including tumor heterogeneity, vague boundaries, and limited labeled data. The model incorporates a knowledge-informed label smoothing technique that distills clinical insights to generate soft supervision signals, reducing overfitting and enhancing generalization. It employs a global-local segmentation architecture that decomposes the task into two sub-tasks, liver segmentation and tumor segmentation, allowing targeted optimization and training for each. Additionally, customized pre- and post-processing pipelines are integrated to enhance tumor visibility and refine segmentation boundaries, further improving model precision in complex clinical scenarios.
Status
- Workflow Status: Published
- Created By: Tatianna Richardson
- Created: 05/27/2025
- Modified By: Tatianna Richardson
- Modified: 05/27/2025