<![CDATA[Ph.D. Dissertation Defense

<![CDATA[Ph.D. Dissertation Defense - Pin-Jui Ku]]> Title: Incorporating Geometric and Consistency Constraints with Deep Models for Robust Phase Reconstruction and Speech Enhancement

Committee:

Dr. Chin-Hui Lee, ECE, Chair, Advisor

Dr. Larry Heck, ECE

Dr. David Anderson, ECE

Dr. Elliot Moore, ECE

Dr. Marco Sabato Siniscalchi, University of Palermo

]]> This dissertation proposes a new framework for phase estimation that overcomes these limitations and demonstrates its effectiveness across multiple deep neural network (DNN)-based speech enhancement (SE) models. We begin by introducing the first deep state-space-based SE model operating on complex-valued spectrograms. Although it outperforms baseline models while using a compact U-Net architecture, its estimated phase offers only limited improvement over the noisy phase, underscoring the difficulty of direct phase prediction. To address this, we develop a novel explicit consistency-preserving loss that leverages the observation that perceptually high-quality speech arises when magnitude and phase are mutually consistent. Building on this insight, we combine the consistency principle with the geometric constraints that hold under additive noise, resulting in the Multi-Sourced Griffin-Lim Algorithm (MSGLA). MSGLA jointly refines the speech and noise phases through iterative updates guided by DNN-estimated magnitudes and these geometric relationships, outperforming both direct phase estimation and prior geometric methods. Finally, we extend these ideas to a large-scale generative pretraining framework that models the distribution of clean speech spectrograms and incorporates the consistency-based phase loss during training.
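
The two ideas above, STFT consistency and the additive-mixture (geometric) constraint, can be illustrated with a short NumPy sketch. It is not the dissertation's consistency-preserving loss or MSGLA; it is a minimal stand-in under stated assumptions. Consistency is measured here as the relative distance between a spectrogram X and its re-analysis STFT(ISTFT(X)), which is near zero only when magnitude and phase agree; whether the dissertation's loss takes this exact form is an assumption. The two-source loop follows the classic MISI (multiple input spectrogram inversion) pattern: it spreads the mixture error equally between the speech and noise estimates, re-imposes the given magnitudes, and repeats. The STFT settings (`N_FFT`, `HOP`) and all function names are hypothetical, and oracle magnitudes stand in for DNN-estimated ones.

```python
import numpy as np

# Hypothetical STFT settings, chosen for illustration only.
N_FFT, HOP = 256, 64
WIN = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N_FFT) / N_FFT)  # periodic Hann


def stft(x):
    """Windowed rFFT frames -> complex array (frames, bins).

    Assumes len(x) - N_FFT is a multiple of HOP so every sample is covered.
    """
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(X, length):
    """Weighted overlap-add inverse of stft(), normalised by the summed squared window."""
    frames = np.fft.irfft(X, n=N_FFT, axis=1) * WIN
    y, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    return y / np.maximum(norm, 1e-8)


def inconsistency(X, length):
    """Relative distance between X and its projection STFT(ISTFT(X)).

    Near 0 when some real signal actually produces X; large when the
    magnitude and phase of X do not agree with each other.
    """
    P = stft(istft(X, length))
    return np.linalg.norm(X - P) / np.linalg.norm(X)


def two_source_phase_retrieval(y, mag_s, mag_n, n_iter=100):
    """MISI-style sketch (not MSGLA): alternate mixture and magnitude constraints.

    mag_s and mag_n stand in for DNN-estimated magnitudes; both phases
    start from the noisy mixture phase.
    """
    L = len(y)
    phase = np.angle(stft(y))
    S, N = mag_s * np.exp(1j * phase), mag_n * np.exp(1j * phase)
    for _ in range(n_iter):
        s, n = istft(S, L), istft(N, L)
        err = y - s - n  # additive model: s + n should equal y
        # Split the mixture error equally, re-analyse, keep only the new phase.
        S = mag_s * np.exp(1j * np.angle(stft(s + err / 2)))
        N = mag_n * np.exp(1j * np.angle(stft(n + err / 2)))
    return istft(S, L), istft(N, L)


def snr_db(ref, est):
    """Signal-to-noise ratio of est against ref, in dB."""
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(N_FFT + 60 * HOP)            # 4096 samples, frame-aligned
    s = np.sin(2 * np.pi * 0.03 * t)            # toy "speech": a pure tone
    n = 0.5 * rng.standard_normal(len(t))       # white noise
    y = s + n
    mag_s, mag_n = np.abs(stft(s)), np.abs(stft(n))  # oracle magnitudes
    print("inconsistency of STFT(y):", inconsistency(stft(y), len(y)))
    s0, _ = two_source_phase_retrieval(y, mag_s, mag_n, n_iter=0)
    s1, _ = two_source_phase_retrieval(y, mag_s, mag_n, n_iter=100)
    print(f"SNR noisy phase: {snr_db(s, s0):.1f} dB, after 100 iters: {snr_db(s, s1):.1f} dB")
```

The demo prints the SNR of the noisy-phase baseline (`n_iter=0`) next to that of the iterated estimate, so the effect of combining the additive-mixture and magnitude constraints can be inspected directly.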

]]>