Seminar Talk: Speech Variability and the Possible Invariant Property of Speech Gestures

Speech acoustic patterns vary significantly as a result of coarticulation and lenition processes that are shaped by segmental context or by performance factors such as production rate and degree of casualness or precision. Such processes have the most dramatic effect on the acoustic properties of the speech signal that relate to manner and place of articulation, and the resultant acoustic variability continues to pose serious challenges for the development of automatic speech recognition (ASR) systems that can perform well with minimal constraints. For example, conventional ASR systems attempt to account for coarticulatory effects through tri- or quin-phone and cross-word models; however, it is inherently difficult to quantify canonically the phonetic spread of coarticulatory effects. Articulatory Phonology (AP) provides a unified framework for understanding how spatiotemporal changes in the pattern of underlying speech gestures can lead to corresponding changes in the extent of intergestural temporal overlap and in the degree of gestural spatial reduction; in turn, these changes in overlap and reduction create acoustic consequences that are typically reported as assimilations, insertions, deletions, and substitutions. We have made important progress in developing a speech inversion system based on a computational model of AP and have shown that such a system can greatly improve the robustness of ASR systems to noise. These encouraging results have been obtained even though we had to rely on synthetically generated speech and articulatory data to develop our speech inversion system, as no natural speech databases with the needed articulatory annotations existed.

In this talk, I will discuss our work on speech inversion, focusing on the system's ability to model changes in the temporal overlap and spatial magnitude of gestures. Currently, we are developing a natural speech database with annotated gestural information, and we are using these data to improve our speech inversion system. If our system can “uncover” seemingly hidden gestures, then the robustness and accuracy of ASR systems should improve substantially. Such results would also provide the means for improving a variety of speech applications, for example by strengthening speech pronunciation tools in the classroom and clinic and by enabling more natural-sounding synthetic speech that better reflects idiosyncratic differences between speakers.

Speaker Bio:
Carol Espy-Wilson is a professor in the Electrical and Computer Engineering Department and the Institute for Systems Research at the University of Maryland. Dr. Espy-Wilson received a B.S. in Electrical Engineering from Stanford University in 1979, and an M.S., E.E., and Ph.D. in Electrical Engineering from the Massachusetts Institute of Technology in 1981, 1984, and 1987, respectively. Dr. Espy-Wilson directs the Speech Communication Lab, which combines knowledge of digital signal processing, speech science, linguistics, acoustic phonetics, and machine learning to conduct interdisciplinary research in many speech areas, including speech recognition, speech production, speaker recognition, speech enhancement, emotion recognition, and, more recently, speech biomarkers for depression. Dr. Espy-Wilson is the recipient of the NSF Minority Initiation Award (1990-1992), the Clare Booth Luce Professorship (1990-1995), a National Institutes of Health (NIH) Career Award (1998-2003), the Honda Initiation Award (2004-2005), and a Radcliffe Fellowship (2008-2009). She is a Fellow of the Acoustical Society of America (ASA) and a Senior Member of IEEE. She served as Chair of the Speech Technical Committee of the ASA (2007-2010), as an associate editor of the ASA's magazine, Acoustics Today (2007-2009), as an appointed member of the Language and Communication Study Section at NIH (2001-2004), as an elected member of the Speech and Language Technical Committee of IEEE (2010-2013), and as a member of the NIH National Advisory Board for Medical Rehabilitation (2011-2014). Currently, she is an Associate Editor of the Journal of the Acoustical Society of America and a member of the National Advisory Council for the National Institute of Biomedical Imaging and Bioengineering at NIH.



Status

  • Workflow Status: Published
  • Created By: Ashlee Gardner
  • Created: 06/12/2014
  • Modified By: Fletcher Moore
  • Modified: 10/07/2016

