Ph.D. Proposal Defense by Yi Yang

Ph.D. Thesis Proposal Announcement

Title: Robust Adaptation of Natural Language Processing for Language Variation

Yi Yang
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
http://www.cc.gatech.edu/~yyang319/

Date: Tuesday, April 7, 2015
Time: 3:00pm – 5:00pm EDT
Location: Klaus 1212

Committee
Dr. Jacob Eisenstein (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. James M. Rehg, School of Interactive Computing, Georgia Institute of Technology
Dr. Duen Horng (Polo) Chau, School of Computational Science & Engineering, Georgia Institute of Technology
Dr. Byron Boots, School of Interactive Computing, Georgia Institute of Technology


Abstract:
Natural Language Processing (NLP) technology has been applied in various domains, ranging from social media and digital humanities to public health. Unfortunately, existing NLP techniques often perform unsatisfactorily in these areas, because they are driven by standard corpora and are therefore vulnerable to the language variation found in new datasets and settings. Previous approaches to this problem suffer from two major weaknesses. First, they usually employ supervised methods that require expensive annotations and easily become outdated given the dynamic nature of language. Second, they often fail to leverage the valuable metadata associated with the target language in these areas.

In this thesis, I propose to overcome these weaknesses by exploring unsupervised learning techniques to build NLP systems that are robust to language variation. The work branches into three directions: a) unsupervised text normalization, which transforms lexical variations into text that better matches standard datasets; b) unsupervised domain adaptation, which adapts standard NLP tools directly to text with variation by learning representations that are robust to that variation; and c) personalized natural language processing, which incorporates user metadata to adapt generic NLP to each individual user. These approaches are driven by co-occurrence statistics and rich metadata, require no costly annotations, and can easily adapt to new settings. My preliminary work on text normalization and domain adaptation delivers state-of-the-art NLP systems for social media and historical text. As future work, I propose to further improve the results by leveraging various user metadata.

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 04/01/2015
  • Modified By: Fletcher Moore
  • Modified: 10/07/2016
