Ph.D. Proposal Defense by Yi Yang

Ph.D. Thesis Proposal Announcement

Title: Robust Adaptation of Natural Language Processing for Language Variation

Yi Yang
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
http://www.cc.gatech.edu/~yyang319/

Date: Tuesday, April 7, 2015
Time: 3:00pm – 5:00pm EDT
Location: Klaus 1212

Committee
Dr. Jacob Eisenstein (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. James M. Rehg, School of Interactive Computing, Georgia Institute of Technology
Dr. Duen Horng (Polo) Chau, School of Computational Science & Engineering, Georgia Institute of Technology
Dr. Byron Boots, School of Interactive Computing, Georgia Institute of Technology

Abstract:
Natural Language Processing (NLP) technology has been applied in various domains, ranging from social media and digital humanities to public health. Unfortunately, existing NLP techniques often perform poorly in these areas, because they are developed on standard corpora and are therefore vulnerable to the language variation found in new datasets and settings. Previous approaches to this problem suffer from two major weaknesses. First, they usually employ supervised methods that require expensive annotations and easily become outdated given the dynamic nature of language. Second, they often fail to leverage the valuable metadata associated with the target language in these areas.

In this thesis, I propose to overcome these weaknesses by exploring unsupervised learning techniques to build NLP systems that are robust to language variation, along three main directions: a) unsupervised text normalization, which transforms lexical variants into text that better matches standard datasets; b) unsupervised domain adaptation, which adapts standard NLP tools directly to text with variation by learning representations that are robust to that variation; c) personalized natural language processing, which incorporates user metadata to adapt generic NLP to each individual user. These approaches are driven by co-occurrence statistics and rich metadata, do not require costly annotations, and can easily adapt to new settings. My preliminary work on text normalization and domain adaptation delivers state-of-the-art NLP systems for social media and historical text. As future work, I propose to further improve the results by leveraging various kinds of user metadata.

Details

Tuesday

Apr 7 2015

04:00pm - 06:00pm

In campus calendar: No

Groups

Graduate Studies

Status

Workflow Status: Published
Created By: Tatianna Richardson
Created: 04/01/2015
Modified By: Fletcher Moore
Modified: 10/07/2016
