PhD Thesis by Ian Stewart
Title: Modeling variation in linguistic structure with social dynamics in online discussions
Ph.D. Student in Human-Centered Computing
School of Interactive Computing
Georgia Institute of Technology
Date: Friday, November 22, 2019
Time: 4:00 - 6:00 PM (EST)
Location: CODA C1203
Committee:
Jacob Eisenstein (co-advisor, School of Interactive Computing, Georgia Institute of Technology)
Diyi Yang (co-advisor, School of Interactive Computing, Georgia Institute of Technology)
Munmun De Choudhury (School of Interactive Computing, Georgia Institute of Technology)
Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
David Jurgens (School of Information, University of Michigan)
Abstract:
All languages vary and change naturally, often as a result of variable social forces and internal structural constraints: the "haha" of today may be replaced by tomorrow's "lol." In everyday conversation, people often adapt their language to express their attitudes, to meet the standards of their communities, and to accommodate to other people in the conversation. This sociolinguistic variation is pervasive in public discussions on the internet, where people from a variety of backgrounds meet to exchange ideas and to form communities. Previous research on language variation on the internet has often neglected aspects of linguistic structure, such as word form and syntax. However, studying variation in linguistic structure can reveal theoretically motivated communication needs that are obscured by metrics such as word frequency.
This thesis proposes quantitative methods to explain variation in linguistic structure by analyzing social behavior in online discussions on Twitter, Instagram, and Reddit. I focus on language choice, word spelling, syntactic flexibility, descriptive context, and morphology as linguistic structures that can be readily quantified with natural language processing techniques. I show that the variation in such linguistic structures can be explained by social theories including communities of practice, audience design, and social attitudes. The thesis provides new methods for investigating sociolinguistic variation, and it offers insight into how people on the internet adapt their language in ways that correlate with social dynamics in online discussions.