PhD Proposal by Chao Jiang

Title: Studying Text Revision in Scientific Writing

Date/Time: Jan. 4th, 2024, 1:30 PM - 3:30 PM ET (10:30 AM - 12:30 PM PST)

Location: Zoom Link

 

 

Chao Jiang

Ph.D. Student in Computer Science

School of Interactive Computing

Georgia Institute of Technology

 

Committee:

Dr. Wei Xu (advisor), School of Interactive Computing, Georgia Tech

Dr. Alan Ritter, School of Interactive Computing, Georgia Tech

Dr. Kartik Goyal, School of Interactive Computing, Georgia Tech

Dr. Nanyun Peng, Computer Science Department, University of California, Los Angeles

 

 

Abstract:

Scientific publications are the primary channel for sharing research findings. Researchers devote substantial effort to improving writing quality, and valuable knowledge is encoded in the revision process. As of December 28, 2023, arXiv (https://arxiv.org/), an open-access e-print service, has archived over 2.3 million papers, of which more than 600k have multiple versions available. This provides a rich data source for studying text revision in scientific writing. Specifically, revisions between different versions of a paper contain valuable information about logical and structural improvements at the document level, as well as stylistic and grammatical refinements at the sentence and word levels. However, this also poses a unique challenge: the vast amount of data demands efficient and effective techniques to extract and analyze text revisions. This thesis focuses on (1) developing sentence and word alignment methods to extract revisions at different granularities; (2) constructing a new dataset to analyze fine-grained edits and their underlying intentions; and (3) analyzing human revisions in medical literature from a readability perspective, which is crucial for disseminating scientific knowledge to a broader audience.

Status

  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 01/04/2024
  • Modified By: Tatianna Richardson
  • Modified: 01/04/2024
