{"403981":{"#nid":"403981","#data":{"type":"event","title":"PhD Defense Announcement by Seungyeon Kim","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003EPh.D. Defense of Dissertation Announcement\u003C\/strong\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E \u003Cstrong\u003ETitle: Novel Document Representations based on Labels and Sequential Information\u003C\/strong\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E \u003Cstrong\u003ESeungyeon Kim\u003C\/strong\u003E\u003Cbr \/\u003E School of Computational Science and Engineering\u003Cbr \/\u003E (Ph.D. Computer Science program)\u003Cbr \/\u003E College of Computing\u003Cbr \/\u003E Georgia Institute of Technology\u003Cbr \/\u003E \u003Ca href=\u0022http:\/\/sylund.net\/\u0022 target=\u0022_blank\u0022\u003Ehttp:\/\/sylund.net\u003C\/a\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E Date: Thursday, May 28, 2015\u003Cbr \/\u003E Time: 11:30am - 1:30pm ET (8:30am - 10:30am PT)\u003Cbr \/\u003E Location: Klaus Conference Room 1202\u003Cbr \/\u003E \u003Cbr \/\u003E \u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003Cbr \/\u003E Prof. Guy Lebanon (Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)\u003Cbr \/\u003E Prof. Haesun Park (Co-advisor, School of Computational Science and Engineering, Georgia Institute of Technology)\u003Cbr \/\u003E Dr. Irfan Essa (School of Interactive Computing, Georgia Institute of Technology)\u003Cbr \/\u003E Dr. Jacob Eisenstein (School of Interactive Computing, Georgia Institute of Technology)\u003Cbr \/\u003E Dr. Samy Bengio (Google Inc)\u003Cbr \/\u003E \u003Cbr \/\u003E\u003Cstrong\u003E Abstract:\u003C\/strong\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E \u003Cem\u003EA wide variety of text analysis applications are based on statistical machine learning techniques. One of the fundamental questions that has to be answered for these techniques is how to represent documents. 
A representation of a document, often called a feature vector, plays a significant role in the overall performance of these techniques.\u003C\/em\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E\u003Cem\u003E We can then ask what makes a good representation. There are a number of aspects of a good representation, but we focus on the following four. First and most obviously, a representation should reflect the original data accurately. Reconstruction quality is the most fundamental evaluation metric of a representation. Second, since we are usually interested in discriminating documents from each other, a representation should make documents distinguishable. Third, a representation that is easy for a human to interpret is very convenient. Fourth, a good representation should be computable by an efficient algorithm. Without scalability, a representation will remain confined to theoretical research.\u003C\/em\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E\u003Cem\u003E Obtaining such a good document representation poses several challenges. The most significant challenge comes from the sparsity of documents, which is extremely common in textual data. Sparsity often causes high estimation error. The second difficulty comes from the sequential nature of text, that is, the interdependencies between words. Although the ordering of words largely affects their semantics, modeling it is not easy for various reasons. For example, an n-gram model attempts to capture partial sequences of multiple words, but in turn it suffers from even sparser observations.\u003C\/em\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E\u003Cem\u003E This thesis presents novel document representations to overcome the two challenges, sparsity and sequentiality. We employ label and sequential information of documents during our representation learning. Utilizing label characteristics enables us to find a dense subspace of interest that overcomes the sparsity issue. 
We also present document representations that reflect sequential dependencies without suffering from high estimation error. Lastly, the thesis concludes with a document representation that employs both label and sequential information.\u003C\/em\u003E\u003Cbr \/\u003E \u003Cbr \/\u003E\u003Cem\u003E The approaches in this dissertation will be helpful for understanding documents at large scale. Most of the methods focus on efficient computation based on approximations or relaxations.\u003C\/em\u003E\u003Cbr \/\u003E \u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Novel Document Representations based on Labels and Sequential Information"}],"uid":"27707","created_gmt":"2015-05-12 11:27:30","changed_gmt":"2016-10-08 02:11:57","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2015-05-28T12:30:00-04:00","event_time_end":"2015-05-28T14:30:00-04:00","event_time_end_last":"2015-05-28T14:30:00-04:00","gmt_time_start":"2015-05-28 16:30:00","gmt_time_end":"2015-05-28 18:30:00","gmt_time_end_last":"2015-05-28 18:30:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"1366","name":"defense"},{"id":"1808","name":"graduate students"},{"id":"913","name":"PhD"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}