{"614662":{"#nid":"614662","#data":{"type":"event","title":"PhD Proposal by Jiasen Lu","body":[{"value":"\u003Cp\u003ETitle: Grounded Vision and Language Understanding\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDate: Thursday, November 29 2018\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETime: 1:00PM - 2:15PM (ET)\u003C\/p\u003E\r\n\r\n\u003Cp\u003ELocation: TSRB 223\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EJiasen Lu\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPh.D. Student in Computer Science\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESchool of Interactive Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022https:\/\/www.cc.gatech.edu\/~jlu347\/\u0022\u003Ehttps:\/\/www.cc.gatech.edu\/~jlu347\/\u003C\/a\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECommittee:\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Devi Parikh (Advisor, School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Dhruv Batra (School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Jason J. Corso (Electrical Engineering and Computer Science Dept., University of Michigan)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Richard Socher (Salesforce Research)\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAbstract:\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe world around us involves multiple modalities. One of the major challenges in modeling different modalities jointly is how to induce appropriate grounding in models given the heterogeneity of the data. 
Which parts of the image and question should the model focus on when answering a question about an image? When should it rely on visual data vs. just the language model when describing an image? How can we integrate object detectors to produce fluent but visually grounded image captions? How can we disentangle \u0026quot;what to say\u0026quot; from \u0026quot;how to say it\u0026quot; when automatically generating questions about images?\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn this thesis, I take steps towards studying how inducing appropriate grounding in deep models improves multi-modal AI capabilities, in the context of vision and language understanding.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESpecifically, I will present:\u003C\/p\u003E\r\n\r\n\u003Cp\u003E1) how to ground visual question answering models in appropriate regions of the image and appropriate phrases in the question to more accurately answer questions about images;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E2) how to provide skip connections to an image captioning model so that it can rely on just the language model for words in the caption that are not visual; and\u003C\/p\u003E\r\n\r\n\u003Cp\u003E3) how to ground image captioning models in object detections by combining symbolic and deep learning approaches to avoid hallucinations of visual concepts in image captions.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn the proposed work, I will study how to disentangle \u0026quot;what to ask\u0026quot; from \u0026quot;how to ask it\u0026quot; when generating a question (that is, grounding question generation in the \u0026quot;intention\u0026quot; of the question) in the context of a multi-agent image guessing 
game.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Grounded Vision and Language Understanding"}],"uid":"27707","created_gmt":"2018-11-27 18:12:02","changed_gmt":"2018-11-27 18:12:02","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2018-11-29T13:00:00-05:00","event_time_end":"2018-11-29T15:00:00-05:00","event_time_end_last":"2018-11-29T15:00:00-05:00","gmt_time_start":"2018-11-29 18:00:00","gmt_time_end":"2018-11-29 20:00:00","gmt_time_end_last":"2018-11-29 20:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"102851","name":"Phd proposal"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"174045","name":"Graduate students"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}