{"614547":{"#nid":"614547","#data":{"type":"event","title":"PhD Proposal by Aishwarya Agrawal","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle:\u003C\/strong\u003E Visual Question Answering and Beyond\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDate: Wednesday, November 28 2018\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETime: 1:00PM - 2:30PM (ET)\u003C\/p\u003E\r\n\r\n\u003Cp\u003ELocation: CCB 345\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAishwarya Agrawal\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPh.D. Student\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESchool of Interactive Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECollege of Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Dhruv Batra (Advisor, School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. James Hays (School of Interactive Computing, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. C. Lawrence Zitnick (Research Lead, Facebook AI Research, Menlo Park)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. 
Oriol Vinyals (Research Scientist, Google DeepMind, London)\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EAbstract:\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn this thesis, I will present my work on a multi-modal AI task called Visual Question Answering (VQA) -- given an image and a natural language question about the image (e.g., \u0026quot;What kind of store is this?\u0026quot;, \u0026quot;Is it safe to cross the street?\u0026quot;), the machine\u0026#39;s task is to automatically produce an accurate natural language answer (\u0026quot;bakery\u0026quot;, \u0026quot;yes\u0026quot;). Applications of VQA include -- aiding visually impaired users in understanding their surroundings, aiding analysts in examining large quantities of surveillance data, teaching children through interactive demos, interacting with personal AI assistants, and making visual social media content more accessible.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESpecifically, I will present the following --\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E1) how to create a large-scale dataset and define evaluation metrics for free-form and open-ended VQA,\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E2) how to develop techniques for characterizing the behavior of VQA models, and\u003C\/p\u003E\r\n\r\n\u003Cp\u003E3) how to build VQA models that are less driven by language biases in training data and are more visually grounded, by proposing --\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; a) a new evaluation protocol,\u0026nbsp;\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; b) a new model architecture, and\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; c) a novel objective 
function.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn proposed work, I will study how to build agents that can not only \u0026quot;see\u0026quot; and \u0026quot;talk\u0026quot;, but can also \u0026quot;act\u0026quot;. Specifically, I will study how we can train agents to follow language instructions grounded in visual data (e.g., \u0026#39;Add a red sphere\u0026#39;, \u0026#39;Add a large cylinder\u0026#39;) and execute actions to generate scenes that are consistent with the given instruction.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Visual Question Answering and Beyond"}],"uid":"27707","created_gmt":"2018-11-26 15:40:49","changed_gmt":"2018-11-29 14:58:19","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2018-11-28T13:00:00-05:00","event_time_end":"2018-11-28T15:00:00-05:00","event_time_end_last":"2018-11-28T15:00:00-05:00","gmt_time_start":"2018-11-28 18:00:00","gmt_time_end":"2018-11-28 20:00:00","gmt_time_end_last":"2018-11-28 20:00:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"102851","name":"Phd proposal"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"174045","name":"Graduate students"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}