{"616081":{"#nid":"616081","#data":{"type":"event","title":"PhD Defense by Ashley Edwards","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle:\u003C\/strong\u003E Emulation and Imitation via Perceptual Goal Specifications\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAshley D. Edwards\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPh.D. Student\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESchool of Interactive Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECollege of Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDate: Monday, January 14th, 2019\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETime: 12:30 PM to 2:30 PM (EST)\u003C\/p\u003E\r\n\r\n\u003Cp\u003ELocation: TBA, College of Computing Building\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E---------------\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Tucker Balch, School of Interactive Computing, Georgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Sonia Chernova, School of Interactive Computing, Georgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. 
Pieter Abbeel, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESummary:\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003E---------------\u003C\/p\u003E\r\n\r\n\u003Cp\u003EMuch of the power behind reinforcement learning is that we can use a single signal, known as the reward, to indicate desired behavior. However, defining these rewards can often be difficult. This dissertation introduces an alternative to the typical reward design mechanism. In particular, we introduce four methods that allow one to focus on specifying perceptual goals, rather than scalar rewards. By removing domain-specific aspects of the problem, we demonstrate that goals can be expressed while remaining agnostic to the reward function, action space, or state space of the agent\u0026rsquo;s environment.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFirst, we will introduce perceptual reward functions and describe how we can utilize a hand-defined similarity metric to enable learning from goals that are different from the agent\u0026rsquo;s. We show how we can use this method to train a simulated robot to learn from videos of humans.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ENext, we will introduce cross-domain perceptual reward functions and describe how we can learn a reward function for cross-domain goal specifications. 
We show how we can use this method to train an agent in a maze to reach goals specified through speech and hand gestures.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ENext, we will introduce perceptual value functions and describe how we can learn a value function from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer values from observation for a maze and pouring task, and to train an agent to solve unseen levels within a platform game.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFinally, we will introduce latent policy networks and describe how we can learn a policy from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer a policy from observation and train an agent to solve classic control tasks and a platform game.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Emulation and Imitation via Perceptual Goal Specifications"}],"uid":"27707","created_gmt":"2019-01-08 20:37:49","changed_gmt":"2019-01-08 20:37:49","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2019-01-14T12:30:00-05:00","event_time_end":"2019-01-14T14:30:00-05:00","event_time_end_last":"2019-01-14T14:30:00-05:00","gmt_time_start":"2019-01-14 17:30:00","gmt_time_end":"2019-01-14 19:30:00","gmt_time_end_last":"2019-01-14 19:30:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"100811","name":"Phd 
Defense"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"174045","name":"Graduate students"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}