PhD Proposal by Wenhao Yu


Title: Teaching robots to walk using deep reinforcement learning and sim-to-real transfer

Wenhao Yu
School of Interactive Computing
College of Computing
Georgia Institute of Technology

Date: Tuesday, October 22nd, 2019
Time: 12:00pm EDT
Location: GVU Center, TSRB

Committee:
---------------
Dr. Greg Turk (Advisor, School of Interactive Computing, Georgia Tech)
Dr. C. Karen Liu (Advisor, School of Engineering, Stanford University / School of Interactive Computing, Georgia Tech)
Dr. Charlie Kemp (Department of Biomedical Engineering / School of Interactive Computing, Georgia Tech)
Dr. Sergey Levine (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley)
Dr. Michiel van de Panne (Department of Computer Science, University of British Columbia)

Abstract:
------------
Deep reinforcement learning (DRL) has the potential to automate the development of controllers for complex motor skills such as locomotion. However, due to high sample complexity and safety concerns, directly applying DRL on a real robot is generally infeasible. Computer simulation provides a safe and efficient way to train robotic controllers, but a control policy trained in simulation usually fails to perform the desired task on real hardware because of discrepancies between the simulated and real dynamics, a problem known as the Reality Gap. In this proposal, we investigate the problem of transferring a simulation-trained policy to a real robot, with a focus on learning locomotion skills for biped and quadruped robots. Legged locomotion requires precise coordination of the robot's motors to move it forward while maintaining balance, which makes sim-to-real transfer for legged locomotion particularly challenging.
We first introduce an algorithm named Universal Policy with Online System Identification (UP-OSI), in which a model is trained to identify physics parameters from the robot's observations and to guide the control policy in choosing suitable actions. We demonstrate that UP-OSI can adapt to changes in dynamics parameters such as friction coefficients or body mass. However, the success of UP-OSI relies largely on the ability of OSI to produce good estimates of the physical parameters: when the training and testing dynamics differ significantly, the accuracy of OSI drops, degrading the policy's performance.

To overcome larger discrepancies in dynamics, we introduce a series of algorithms based on the idea of Strategy Optimization (SO), in which the policy is allowed to collect additional data in the target environment and use those experiences to explicitly search for the best input to the policy. This allows the policy to bridge a larger reality gap, and it has been successfully applied to learn locomotion controllers for a biped robot, the Robotis Darwin OP2, and a quadruped robot, the Ghost Robotics Minitaur.

Finally, we discuss possible paths toward a more reliable and versatile locomotion policy that can control a legged robot in more challenging environments, such as a road with tiles of different friction coefficients, a path with varying slopes, or a deformable surface like a sofa. We plan to use the Darwin OP2 robot as the main testing platform. Our first step is to identify a more accurate actuator model for the robot. Then, based on the result of this first step, we will choose one of three possible directions to explore: 1) extending UP-OSI to train locomotion controllers for unstructured environments, 2) extending SO-based methods to time-varying environments, or 3) fast fine-tuning of the control policy on hardware.
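The control loop behind UP-OSI can be sketched in a few lines. This is only a minimal illustration of the two-module structure (an OSI module estimating dynamics parameters from a recent history, feeding a universal policy conditioned on those parameters); the linear maps, dimensions, and random weights below are placeholders standing in for the trained neural networks of the actual work.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACT_DIM, MU_DIM, HIST_LEN = 4, 2, 3, 5

def universal_policy(state, mu, W):
    # Universal Policy: the action depends on both the current state and
    # the (estimated) dynamics parameters mu. A linear map followed by
    # tanh stands in for the policy network.
    return np.tanh(W @ np.concatenate([state, mu]))

def osi(history, V):
    # Online System Identification: regress the dynamics parameters from
    # a short history of (state, action) pairs. A linear regressor stands
    # in for the OSI network.
    return V @ np.concatenate(history)

# Hypothetical weights standing in for trained models.
W = rng.normal(size=(ACT_DIM, STATE_DIM + MU_DIM))
V = rng.normal(size=(MU_DIM, HIST_LEN * (STATE_DIM + ACT_DIM)))

def up_osi_step(state, history):
    # One control step: estimate mu from recent history, then feed the
    # estimate to the universal policy to select the action.
    mu_hat = osi(history, V)
    return universal_policy(state, mu_hat, W), mu_hat

# One step with random placeholder data.
history = [rng.normal(size=STATE_DIM + ACT_DIM) for _ in range(HIST_LEN)]
state = rng.normal(size=STATE_DIM)
action, mu_hat = up_osi_step(state, history)
```

The key design point is that the policy is trained across many sampled dynamics (a "universal" policy), so at deployment only the low-dimensional input mu needs to match the real hardware.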
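Strategy Optimization replaces the regression step with a direct search: roll out the policy in the target environment with candidate inputs mu and keep the ones that yield the highest return. The sketch below is a hypothetical stand-in, using a simple evolutionary search and a toy return function whose optimum is the unknown target dynamics; it is not the proposal's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout_return(mu, target_dynamics):
    # Stand-in for executing the universal policy with input mu in the
    # target environment and measuring episode return. Here the return
    # is highest when mu matches the (unknown) target dynamics.
    return -np.sum((mu - target_dynamics) ** 2)

def strategy_optimization(target_dynamics, mu_dim=3, iters=30, pop=20, sigma=0.3):
    # Search directly for the policy input that maximizes return in the
    # target environment: sample candidates around the current mean,
    # evaluate each by rollout, and move the mean toward the elites.
    mean = np.zeros(mu_dim)
    for _ in range(iters):
        candidates = mean + sigma * rng.normal(size=(pop, mu_dim))
        returns = [rollout_return(c, target_dynamics) for c in candidates]
        elite = candidates[np.argsort(returns)[-pop // 4:]]
        mean = elite.mean(axis=0)
        sigma *= 0.9  # shrink the search radius as the estimate improves
    return mean

true_dynamics = np.array([0.5, -0.2, 0.8])
best_mu = strategy_optimization(true_dynamics)
```

Because the search only needs rollout returns, not parameter labels, it can compensate for modeling errors that a regression-based identifier would never recover.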

