PhD Defense by Amit Raj

Event Details
  • Date/Time:
    • Monday November 21, 2022
      10:00 am - 12:00 pm
  • Location: Atlanta, GA
  • URL: Zoom

Summary Sentence: Leveraging 3D information for controllable and interpretable image synthesis


You are cordially invited to my thesis defense scheduled on the 21st of November. 


Title:  Leveraging 3D information for controllable and interpretable image synthesis


Date: Monday, November 21, 2022


Time: 10:00 - 11:30 AM (EST)

Meeting Link: 




Amit Raj

Machine Learning PhD Student

School of Electrical and Computer Engineering

Georgia Institute of Technology



Committee:

  1. James Hays (Advisor), College of Computing, Georgia Tech
  2. Frank Dellaert, College of Computing, Georgia Tech
  3. Zsolt Kira, College of Computing, Georgia Tech
  4. Dhruv Batra, College of Computing, Georgia Tech
  5. Jia-Bin Huang, Department of Computer Science, University of Maryland, College Park



Neural image synthesis has seen enormous advances in recent years, led by innovations in generative adversarial networks (GANs), which generate high-resolution, photo-realistic images. However, a major limitation of these methods is that they tend to capture the texture statistics of an image with no explicit understanding of geometry. Additionally, GAN-only pipelines are notoriously hard to train. In contrast, recent trends in neural and volumetric rendering have demonstrated compelling results by incorporating 3D information into the synthesis pipeline using classical rendering techniques.
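As background, the volumetric rendering referred to above composites color along camera rays with an emission-absorption model. The following NumPy sketch shows the generic form of that compositing step; the function name, array shapes, and sampling scheme are illustrative assumptions, not the specific formulation used in the thesis:

```python
import numpy as np

def volume_render(densities, colors, deltas):
    """Composite per-sample densities and colors along one camera ray
    (generic emission-absorption volumetric rendering, sketched for
    illustration only).

    densities: (N,)   non-negative density at each sample along the ray
    colors:    (N, 3) RGB color at each sample
    deltas:    (N,)   distance between adjacent samples
    """
    # Opacity contributed by each ray segment.
    alpha = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of light surviving to each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    # Per-sample contribution weights, then the final pixel color.
    weights = alpha * trans
    return (weights[:, None] * colors).sum(axis=0)
```

A fully opaque sample returns its own color unchanged, while zero density everywhere renders to black, which matches the intuition that the model integrates color only where geometry is present.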

We leverage ideas from both classical graphics rendering and neural image synthesis to design 3D guided image generation pipelines that are photo-realistic, controllable, and easy to train. In this thesis, we discuss three sets of models that incorporate geometric information for controllable image synthesis. 

1. Static geometries: We leverage class-specific shape priors to present generative models that allow for 3D-consistent novel view synthesis. To that end, we propose the first framework that allows implicit representations to generalize to novel identities in the context of facial avatars.

2. Articulated geometries: In the second section, we extend controllable synthesis to articulated geometries. We present two frameworks (with explicit and implicit geometric representations) for synthesizing pose- and viewpoint-controllable full-body digital avatars.

3. Scenes: In the final section, we present a framework for generating driving scenes with both static and dynamic elements. In particular, the proposed model allows fine-grained control over local elements of the scene without needing to resynthesize the entire scene, which we posit should reduce both the memory footprint of the model and inference times.
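The implicit representations mentioned above are typically coordinate-based networks, which work best when input points are first lifted to sinusoidal features. The sketch below shows that standard positional-encoding step in NumPy; the function name and frequency schedule are generic assumptions, not the exact encoding used in any of the thesis models:

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Map a 3D point to sinusoidal features, as commonly done before
    feeding coordinates to an implicit (coordinate-based) network.
    Generic sketch for illustration only.

    x: (..., 3) point coordinates
    returns: (..., 3 * 2 * num_freqs) encoded features
    """
    freqs = 2.0 ** np.arange(num_freqs)      # frequencies 1, 2, 4, 8, ...
    angles = x[..., None] * freqs            # (..., 3, num_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)  # flatten the feature axes
```

Encodings like this let a small network represent high-frequency geometry and appearance, which is one reason implicit representations can serve as the backbone for the avatar and scene models described above.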


Additional Information

In Campus Calendar

Graduate Studies

Invited Audience
Faculty/Staff, Public, Undergraduate students
PhD Defense
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Nov 14, 2022 - 11:49am
  • Last Updated: Nov 14, 2022 - 11:49am