PhD Proposal by Yash Goyal


Title: Towards Transparent and Grounded Systems for Visual Question Answering



Date: Thursday, December 13th, 2018

Time: 11:00am to 12:30pm (ET)

Location: CCB 312A


Yash Goyal

Ph.D. Student in Computer Science 

School of Interactive Computing

Georgia Institute of Technology

Committee:
Dr. Dhruv Batra (Advisor; School of Interactive Computing, Georgia Institute of Technology)

Dr. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology)

Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)

Dr. Trevor Darrell (University of California, Berkeley)

Abstract:
My research goal is to build transparent and grounded AI systems. Grounding is essential for building reliable and generalizable systems that are not driven by dataset biases. Transparency in AI systems can help system designers identify failure modes and can provide guidance for teaching humans.


In my thesis, I study these two dimensions, visual grounding and transparency, in the context of Visual Question Answering (VQA), where the task for an AI system is to answer natural-language questions about images. Specifically, I will present my work on:

1) tackling the language priors present in popular VQA datasets of abstract scenes and real images, in order to elevate the role of image understanding in VQA, and

2) studying transparency of VQA systems by:

    a) building an interpretable VQA model,

    b) proposing a new counter-example explanation modality, and

    c) using saliency-based visualization techniques to gain insight into which evidence in the input uninterpretable VQA models base their decisions on.


In the works above, AI systems perform worse than humans, and explanations are used to identify the systems' errors and improve them. In my proposed work, I will focus on the setting where AI systems outperform humans and will study whether explanations from these systems can be used to teach humans. Specifically, in the context of the fine-grained bird recognition task, I will study whether deep models can point humans to the right regions of a bird and help them perform better at this difficult task.

