PhD Dissertation Defense by Aishwarya Agrawal

Event Details
  • Date/Time:
    • Tuesday, August 20, 2019
      1:30 pm - 3:30 pm
  • Location: Coda C1115 Druid Hills
Summaries

Summary Sentence: Visual Question Answering and Beyond

PhD Dissertation Defense

Title: Visual Question Answering and Beyond

by Aishwarya Agrawal

Date: Tuesday, August 20, 2019

Time: 1:30PM - 3:30PM (ET)

Location: Coda C1115 Druid Hills

 

Committee:

Dr. Dhruv Batra (Advisor, School of Interactive Computing, Georgia Institute of Technology)

Dr. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology)

Dr. James Hays (School of Interactive Computing, Georgia Institute of Technology)

Dr. C. Lawrence Zitnick (Research Lead, Facebook AI Research, Menlo Park)

Dr. Oriol Vinyals (Research Scientist, Google DeepMind, London)

 

Abstract:

In this thesis, I propose and study a multi-modal AI task called Visual Question Answering (VQA): given an image and a natural language question about the image (e.g., "What kind of store is this?", "Is it safe to cross the street?"), the machine's task is to automatically produce an accurate natural language answer (e.g., "bakery", "yes"). Applications of VQA include aiding visually impaired users in understanding their surroundings, aiding analysts in examining large quantities of surveillance data, teaching children through interactive demos, interacting with personal AI assistants, and making visual social media content more accessible.

Specifically, I study the following:

1) how to create a large-scale dataset and define evaluation metrics for free-form and open-ended VQA, 

2) how to develop techniques for characterizing the behavior of VQA models, and

3) how to build VQA models that are less driven by language biases in training data and are more visually grounded, by proposing:

        a) a new evaluation protocol,  

        b) a new model architecture, and 

        c) a novel objective function.

Most of my past work has been towards building agents that can ‘see’ and ‘talk’. However, many practical applications (e.g., physical agents navigating inside our houses and executing natural language commands) need agents that can not only ‘see’ and ‘talk’ but also take actions. Towards the end, I will present future directions for generalizing vision and language agents so that they can also take actions.

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Public
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Jacquelyn Strickland
  • Workflow Status: Published
  • Created On: Aug 11, 2019 - 8:17pm
  • Last Updated: Aug 12, 2019 - 11:39am