
PhD Defense by Patsorn Sangkloy


Title: Controllable Content Based Image Retrieval and Synthesis


Patsorn Sangkloy

Ph.D. Student in Computer Science

School of Interactive Computing

Georgia Institute of Technology


Date: February 4, 2022

Time: 10:00 AM to 12:00 PM (EST)

Location (remote via BlueJeans): https://bluejeans.com/508230685


Committee

Dr. James Hays (Advisor) - School of Interactive Computing, Georgia Institute of Technology

Dr. Devi Parikh - School of Interactive Computing, Georgia Institute of Technology

Dr. Diyi Yang - School of Interactive Computing, Georgia Institute of Technology

Dr. Mark Riedl - School of Interactive Computing, Georgia Institute of Technology

Dr. Subhransu Maji - College of Information and Computer Sciences, University of Massachusetts Amherst


Abstract


A common way to retrieve desired images is with a text query. This natural form of querying has at least two drawbacks. First, the retrieval system may be language-specific, limiting its use to speakers of supported languages. Second, certain types of target images would require a lengthy text query to guarantee successful retrieval; a representative example is an image containing multiple objects at precise locations. The latter drawback is the primary problem we address in this thesis.


In this thesis, we investigate the use of hand-drawn sketches as a form of query to fetch desired images. Two related but subtly different tasks are studied:


1. Content Based Image Retrieval (where target images are retrieved from a database),

2. Content Based Image Synthesis (where target images are generated).


We consider two modes of querying:


1. Visual content (where a query can be expressed as a simple line drawing sketch, an image patch, or a color scribble),

2. Language content (where a query can be expressed as a textual description of desired target images).


For sketch-based image retrieval, we propose a cross-domain network that embeds a user query (a sketch) and a target image into a shared feature space, enabling direct similarity scoring. We collected the Sketchy Database, a large-scale dataset of matching sketch and image pairs that can be used as training data. The dataset has been made publicly available and has become one of the few standard benchmarks for sketch-based image retrieval.
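The retrieval idea above can be sketched in a few lines. In the actual work the two encoders are learned deep networks; here, as a purely illustrative stand-in, random linear projections map each domain into a shared space, after which ranking reduces to dot products between L2-normalized embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two learned encoders: random linear maps
# projecting each domain's features into a shared 64-d space.
W_sketch = rng.normal(size=(64, 128))  # sketch features -> shared space
W_image = rng.normal(size=(64, 128))   # image features -> shared space

def embed(features, W):
    """Project features into the shared space and L2-normalize."""
    z = features @ W.T
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# A query sketch and a small database of candidate images (random features
# standing in for real inputs).
sketch = rng.normal(size=(128,))
images = rng.normal(size=(10, 128))

q = embed(sketch, W_sketch)   # shape (64,)
db = embed(images, W_image)   # shape (10, 64)

# After normalization, cosine similarity is just a dot product; sorting the
# database by score gives the retrieval ranking.
scores = db @ q
ranking = np.argsort(-scores)
```

With trained encoders, `ranking[0]` would index the image that best matches the query sketch.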


To incorporate both sketch and language content as queries, we propose a late-fusion dual-encoder approach. Our method generalizes CLIP -- a recent, successful approach to vision-and-language representation learning -- to sketch-based input. We also collected 5,000 hand-drawn sketches, which can be combined with the existing annotated captions in the COCO dataset to evaluate image retrieval with both text and sketch queries.
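Late fusion means each modality is encoded independently and the embeddings are combined only at scoring time. A minimal sketch of this, assuming pre-computed CLIP-style embeddings (the random vectors and the fusion weight `alpha` are illustrative choices, not the thesis's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(z):
    """L2-normalize along the last axis."""
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Hypothetical pre-computed embeddings in a shared CLIP-like space.
text_emb = normalize(rng.normal(size=(64,)))      # from a text encoder
sketch_emb = normalize(rng.normal(size=(64,)))    # from a sketch encoder
image_db = normalize(rng.normal(size=(100, 64)))  # candidate image embeddings

# Late fusion: combine the two query embeddings after encoding, here with a
# simple convex combination, then renormalize.
alpha = 0.5
query = normalize(alpha * text_emb + (1 - alpha) * sketch_emb)

# Rank the image database against the fused query.
scores = image_db @ query
top5 = np.argsort(-scores)[:5]
```

Because fusion happens after encoding, either modality can be dropped (or reweighted via `alpha`) at query time without retraining the encoders.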


For image synthesis, we present a general framework that allows users to interactively control the generated images by specifying visual features (e.g., shape, color, texture, sketch). For both retrieval and synthesis tasks, our findings show that using a sketch as part of the input makes it easier to succinctly describe desired images.
