<![CDATA[Phd Proposal by Qi Zhou]]>

631822 event 1580309603 1580912322 <![CDATA[Phd Proposal by Qi Zhou]]> Title: Automated Reasoning for Multi-Query Optimization

Date: Tuesday, February 4, 2020

Time: 01:00 PM - 02:30 PM (EST)

Location: Klaus 1212

Qi Zhou

Ph.D. Student

School of Computer Science

Georgia Institute of Technology

Committee:

Dr. William Harris (advisor) - Galois Inc.

Dr. Joy Arulraj (co-advisor) - School of Computer Science, Georgia Institute of Technology

Dr. Shamkant B.Navathe - School of Computer Science, Georgia Institute of Technology

Dr. Alex Orso - School of Computer Science, Georgia Institute of Technology

Dr. John Regehr - School of Computing, University of Utah

Abstract:

The advent of DataBase-as-a-Service (DBaaS) platforms has increased the importance of multi-query optimization.

These services enable users to quickly create and deploy complex data processing pipelines. However, in practice, these pipelines often exhibit a significant overlap of computation due to the redundant execution of certain SQL queries. We seek to optimize the execution of a collection of queries by identifying and eliminating overlapping computations.

In this proposal, I will present two techniques for efficiently and effectively proving the equivalence of queries. I will first present a symbolic approach to tackle this problem that relies on SMT solver. While this technique covers a wider array of SQL features compared to prior algebraic approaches, it can neither support structurally-different queries nor prove equivalence under bag semantics, the underlying model of all modern database applications. I will next introduce a two-stage verification algorithm with a novel symbolic representation combined with the algebraic approach to circumvent these limitations.

In practice, even queries that are not equivalent tend to have overlapping computation. I propose to design a technique for determining containment relationships between non-equivalent queries. Furthermore, I propose to leverage this technique for augmenting a multi-query optimizer by automatically synthesizing queries that can leverage the results of prior queries.

]]> <![CDATA[]]> 221981 1788 102851