PhD Defense by Joshua Kimball

Title: PerfDB + PerfML: Enabling Big Data-Driven Research on Fine-Grained Performance Phenomena


Joshua Kimball

Ph.D. Candidate

School of Computer Science

College of Computing

Georgia Institute of Technology


Date: May 5, 2021

Time: 1:00 PM to 3:00 PM EDT

Location: Online (Bluejeans)



Meeting URL



Meeting ID

857 372 648


Want to dial in from a phone?


Dial one of the following numbers:

+1.408.419.1715 (United States (San Jose))

+1.408.915.6290 (United States (San Jose))

(see all numbers - https://www.bluejeans.com/numbers)


Enter the meeting ID and passcode followed by #


Connecting from a room system?

Dial: bjn.vc and enter your meeting ID & passcode





Committee:

Dr. Calton Pu (Advisor) - School of Computer Science, Georgia Institute of Technology

Dr. Joy Arulraj - School of Computer Science, Georgia Institute of Technology

Dr. Ling Liu - School of Computer Science, Georgia Institute of Technology

Dr. Sham Navathe - School of Computer Science, Georgia Institute of Technology

Dr. Qingyang Wang - School of Computer Science, Louisiana State University



Abstract:

Long-tail latency is a well-known problem in large-scale system topologies such as cloud platforms. It can lead to less predictable system performance, degraded quality of experience, and potential economic loss. Previous research has focused on coarse-grained, symptomatic treatments, such as redundant request executions, to mitigate tail latency and its effects. Instead, we propose studying these performance bugs systematically and addressing their root causes.

The millibottleneck theory of performance bugs provides a testable hypothesis for explaining at least some of the requests that make up the latency long tail. The theory posits that transient performance anomalies cause a non-negligible number of requests, called Very Long Response Time (VLRT) requests, to complete in seconds instead of the tens of milliseconds that the vast majority of requests take.

In this dissertation, we enable the systematic evaluation of the millibottleneck theory across a big data-scale collection of experimental data. First, we present perftables, a performance log parser that extracts resource monitoring data across a wide variety of hardware and software configurations. Second, we use our data management system, PerfDB, to load and integrate fine-grained system performance data from approximately 400 experiments. We conduct the first-generation population study of VLRT requests, and our data support the hypothesis that millibottlenecks induce VLRT requests through Cross-Tier Queue Overflow (CTQO). We also enable the study of a second latency class called Less Long Requests (LLRs). Finally, we present our ensemble-based, supervised machine learning system, PerfML, which handles data characterized by a heterogeneous feature space and hierarchical, imbalanced classes, characteristics inherent to the data needed to study millibottlenecks and latency performance bugs. The analytics results from PerfML demonstrate its ability to isolate different kinds of millibottlenecks across a range of systems and configurations with high recall and acceptable precision.



  • Workflow Status: Published
  • Created By: Tatianna Richardson
  • Created: 04/21/2021
  • Modified By: Tatianna Richardson
  • Modified: 04/21/2021