Ph.D. Defense of Dissertation: George K. Baah
Title: Statistical Causal Analysis for Fault Localization
George K. Baah
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Friday, June 29th, 2012
Time: 9:30am - 12:00pm
Location: Klaus Advanced Computing Building (KACB) 2100
Committee:
- Dr. Mary Jean Harrold (Advisor, School of Computer Science, Georgia Institute of Technology)
- Dr. Alessandro Orso (School of Computer Science, Georgia Institute of Technology)
- Dr. Alexander Gray (School of Computational Science and Engineering, Georgia Institute of Technology)
- Dr. Mayur Naik (School of Computer Science, Georgia Institute of Technology)
- Dr. Andy Podgurski (Electrical Engineering and Computer Science, Case Western Reserve University)
Abstract:
The ubiquity of software requires that developers engineer high-quality software. However, software development is a human process, and developers inadvertently introduce faults into software. One way of removing faults from software is debugging, a difficult and time-consuming process that is often performed manually. One important subtask of debugging is fault localization, the task of finding the causes of software failures. Because fault localization is laborious, many automated techniques have been developed with the goal of reducing the burden on developers during debugging. However, these automated techniques have severe theoretical and practical limitations.
This research contributes to the body of knowledge in software engineering by extending the state of the art in fault localization. The research presents a novel probabilistic model that combines program-dependence information with statistical information derived from program executions. This model, known as the probabilistic program dependence graph (PPDG), in theory facilitates arbitrary probabilistic reasoning over program behaviors. Based on the insight gained from applying the PPDG to the fault-localization problem, a novel framework that addresses the problem from a causal perspective is presented. The causal framework enabled an analysis of the metrics used in current statistical fault-localization techniques. The analysis helped unify the metrics and also helped show that they are not suitable for fault localization. Several causal models were also instantiated from the causal framework based on different kinds of program-analysis information. These instantiations demonstrated the flexibility that the causal framework provides and also bridged the gap between fault-localization techniques that rely solely on program-analysis information and techniques that rely solely on statistical information. Finally, empirical evidence is presented that demonstrates the feasibility and efficacy of this research in helping developers find the causes of software failures.