PhD Defense by Richard Rutledge
Title: On the Use of Over-Approximate Analysis in Support of Software Development and Testing
Date: Wednesday, 28 Sept 2022
Time: 3:30 pm - 5:00 pm ET
Location (virtual): https://gatech.zoom.us/j/96131130764
School of Computer Science
Georgia Institute of Technology
Dr. Alessandro Orso (Advisor) - School of Computer Science, Georgia Institute of Technology
Dr. Milos Prvulovic - School of Computer Science, Georgia Institute of Technology
Dr. Qirun Zhang - School of Computer Science, Georgia Institute of Technology
Dr. Vivek Sarkar - Chair, School of Computer Science, Georgia Institute of Technology
Dr. Spencer Rugaber - College of Computing, Georgia Institute of Technology
Dr. Marcelo d'Amorim - Computer Science Department, Federal University of Pernambuco (UFPE), Recife, Brazil
The effectiveness of dynamic program analyses, such as profiling and memory-leak detection, depends crucially on the coverage achieved by the test inputs. However, adequate sets of inputs are rarely available. Existing automated input generation techniques can help but tend to be either too expensive or ineffective. For example, traditional symbolic execution scales poorly to real-world programs, and random input generation may never reach deep states within the program.
For scalable, effective automated input generation that can better support dynamic analysis, I propose an approach that extends traditional symbolic execution by targeting increasingly small fragments of a program. The approach starts by generating inputs for the whole program and progressively introduces additional unconstrained state until it reaches a given program coverage objective. This approach is applicable to any client dynamic analysis that requires high coverage and tolerates over-approximated program behavior, that is, behavior that cannot occur in a complete execution. To assess the effectiveness of this approach, I applied it to two client techniques. The first technique infers the actual path taken by a program execution by observing the CPU's electromagnetic emanations and requires inputs to train a model that can recognize sub-paths. The second technique performs automated regression testing by identifying behavioral differences between two program versions and requires inputs to perform differential testing.
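As a rough illustration of the progressive strategy described above, the following minimal Python sketch first tries to cover a toy program from its entry point and then, only if the coverage goal is not met, starts execution at a smaller fragment with unconstrained input state. Everything here is an assumption made for illustration: the functions stage1, stage2, and generate_inputs are hypothetical, and plain enumeration of candidate values stands in for symbolic execution.

```python
def stage2(x, hit):
    """Inner fragment: one of its branches is unreachable from stage1."""
    if x > 1000:
        hit.add("stage2:large")
    else:
        hit.add("stage2:small")


def stage1(a, hit):
    """Whole program: always passes a value below 100 to stage2."""
    hit.add("stage1")
    stage2(a % 100, hit)


def generate_inputs(levels, candidates, goal):
    """Try fragments from largest to smallest until the goal is covered.

    Enumerating `candidates` is a stand-in for symbolic execution.
    """
    covered = set()
    inputs = []  # (fragment name, input) pairs that added new coverage
    for name, fragment in levels:
        for value in candidates:
            hit = set()
            fragment(value, hit)
            if not hit <= covered:  # this input reached something new
                covered |= hit
                inputs.append((name, value))
        if goal <= covered:  # coverage objective reached, stop shrinking
            break
    return inputs, covered


goal = {"stage1", "stage2:small", "stage2:large"}
levels = [("stage1", stage1), ("stage2", stage2)]
inputs, covered = generate_inputs(levels, range(2000), goal)
print(inputs)
```

In this toy, the input that covers "stage2:large" exists only because stage2 was entered with unconstrained state. No complete execution starting at stage1 can produce it, which is the kind of over-approximated behavior a client analysis would need to tolerate.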
Input generation by symbolic execution can also be hampered by constraints in theories that the solver does not support. For example, state-of-the-art symbolic execution engines such as KLEE concretize floating-point (FP) values upon access to keep FP expressions out of the path constraint, because most SMT solvers do not support them and the few that do (e.g., Z3) incur high overhead. I will also present my preliminary work on transforming FP expressions into fixed-point (FXP) expressions, which significantly improves coverage over vanilla KLEE.
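To make the idea of replacing FP expressions with fixed-point ones more concrete, here is a minimal Python sketch of generic fixed-point arithmetic. It is not the transformation presented in the thesis: the 16 fractional bits, the helper names (to_fixed, fx_mul, fx_add), and the example branch condition are all assumptions chosen for illustration. The point is only that the rewritten condition uses integer operations, which integer and bit-vector solvers handle natively.

```python
FRAC_BITS = 16            # assumed format: 16 fractional bits
SCALE = 1 << FRAC_BITS


def to_fixed(value: float) -> int:
    """Encode a float as a scaled integer."""
    return round(value * SCALE)


def to_float(raw: int) -> float:
    """Decode a scaled integer back to a float."""
    return raw / SCALE


def fx_add(a: int, b: int) -> int:
    """Fixed-point addition is plain integer addition."""
    return a + b


def fx_mul(a: int, b: int) -> int:
    """Fixed-point multiplication: rescale the double-width product."""
    return (a * b) >> FRAC_BITS


def branch_fp(x: float) -> bool:
    """Hypothetical branch condition written with floating point."""
    return x * 1.5 + 0.25 > 4.0


def branch_fxp(x_raw: int) -> bool:
    """The same condition using only integer operations."""
    scaled = fx_mul(x_raw, to_fixed(1.5))
    return fx_add(scaled, to_fixed(0.25)) > to_fixed(4.0)


for x in (-1.0, 0.0, 2.5, 3.0):
    print(x, branch_fp(x), branch_fxp(to_fixed(x)))
```

The two versions agree on these sample values, but fixed-point has limited range and precision, so in general it can disagree with IEEE 754 arithmetic near branch boundaries.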
Finally, I will discuss future research directions, including further empirical evaluations and the investigation of other client analyses that could benefit from both approaches.