Combining Hardware and Software Instrumentation
 


In recent years, numerous researchers have proposed data-driven techniques to improve the quality of deployed software. At a high level, these techniques typically follow the same general approach: lightly instrument the software system, monitor its execution, analyze the resulting data (often called program spectra), and then act on the analysis results. Example applications of this approach include identifying likely fault locations, anticipating resource exhaustion, and categorizing crash reports as instances of previously reported bugs.
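
To make the notion of a program spectrum concrete, the sketch below shows one simple and purely hypothetical example: lightweight software probes that record how many times each instrumented program point executes, producing a count vector that downstream analyses can operate on.

    /* Illustrative sketch only: one simple kind of program spectrum, a vector
     * of per-probe execution counts collected by lightweight software probes.
     * The probe sites and workload here are hypothetical. */
    #include <stdio.h>

    #define NUM_SITES 4
    static unsigned long spectrum[NUM_SITES];   /* times each site executed */

    #define PROBE(id) (spectrum[(id)]++)        /* inserted at selected program points */

    static void handle_request(int kind)
    {
        PROBE(0);                               /* entry */
        if (kind == 0) { PROBE(1); /* fast path */ }
        else           { PROBE(2); /* slow path */ }
        PROBE(3);                               /* exit */
    }

    int main(void)
    {
        for (int i = 0; i < 10; i++)
            handle_request(i % 3);

        /* The resulting count vector is what downstream analyses operate on. */
        for (int i = 0; i < NUM_SITES; i++)
            printf("site %d: %lu\n", i, spectrum[i]);
        return 0;
    }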


A fundamental assumption of these and similar approaches is that there are identifiable and repeatable patterns in the behavior of successful and failed executions, and that similarities to and deviations from these patterns are highly correlated with the presence or absence of failures. Previous efforts do, in fact, appear to support this assumption, successfully applying a variety of program spectra to a wide range of QA activities.


One less well-understood issue, however, is how best to limit the total cost of implementing these approaches, and whether and how tradeoffs can be made between cost and analysis accuracy. This issue is important because these approaches target deployed software systems, where excessive runtime overhead is generally undesirable. It is therefore important to limit instrumentation overhead as much as possible while still supporting the highest possible levels of analysis accuracy.


In general, previous efforts have tended either to ignore this problem or to appeal to various sampling strategies for a solution. One potential drawback of sampling, however, is that aggressive sampling schemes greatly increase the number of observations that must be made in order to have confidence in the data.
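
As a rough illustration of the kind of sampling scheme referred to above (the scheme and rate here are hypothetical, not taken from any particular prior effort), each probe below fires only about once in every hundred hits; overhead drops accordingly, but many more executions must be observed before the scaled-up estimates can be trusted.

    /* Sketch of a sampled software probe. Each probe fires with probability
     * roughly 1/SAMPLE_PERIOD, so instrumentation overhead falls by about
     * that factor, at the cost of noisier, slower-to-converge counts. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NUM_SITES 4
    #define SAMPLE_PERIOD 100                 /* record ~1 in 100 hits */

    static unsigned long sampled[NUM_SITES];

    static void sampled_probe(int id)
    {
        if (rand() % SAMPLE_PERIOD == 0)      /* simple Bernoulli-style sampling */
            sampled[id]++;
    }

    int main(void)
    {
        srand(1);
        for (long i = 0; i < 1000000; i++)
            sampled_probe((int)(i % NUM_SITES));

        /* Scale back up to estimate the true per-site counts. */
        for (int i = 0; i < NUM_SITES; i++)
            printf("site %d: ~%lu hits (estimated)\n",
                   i, sampled[i] * (unsigned long)SAMPLE_PERIOD);
        return 0;
    }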


While we believe that sampling can be a powerful tool, we also conjecture that large cost reductions may come from reducing the cost of the measurement instruments themselves. In this project, we have been developing and empirically evaluating techniques, algorithms, and frameworks in which most of the data collection work is performed by fast hardware performance counters. This data is then augmented by data collected through a minimal amount of software instrumentation added to the system's code. We have also been contrasting this hybrid approach with approaches implemented purely in hardware or purely in software.
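
The following minimal sketch illustrates the hybrid idea, with the caveat that it is only an approximation of the approach: a handful of software probes delimit coarse program segments, while the quantity actually measured between probes comes from a hardware counter (the counters themselves are described in the next paragraph). For the sake of a runnable example, the x86 time-stamp counter (__rdtsc) stands in for a programmed performance counter.

    /* Sketch of hybrid instrumentation: a few software probes mark coarse
     * program points, while the bulk of the measurement -- the events counted
     * between probes -- comes from a hardware counter. __rdtsc is used here
     * only as a stand-in so the sketch runs; a real deployment would read a
     * counter programmed for the event of interest. */
    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    #define MAX_PROBES 8
    static uint64_t spectrum[MAX_PROBES];   /* per-segment event deltas */
    static uint64_t last;

    /* The only software instrumentation added to the system: one call per
     * probe point, attributing the events since the last probe to that segment. */
    static void probe(int id)
    {
        uint64_t now = __rdtsc();
        spectrum[id] += now - last;
        last = now;
    }

    static void work(long n) { volatile long s = 0; for (long i = 0; i < n; i++) s += i; }

    int main(void)
    {
        last = __rdtsc();
        work(100000);  probe(0);            /* segment 0 */
        work(500000);  probe(1);            /* segment 1 */

        for (int i = 0; i < 2; i++)
            printf("segment %d: %llu events\n", i, (unsigned long long)spectrum[i]);
        return 0;
    }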


Hardware performance counters are hardware-resident counters that record various events occurring on a processor. Today's general-purpose CPUs include a fair number of such counters, capable of recording events such as the number of instructions executed, the number of branches taken, and the number of cache hits and misses experienced. To activate these counters, programs issue instructions indicating the type of event to be counted and the physical counter to be used. Once activated, hardware counters count the events of interest and store the counts in a set of special-purpose registers. These registers can also be read and reset programmatically at runtime.
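
As one concrete illustration (not necessarily the mechanism used in our infrastructure), the sketch below uses the Linux perf_event_open interface to activate a counter for retired instructions, run a region of code, and then read the resulting count.

    /* Minimal sketch: counting retired instructions with a hardware
     * performance counter via Linux perf_event_open. This shows one common
     * way to activate, read, and reset a counter at runtime. */
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>

    static int open_counter(uint64_t config)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;      /* generic hardware event */
        attr.size = sizeof(attr);
        attr.config = config;                /* which event to count */
        attr.disabled = 1;                   /* start disabled; enable explicitly */
        attr.exclude_kernel = 1;             /* count user-space events only */
        attr.exclude_hv = 1;
        /* pid = 0 (this process), cpu = -1 (any CPU), no group, no flags */
        return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
    }

    int main(void)
    {
        int fd = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);   /* reset the counter */
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);  /* activate counting */

        /* ... region of interest: the monitored logic runs here ... */
        volatile long sum = 0;
        for (long i = 0; i < 1000000; i++) sum += i;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0); /* stop counting */

        uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
            printf("instructions retired: %llu\n", (unsigned long long)count);

        close(fd);
        return 0;
    }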


Hardware performance counters have traditionally been leveraged to perform low-level performance analysis and tuning of software systems. We, on the other hand, use them in a novel way, exploiting them for functional correctness evaluation.


To date, we have used hybrid spectra based on hardware performance counters to perform various QA activities, including fault detection, fault characterization, fault isolation, and fault prediction.
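
As a purely illustrative example of how such spectra might feed one of these activities, the sketch below performs naive fault detection by flagging a run whose spectrum deviates too far from the mean spectrum of known-passing runs; the numbers and threshold are made up, and this is not the algorithm used in our work.

    /* Hypothetical, highly simplified spectrum-based fault detection:
     * flag a run whose spectrum lies far from the mean spectrum of
     * known-passing runs. All values below are invented for illustration. */
    #include <math.h>
    #include <stdio.h>

    #define DIM 4   /* number of features in each (hybrid) spectrum */

    static double distance(const double *a, const double *b)
    {
        double d = 0.0;
        for (int i = 0; i < DIM; i++)
            d += (a[i] - b[i]) * (a[i] - b[i]);
        return sqrt(d);
    }

    int main(void)
    {
        /* Mean spectrum of previously observed passing runs (made-up numbers). */
        const double baseline[DIM] = { 1200.0, 40.0, 310.0, 5.0 };
        const double threshold = 100.0;   /* would be tuned from training data */

        /* Spectrum of a new execution (made-up numbers). */
        const double run[DIM] = { 1190.0, 42.0, 700.0, 6.0 };

        double d = distance(run, baseline);
        printf("deviation = %.1f -> %s\n", d,
               d > threshold ? "likely failing run" : "likely passing run");
        return 0;
    }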