New publication: Finding Programming Errors Earlier by Evaluating Runtime Monitors Ahead-of-Time
Eric | July 31, 2008I am happy to announce the final version of our new FSE paper (joint work with Patrick Lam and my supervisor Laurie Hendren). You can grab the paper here. The idea of the paper is that runtime monitoring is nice because it manages to show you only actual errors, but nevertheless one should make a best effort to evaluate a runtime monitor ahead-of-time, i.e. at compile-time, as well as possible, so that programmers can find errors in the programs earlier in the development process.
The problem with such an approach is that it can potentially yield a lot of false positives. We therefore extend our flow-insensitive analysis from earlier work with a new flow-sensitive checker that removes even more false positives. We furthermore add a ranking and filtering stage that weeds out almost all remaining false positives based on a machine learning algorithm (implemented in Weka) – see the Figure to the right. In result, the analysis indeed reports virtually all true positives (i.e. actual programming errors), and virtually only those, already at compile time.
I think the idea of using machine learning to weed out “bad results” of static analyses can make sense in general. So that’s maybe something you want to take away form this.
Another interesting point about the paper is that the flow-sensitive stage is intraprocedural. To quote from the paper:
The most important finding of the methods presented in this paper is that we can successfully reason about tracematch states by combining inexpensive whole-program summary information with a suite of carefully-designed intraprocedural flow-sensitive analyses […]. Our abstraction associates two kinds of information with each state: positive information (an object may be in a particular state) and negative information (an object is not in a state). In our benchmarks, the combination of positive and negative information enables us to eliminate most false positives.
The fact that the analysis is intraprocedural means that it is maybe not as precise as it could be (yet often precise enough), but on the other hand it is a lot faster compared to fully interprocedural approaches.
At compile time we model runtime objects through object representatives, a novel abstraction that almost transparently replaces runtime objects by compile-time objects. Object representatives have an identity, just as runtime objects, defined by a flow-sensitive must-alias analysis, and they support must-not-alias queries, resolved through a combination of an intraprocedural flow-sensitive must-not-alias analysis and an interprocedural flow-insensitive context-sensitive points-to analysis. If you are interested in finding out more about object representatives, check out this paper. The paper will be published at BCS 2008 (more to come).
To find out more about our FSE paper and the experiments conducted for the paper, have a look at the abc website.