Guidelines for Coverage-Based Comparisons of Non-Adequate Test Suites

Milos Gligoric, Alex Groce, Chaoqiang Zhang, Rohan Sharma, Amin Alipour, and Darko Marinov

About CoCo

An important question in software testing research is what coverage criterion to use when comparing test suites. A traditional criterion C provides a finite set of test requirements for the code under test and measures how many requirements a given test suite satisfies; a test suite that satisfies 100% of the (feasible) requirements is called C-adequate. Rigorous evaluations of coverage criteria have focused mostly on adequate test suites: given two criteria C and C', are C-adequate test suites (on average) more effective than C'-adequate test suites? However, testing practice and research widely use non-adequate test suites because determining which requirements are feasible is hard, generating tests for all feasible requirements is tedious, and some recently used criteria even have an infinite or very large set of requirements.
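The notions above (a criterion's requirement set, a suite's coverage value, and C-adequacy) can be illustrated with a minimal sketch. All names and the requirement strings below are hypothetical, chosen only for illustration; they do not correspond to our tools' actual implementation.

```java
import java.util.Set;

// Minimal sketch: a criterion is modeled as a set of feasible requirements
// (e.g., branch IDs), and a test suite as the set of requirements it satisfies.
public class CoverageSketch {
    // Coverage value: fraction of feasible requirements the suite satisfies.
    static double coverage(Set<String> satisfied, Set<String> feasible) {
        long hit = satisfied.stream().filter(feasible::contains).count();
        return (double) hit / feasible.size();
    }

    // A suite is C-adequate when it satisfies 100% of feasible requirements.
    static boolean isAdequate(Set<String> satisfied, Set<String> feasible) {
        return satisfied.containsAll(feasible);
    }

    public static void main(String[] args) {
        Set<String> feasible = Set.of("b1", "b2", "b3", "b4");
        Set<String> suite = Set.of("b1", "b2", "b3"); // a non-adequate suite
        System.out.println(coverage(suite, feasible));   // prints 0.75
        System.out.println(isAdequate(suite, feasible)); // prints false
    }
}
```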

We present the first extensive study that evaluates coverage criteria for non-adequate test suites: given two criteria C and C', is it better to use C or C' to compare test suites? Namely, if test suites T_1, T_2, ..., T_n have coverage values c_1, c_2, ..., c_n for C and c_1', c_2', ..., c_n' for C', is it better to compare suites based on c_1, c_2, ..., c_n or based on c_1', c_2', ..., c_n'? We evaluate several criteria, both traditional (statement and branch) and recently used (path and predicate-complete), on a number of Java and C programs with both manually written and automatically generated tests. Surprisingly, our results show that newer criteria that subsume traditional branch coverage generally perform no better than branch coverage for realistic non-adequate test suites, especially when taking into account the cost of measuring coverage.
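The comparison question above can be made concrete with a small sketch: two criteria can rank the same suites differently, so the choice of criterion matters. The coverage values below are made up purely for illustration and are not from our experiments.

```java
import java.util.Arrays;

// Illustrative sketch: rank test suites by their coverage values under a
// criterion. Different criteria may induce different rankings of T_1..T_n.
public class RankingSketch {
    // Returns suite indices ordered from highest to lowest coverage.
    static Integer[] rankByCoverage(double[] cov) {
        Integer[] idx = new Integer[cov.length];
        for (int i = 0; i < cov.length; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(cov[b], cov[a]));
        return idx;
    }

    public static void main(String[] args) {
        double[] c      = {0.80, 0.85, 0.90}; // values for C  (hypothetical)
        double[] cPrime = {0.70, 0.60, 0.65}; // values for C' (hypothetical)
        // C ranks the suites T_3 > T_2 > T_1, but C' ranks them T_1 > T_3 > T_2.
        System.out.println(Arrays.toString(rankByCoverage(c)));      // [2, 1, 0]
        System.out.println(Arrays.toString(rankByCoverage(cPrime))); // [0, 2, 1]
    }
}
```

Our study asks which of these induced rankings better predicts the suites' actual fault-detection effectiveness.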

Publications

CoCo Tools

Our tools for measuring code coverage are available. Currently, we distribute the source code of both the Java and C tools. The distributions include examples and scripts that demonstrate both the instrumentation and running phases. (We include all the small examples used in the experiments.)

Additional Results

Because of the space limit, we could not include in the paper all the numbers and plots that may be of interest to the reviewers. The PDF file at the following URL will very likely answer any question about the missing numbers and plots:

Acknowledgments

We thank Yu Lin, Qingzhou Luo, and Shalini Shamasunder for discussions about this work, Mladen Laudanovic and Douglas Simpson for help with statistical analysis, Lingming Zhang for help with Javalanche, Jamie Andrews for valuable comments and providing the C mutation tool, and Fredrik Kjolstad for help with WALA. This material is based upon work partially supported by the National Science Foundation under Grant Nos. CCF-1054876, CNS-0958199, and CCF-0746856.