Testing the capabilities of web application scanners is an ongoing challenge that can be approached in a number of ways; the difficulty lies in creating an objective test that is precise and repeatable. While previous studies have tested web vulnerability assessment tools, none has measured coverage or vulnerability findings in a statistically precise manner. In this paper I take an approach that produces quantifiable data, making it possible to distinguish the effectiveness of the scanning tools in terms of finding vulnerabilities. To do this I employed Fortify's Tracer product, which inserts hooks into actual production J2EE applications so that "application security coverage" can be measured, and then ran each of the three top commercial security scanners against the instrumented application. This allowed analysis of how much of an application's code base was actually executed during a scan, which let me evaluate the scanners in a new and quantifiable way.
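To make the measurement concrete, the sketch below illustrates the general idea behind this kind of instrumentation. It is not Fortify Tracer's actual API; the class and the call-site identifiers are hypothetical. The idea is that an instrumenter registers every security-relevant call site ("surface" or "sink") at load time, marks each one that actually executes while a scanner drives the application, and reports coverage as the fraction of registered sites that were hit.

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical illustration of coverage instrumentation (not Fortify's API):
 * a registry of security-relevant call sites that injected hooks mark as hit
 * while a scanner exercises the application.
 */
public final class CoverageRegistry {

    // Every instrumented call site, keyed by an identifier such as "Class.method:line".
    private static final Set<String> knownSites = ConcurrentHashMap.newKeySet();

    // Call sites actually executed during the scan.
    private static final Set<String> hitSites = ConcurrentHashMap.newKeySet();

    private CoverageRegistry() {}

    /** Called once at startup for every call site the instrumenter wraps. */
    public static void register(String siteId) {
        knownSites.add(siteId);
    }

    /** Called by the injected hook each time the wrapped call site executes. */
    public static void markHit(String siteId) {
        hitSites.add(siteId);
    }

    /** Fraction of instrumented call sites exercised by the scan. */
    public static double coverage() {
        return knownSites.isEmpty() ? 0.0 : (double) hitSites.size() / knownSites.size();
    }

    public static void main(String[] args) {
        // Register sites as an instrumenter would at load time.
        register("LoginServlet.doPost:42");
        register("SearchDao.query:118");
        register("ExportServlet.doGet:77");

        // Simulate the scanner reaching two of the three sites.
        markHit("LoginServlet.doPost:42");
        markHit("SearchDao.query:118");

        System.out.printf("Coverage: %.0f%% of instrumented sites hit%n", coverage() * 100);
    }
}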

Summary

The full results of the testing are analyzed in further detail later in the report, but I would like to start with some of the conclusions.

There are several ways to look at the results. Analyzing the overall percentages of "possible" surface/sink coverage is difficult because it is hard to determine which subset of the full list a web scanner can reasonably be expected to exercise (excluding optional features that are not enabled, alternative database drivers, import/export data features, etc.). For this reason the absolute percentages are not meaningful on their own, and I decided to remain focused on comparing the capabilities of the web application scanners against each other.
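A small worked example (hypothetical numbers, not the measured results) shows why the denominator matters so much here: the same scan can look very different depending on whether it is measured against every instrumented site or only against the sites a scanner could plausibly reach.

/**
 * Minimal sketch, with illustrative counts only, of how the choice of
 * denominator changes the apparent coverage of an identical scan.
 */
public class CoverageDenominator {
    public static void main(String[] args) {
        int hitSites = 450;                 // sites the scan actually executed
        int allInstrumentedSites = 3000;    // every surface/sink the instrumentation reports
        int reachableSites = 1200;          // sites behind features a scanner could plausibly reach

        System.out.printf("Against the full list:   %.1f%%%n", 100.0 * hitSites / allInstrumentedSites);
        System.out.printf("Against reachable sites: %.1f%%%n", 100.0 * hitSites / reachableSites);
        // The same scan reads as 15% or 37.5% coverage depending on the denominator,
        // which is why relative comparisons between scanners are more meaningful here.
    }
}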

I started by comparing the results of the scanners against each other in the first two categories. It is interesting to see that the number of links crawled does not always indicate that more of the application's code base is being executed.

It is important that a scanner crawls well, but time spent on redundant inputs only inflates the count of crawled links; it does not increase coverage of the application's code base. In my tests, BeyondTrust Digital Security's Retina Web Security Scanner (RWSS) crawled the most links on average and had the best coverage in all scans. SPI Dynamics' WebInspect crawled better than Watchfire's AppScan on one application, but AppScan did slightly better when I looked at the code base it executed, which means WebInspect spent time on redundant or otherwise unimportant links.
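The sketch below illustrates the comparison being made, using generic scanner labels and illustrative figures rather than the study's data: a simple "sites executed per link crawled" ratio makes it easy to see when a larger crawl is not translating into more of the code base being exercised.

import java.util.List;

/**
 * Hypothetical sketch: more crawled links does not necessarily mean more code
 * executed, so a coverage-per-link ratio can expose crawls that spend time on
 * redundant inputs. Figures are illustrative only.
 */
public class CrawlEfficiency {

    record ScanResult(String scanner, int linksCrawled, int sitesExecuted) {}

    public static void main(String[] args) {
        List<ScanResult> results = List.of(
            new ScanResult("Scanner A", 5200, 900),   // many links, good coverage
            new ScanResult("Scanner B", 6100, 610),   // most links, but many redundant
            new ScanResult("Scanner C", 4300, 640)
        );

        for (ScanResult r : results) {
            double perLink = (double) r.sitesExecuted() / r.linksCrawled();
            System.out.printf("%s: %d links, %d sites executed (%.2f sites/link)%n",
                    r.scanner(), r.linksCrawled(), r.sitesExecuted(), perLink);
        }
    }
}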

For this study I also focused on verifying the scanners' findings for accuracy. Again, the lesser-known RWSS from BeyondTrust had the most findings, fewer false positives, and the most usable reports to aid in remediation.
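For clarity, the verification step boils down to manually confirming or rejecting each reported finding and then comparing the confirmed and false-positive counts. The sketch below uses hypothetical counts, not the report's figures.

/**
 * Minimal sketch of the finding-verification arithmetic: precision is the
 * share of reported findings that were confirmed as real vulnerabilities.
 * Counts are illustrative only.
 */
public class FindingVerification {
    public static void main(String[] args) {
        int reportedFindings = 120;   // everything the scanner flagged
        int confirmedFindings = 95;   // verified by hand as real vulnerabilities
        int falsePositives = reportedFindings - confirmedFindings;

        double precision = (double) confirmedFindings / reportedFindings;
        System.out.printf("Confirmed: %d, false positives: %d, precision: %.0f%%%n",
                confirmedFindings, falsePositives, precision * 100);
    }
}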