The Concept of Measuring Recall in Cybersecurity

In a recent article for the USENIX magazine, In-Q-Tel CISO Dan Geer and Kenna Chief Data Scientist Michael Roytman discuss the importance of measuring the concept of recall in cybersecurity. If you’re not familiar with the term, or can’t quite “recall” what it means, it is one of the two classic measures – along with precision – for assessing relevance in search problems.

Source: https://en.wikipedia.org/wiki/File:Precisionrecall.svg
Precision and Recall

These fancy terms arose from a need to go beyond the simple notion of accuracy in determining how well search results performed. Anyone who has used a search engine has intuitively assessed precision and recall in some manner. Some suggested content fits what you’re looking for and some does not. And there’s probably a lot more out there that was missed entirely. Apply that example to the figure above, and you’ll grok more about these concepts than most.

Precision captures what proportion of search results are relevant to your interest, while recall shows what proportion of all relevant content on the Internet was retrieved. Said more colloquially, precision asks “So what?” and recall asks “What’d I miss?”
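To make those two definitions concrete, here is a minimal sketch in Python. The function name `precision_recall` and the toy search data are our own illustrative assumptions, not anything from the article.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items vs. the truly relevant ones."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # results that were both returned and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 results come back, 3 are relevant,
# and 6 relevant pages exist in total.
results = {"a", "b", "c", "x"}
relevant_pages = {"a", "b", "c", "d", "e", "f"}

precision, recall = precision_recall(results, relevant_pages)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.50
```

Three of the four results are useful (the “So what?” answer), but half of the relevant pages never showed up (the “What’d I miss?” answer).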

This is a blog for a security company, so let’s take this back to familiar ground. The authors are helpful here: “Security tools mostly deal with answering some form of the question, ‘Does this matter?’ In vulnerability management, that question is, ‘Does this vulnerability pose a risk?’ In incident response, that question is, ‘Was this a malicious event or a false positive?’ In threat intelligence, it can be said as, ‘Is this indicator malicious or not?’” They continue: “Accuracy, in the technical sense, has little meaning when searching for rare events. If one in a hundred machines is infected, I am 99% accurate when I routinely guess that ‘none of our machines are infected.’ Hence, we turn to measures of recall and precision when evaluating how ‘good’ we are as an industry at answering these questions.” They go on to give many excellent reasons as to why and how the measurement of recall can help us mature as an industry.
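The one-in-a-hundred example is easy to reproduce. The sketch below assumes a hypothetical fleet of 100 machines with exactly one infection and a “detector” that always answers “not infected”; the variable names are ours.

```python
# 100 hypothetical machines, exactly one infected (True = infected).
actual = [True] + [False] * 99
# The lazy strategy: always guess "not infected".
predicted = [False] * 100

pairs = list(zip(actual, predicted))
true_pos = sum(a and p for a, p in pairs)
true_neg = sum((not a) and (not p) for a, p in pairs)
false_neg = sum(a and (not p) for a, p in pairs)

accuracy = (true_pos + true_neg) / len(actual)
recall = true_pos / (true_pos + false_neg)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")  # accuracy=0.99 recall=0.00
```

The guess scores 99% on accuracy while finding none of the infections, which is exactly why recall is the more telling number for rare events. Precision is undefined here, because the strategy never flags anything at all.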

This, in a nutshell, was the theoretical basis for our research with the Cyentia Institute leading to the publication of our Prioritization to Prediction report. We studied 100,000 security vulnerabilities to identify key determinants of exploitation and build a predictive model that objectively measures remediation efforts. You can probably guess at this point that precision and recall were the measures we used for this research. At the risk of encouraging you to skip to the end of the report, here’s the grand finale data visualization:

Fig. 13 from the Kenna Security and Cyentia Institute Prioritization to Prediction report

The figure compares all remediation strategies assessed in the report (you’ll have to read it for the full story) on the same chart based on the level of coverage (recall) and efficiency (precision) achieved by each one. The points on the plot represent the strategies and the size of those points corresponds to the total number of CVEs remediated by that strategy. Objectively determining which strategy performs “better” is a matter of finding the ideal balance between the measures of precision and recall. For instance, a common approach is to remediate any vulnerability receiving a CVSS score of 7 or above. Based on historical data, this would achieve a precision of 32% and recall of 53%; not bad.

The “Balanced” predictive model we developed in the report, by comparison, yields a precision of 61% and recall of 62%. In other words, our model achieves twice the efficiency (61% vs. 31%), half the effort (19K vs. 37K CVEs), and one-third the false positives (7K vs. 25K CVEs) at a better level of coverage (62% vs. 53%)!
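The comparison is simple arithmetic on the figures quoted above, so it can be checked in a few lines. The F1 score (the harmonic mean of precision and recall) is included as one common way to express “balance” in a single number; that choice is our assumption for illustration and not necessarily the criterion used in the report.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (assumed balance metric)."""
    return 2 * precision * recall / (precision + recall)


# Figures as quoted in the comparison above; CVE counts are in thousands.
cvss7 = {"precision": 0.31, "recall": 0.53, "remediated_k": 37, "false_pos_k": 25}
balanced = {"precision": 0.61, "recall": 0.62, "remediated_k": 19, "false_pos_k": 7}

efficiency_ratio = balanced["precision"] / cvss7["precision"]     # about 2x
effort_ratio = balanced["remediated_k"] / cvss7["remediated_k"]   # about half
false_pos_ratio = balanced["false_pos_k"] / cvss7["false_pos_k"]  # under one-third

print(f"efficiency: {efficiency_ratio:.2f}x")
print(f"effort: {effort_ratio:.2f} of CVSS 7+")
print(f"false positives: {false_pos_ratio:.2f} of CVSS 7+")
print(f"F1: CVSS 7+ {f1(cvss7['precision'], cvss7['recall']):.2f}, "
      f"Balanced {f1(balanced['precision'], balanced['recall']):.2f}")
```

Because the CVE counts are rounded to the nearest thousand, ratios derived from them only approximate the percentages quoted in the text.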

It also performs 8X more efficiently than a strategy based on remediating vulnerabilities from the top 20 vendors (lower left of the figure). We can’t go into detail here, but it does this by incorporating multiple types of data and accounting for interaction effects. This “Everything” model we developed outperforms all other strategies, regardless of whether your goal is to maximize coverage, maximize efficiency, or balance the two.

Our point in all of this is not to pound our chest and shout “our model is better than yours!” (though we are proud of it). The point is that important security questions like “what’s the best vulnerability remediation strategy?” don’t have to be a guessing game. We can and should be measuring these things and, in so doing, dramatically improving our ability to confidently manage risk in our environments.

We would end this by wishing you good luck in that endeavor, but luck is for those who can’t measure. Instead, we’ll wish you good recall.