Caveat Emptor: Is It Really “Data Science”?

Sep 25, 2018
Jeff Aboud

In my last post, I walked you through how to determine whether a solution is truly “risk-based.” This time, let’s talk about “data science.” Like machine learning and artificial intelligence, the term data science is gaining popularity across the security industry, and the vulnerability management space is no exception. It seems like every vendor these days claims to use data science to help prioritize vulnerability remediation.

Apparently, data science means different things to different people. Or maybe in their quest to use the latest buzzword to attract potential customers, many marketers are ignoring facts and technical realities. Either way, if you’re in the market for a vulnerability prioritization platform, you owe it to yourself to dig a little deeper when you assess your options. What exactly do they mean—and more importantly, how is their use of data science going to make your life easier?

Define Data Science, Please!

Let’s first review what does NOT qualify as data science. Many vendors who make this claim actually just have a group of mid-level security analysts working in the background to assess and score each vulnerability. That’s outsourced security analysis, not data science! And while outsourcing much of that work certainly delivers some value, the model has significant inherent problems. Chief among them: it can’t scale. If the vendor has staffed to handle 100 vulnerabilities per month and 200 are discovered the next month, it can’t simply double its staff to absorb the extra load. A backlog forms, and all future vulnerability assessments are delayed.
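To make the scaling problem concrete, here is a minimal sketch of the arithmetic. The function name and the four-month arrival pattern are illustrative assumptions; only the figures of 100 per month of capacity and a 200-vulnerability month come from the example above.

```python
def backlog_by_month(arrivals, capacity):
    """Return the number of unassessed vulnerabilities at the end of each month."""
    backlog = 0
    history = []
    for arrived in arrivals:
        # Analysts clear at most `capacity` items per month; the rest carries over.
        backlog = max(0, backlog + arrived - capacity)
        history.append(backlog)
    return history


# Hypothetical arrivals: one spike of 200, then back to the usual 100.
arrivals = [100, 200, 100, 100]
print(backlog_by_month(arrivals, capacity=100))  # [0, 100, 100, 100]
```

With a fixed staff, a single spike leaves a backlog that never clears, because every later month's capacity is already spoken for by that month's new arrivals.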

But some vendors making the data science claim don’t even do that! For some, it just means that their tool provides a risk score. Well, even basic scanners do that! The question is, how did they arrive at that score? What methodology did they use? Where’s their data model? For far too many, the “model” is tremendously simplistic: it relies primarily on the CVSS score, takes some basic network topology into account, and perhaps includes an exploit feed that is updated once a day at best.

True Data Science for Risk-Based Vulnerability Remediation

So, what should you be looking for in a solution that truly leverages data science? For starters, it has to use a comprehensive algorithm that concentrates on risk to determine which of your millions of vulnerabilities actually pose the most risk to your organization, and therefore need to be remediated immediately. The flip side of that, of course, is deprioritizing the vast majority of those millions of vulnerabilities that pose little to no risk. That’s what’s going to help your team focus on the most important vulnerabilities, so you can achieve the greatest risk reduction with the fewest resources.

A true data science model will certainly consider CVSS, but it also needs to be capable of leveraging a wide range of internal and external security data, including:

  • Scan data
  • Pen testing results
  • Bug bounty programs
  • A database of billions of vulnerabilities and exploit intelligence
  • Multiple threat intelligence feeds—updated and processed in real time


All of the internal data needs to be normalized, deduped, and processed, and then correlated with the external data before it is run through the algorithm. That’s a massive amount of data to process, especially considering that the average large enterprise has more than 24 million vulnerabilities across hundreds of thousands of assets. Suffice it to say, the vendor needs a great deal of processing power. The model (and the vendor’s infrastructure!) must also be able to scale dramatically. Oh, and make sure you also consider how quickly you’ll need this information. Weeks, days, or even hours won’t cut it: all of these complex computations need to render accurate results within seconds!
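For a sense of what “normalized, deduped, and correlated” might look like in practice, here is a toy sketch. It is not any vendor’s actual pipeline: the `Finding` record, the field names, the function names, and the CVE identifiers are all made-up placeholders, and a real system would do this across millions of records and many feeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    asset: str   # hostname or asset identifier (hypothetical field)
    cve: str     # CVE identifier
    source: str  # which internal source reported it


def normalize(raw: dict) -> Finding:
    """Bring one raw record into a common shape (trimmed, consistent case)."""
    return Finding(
        asset=raw["asset"].strip().lower(),
        cve=raw["cve"].strip().upper(),
        source=raw["source"],
    )


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Keep the first finding seen for each (asset, CVE) pair."""
    unique: dict[tuple[str, str], Finding] = {}
    for finding in findings:
        unique.setdefault((finding.asset, finding.cve), finding)
    return list(unique.values())


def correlate(findings: list[Finding], exploited_cves: set[str]) -> list[tuple[Finding, bool]]:
    """Pair each finding with whether external intelligence flags its CVE."""
    return [(finding, finding.cve in exploited_cves) for finding in findings]


# Placeholder internal data: the first two records describe the same issue.
raw_records = [
    {"asset": "Web-01 ", "cve": "cve-2000-0001", "source": "scanner"},
    {"asset": "web-01", "cve": "CVE-2000-0001", "source": "pen_test"},
    {"asset": "db-01", "cve": "CVE-2000-0002", "source": "scanner"},
]
exploited = {"CVE-2000-0002"}  # stand-in for an external threat intelligence feed

findings = dedupe([normalize(r) for r in raw_records])
correlated = correlate(findings, exploited)
for finding, is_exploited in correlated:
    print(finding.asset, finding.cve, "exploited" if is_exploited else "no known exploit")
```

Even in this tiny example, the scanner and pen test records collapse into one finding only after normalization, which is why the order of the steps matters before any scoring algorithm runs.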

Can Your Security Vendor Do That?

So the next time a vulnerability prioritization platform vendor tells you that they use data science to prioritize remediation, dig deeper into that statement. You owe it to yourself and your organization to make sure they have the capability to actually deliver on that promise. After all, the right tool will help you streamline your efforts and make the best use of your limited resources. The wrong tool is just another thing you’ll have to spend time learning and integrating into your workflow for limited benefit. Do you really need another one of those?

Learn more about how Kenna Security uses data science to predict exploits.
