What’s in a buzzword, like data science? A lot of resentment, for sure, but also a chance to explain.
A buzzword is a word or phrase, new or already existing, that becomes very popular for a period of time. Buzzwords often derive from technical terms. Yet through fashionable use, the original technical meaning disappears, and what’s left is a word or phrase used simply to impress others.
We want to take a minute to add that original technical meaning back into some key terms we (and everyone else these days) like to throw around.
Stand at any booth on the RSA Conference show floor, and you will spot at least four vendors within a 5-meter radius that tout artificial intelligence (AI) or machine learning. As they tend to do, these buzzwords have become less impactful over the last few years, largely due to their ubiquity and the security buyer’s instinct to rebuff anything that smacks of marketing.
AI, machine learning, predictive analytics—these aren’t inherently pedestrian terms. Nor do they warrant being brushed off simply because they get a lot of air time. The problem is that the industry runs rampant with terms that buyers can’t effectively define and vendors can’t effectively defend.
So where does that leave us here at Kenna and the security pros we speak to? Well, we want you to keep us accountable.
Data science at Kenna
If you’ve perused our website, watched any of our webinars, or spoken to any of our employees, you’ll know that there are indeed a couple of “buzzwords” that we consider integral to what we do—data science and machine learning, in particular. In fact, we call out the techniques we use specifically because we want you to understand above all else that we’re not simply spinning marketing claims. We’re using proven approaches rooted in scientific methodologies—and we’re willing to let you put us to the test.
George Akerlof, the less famous but more Nobel Prize-winning husband of the current Secretary of the Treasury, coined the term “Market for Lemons.” He showed mathematically that when buyers have less information than sellers, in any market, the quality of products tends to suffer. Bruce Schneier and Ian Grigg extended this notion to the security market, which helps explain some of the more amusing qualities of antivirus and firewalls. Well, the way we see it, we have a high-quality product, and to maintain that quality our customers need to know just as much as we do. No silver bullets, black boxes or empty buzzwords.
What exactly are we talking about when we say that our risk-based vulnerability management solutions use data science?
Generally speaking, data science is the confluence of various disciplines (statistics, analytics, data mining, programming, etc.) aimed at extracting meaning from a dataset or datasets. The first usage of the term dates back to the 1960s, but it wasn’t until the turn of the century that its meaning became more standardized. The term is demonstrably still young, as is the field of study it defines. But that hasn’t stopped it from demonstrating immense value for businesses on multiple fronts.
In Kenna’s case, data science refers specifically to the techniques we use to sift through a swath of threat and vulnerability data and come to a conclusion about which vulnerabilities are truly the riskiest. We do this using various data science techniques, chief among which is machine learning.
Machine learning, you say?
Machine learning (an enabling component of artificial intelligence) refers to a process in which systems automatically “learn” or adapt based on patterns within data (and without much need for human intervention). Going down one more level, we can single out two types of machine learning: supervised and unsupervised. With supervised machine learning, the datasets that models are trained on are tagged with labels (e.g., these are pictures of giraffes, these are pictures of moose, etc.). With unsupervised machine learning, datasets are not labeled. Instead, the model identifies patterns within the dataset (e.g., these creatures have long necks, these creatures have short necks, etc.).
The difference between these types of “learning” is comparable to ways in which humans can learn. Imagine you see two creatures for the first time. You can be taught by someone that creature A has a long neck and big brown spots and is called a giraffe, and creature B is brown and has big antlers and is called a moose. Alternatively, you can assess the creatures yourself and determine that, based on recurring patterns within their features, you are looking at two distinct sets of creatures.
Both methods of human learning are useful; one may suit a given scenario better than the other. Similarly, both methods of machine learning are functional, and the best approach depends on the problem you’re trying to solve and the data you’re working with. (Note: Supervised and unsupervised machine learning can be subcategorized further, but we won’t get that granular today.)
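To make the giraffe-and-moose distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the measurements are made up, and the two techniques (a nearest-centroid classifier for the supervised case, plain k-means for the unsupervised case) are simply common textbook choices, not a description of Kenna’s models.

```python
from statistics import mean

# Made-up measurements for illustration: (neck length, shoulder height) in meters.
animals = [(2.4, 3.0), (2.1, 2.8), (2.6, 3.2), (0.6, 1.9), (0.5, 1.8), (0.7, 2.0)]
# Labels are only used by the supervised approach.
labels = ["giraffe", "giraffe", "giraffe", "moose", "moose", "moose"]

def centroid(points):
    """Average position of a group of points."""
    return tuple(mean(axis) for axis in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_supervised(points, point_labels):
    """Supervised: learn one centroid per human-provided label."""
    groups = {}
    for point, label in zip(points, point_labels):
        groups.setdefault(label, []).append(point)
    return {label: centroid(members) for label, members in groups.items()}

def predict(model, point):
    """Return the label whose centroid is closest to the new point."""
    return min(model, key=lambda label: sq_dist(model[label], point))

def cluster_unsupervised(points, k=2, iterations=10):
    """Unsupervised: k-means, seeded with the first k points, no labels used."""
    centers = list(points[:k])
    for _ in range(iterations):
        # Assign each point to its nearest center, then move the centers.
        assignment = [min(range(k), key=lambda i: sq_dist(centers[i], p))
                      for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centers[i] = centroid(members)
    return assignment

model = fit_supervised(animals, labels)
clusters = cluster_unsupervised(animals)

print(predict(model, (2.2, 2.9)))  # a named answer, because we supplied names
print(clusters)                    # anonymous group numbers, because we did not
```

The supervised model can answer with a name because we gave it names to learn from. The unsupervised one can only say that there appear to be two groups; deciding what to call them is still up to a human.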
Cloudy with a chance of exploits
When we talk about machine learning here at Kenna, we’re typically referring to supervised machine learning, and it’s what enables us to predict the weaponization of vulnerabilities. Our predictive modeling leverages supervised machine learning to analyze and learn from a large quantity and breadth of real-world data about vulnerabilities, including what’s in the NVD and MITRE, which exploit kits are available, and which CVEs have been successfully exploited in the wild (and how many times), and to create high-fidelity forecasts of exploitation.
At a basic level, you could think about this in terms of weather forecasting, because in a way we use a similar approach. We all like to berate our local meteorologists, but weather forecasts are significantly more accurate than they were a few decades ago, thanks in large part to modern data science techniques. Using machine learning models and a combination of historical data and current weather data, accurate forecasts can be made for days if not weeks in advance of a specific date. We likewise use historical and current vulnerability and exploit data to make predictions about the likelihood of future exploitations.
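For readers who like to see the idea in code, the following is a toy sketch of forecasting from historical, labeled data. It assumes a simple logistic regression trained by gradient descent, three hypothetical yes/no features, and entirely made-up training rows; none of this reflects Kenna’s actual model, features, or data. Like a forecast that says “70% chance of rain,” the output is a probability rather than a verdict.

```python
import math

# Made-up training rows, for illustration only. Each row is a hypothetical
# vulnerability encoded as three 0/1 features:
# (public exploit code exists, included in an exploit kit, allows remote code execution)
features = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
    (0, 0, 1), (0, 1, 0), (1, 0, 0), (0, 0, 0),
]
# Label: 1 if the hypothetical vulnerability was later exploited in the wild.
exploited = [1, 1, 1, 1, 0, 0, 0, 0]

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, row):
    """Estimated probability of exploitation for one feature row."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

def train(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(rows, labels):
            # Nudge the parameters to shrink the gap between forecast and outcome.
            error = predict_probability(weights, bias, row) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, row)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(features, exploited)

# Forecast for two hypothetical vulnerabilities.
high_risk = predict_probability(weights, bias, (1, 1, 1))
low_risk = predict_probability(weights, bias, (0, 0, 0))
print(f"all three signals present: {high_risk:.2f}")
print(f"no signals present:        {low_risk:.2f}")
```

A real model would draw on far more data and far richer features than this, but the shape of the workflow is the same: learn from past outcomes, then score new cases.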
A data problem
Weather models and CVE exploitation predictions are only as good as the data they’re trained on. If you don’t have data that can help you understand the typical barometric pressure and wind conditions of past hurricanes, as well as the current barometric pressure and wind conditions of a given location, it would be impossible to accurately predict when a hurricane will take place in that spot. Likewise, if we didn’t fully understand the characteristics of a given CVE, or if we couldn’t assess whether that CVE has been successfully exploited, when, how many times, etc., our exploit predictions would be unreliable.
Data science (and AI and ML and the like) is, after all, based on data. And here we’ve touched on the core of why our industry has become such a hotbed for these techniques. Security is filled to the brim with data. In vulnerability management in particular, data deluge is a recurring problem, and this is where Kenna has taken a distinct approach from the get-go. While VM vendors such as scanner providers have historically just generated more data without guidance, our mission has been to make sense of the data that businesses already have and help them make truly efficient remediation decisions based on that data. In a very real sense, Kenna Security is a data science company; modern vulnerability management is how we apply our expertise.
Of course, there’s a lot more under the hood of our data science than what we’ve discussed here. Future blogs will explore the topic in more detail.
So, next time you see us, ask us about our data science. We’re always happy to peel back the curtain and offer you as deep a dive as you like. Reach out anytime.