We discuss the challenges of managing risk in third-party code from sources such as open source software libraries.
Transcript of the pod
DAN Today on Security Science, managing third-party code risk. Hello, and thanks for joining us today as we discuss the surprising, well, at least to me, challenges with managing risk in software code, particularly the third-party code in things like open source software libraries, GitHub, all that good stuff. So, I have almost zero experience with this, so explaining this highly complex landscape to me today is Kenna Security’s very own Sultan of security, Jerry Gamblin. How’s it going, Jerry?
JERRY GAMBLIN It’s going well, Dan. It’s not that hard to understand, and we’ll walk through this and get everybody kind of up to speed on what third-party code risk looks like and why it should be an issue for everyone.
DAN Awesome. Well, I want to just mention this. So, I will link specifically to this and hopefully link to it in any social promotion. But Jerry shared this always pertinent little XKCD cartoon on software dependencies, and it’s just the best. So anyway, it’s xkcd.com/2347 if you want to check it out right now. So I encourage you to go look at it, and it kind of summarizes the entire topic. So jumping in, I just thought it would be apt to start with the famous Marc Andreessen quote from 2011. He wrote this whole Wall Street Journal byline on it, but it was that software is eating the world. And it’s been almost 10 years since he said that. And I think everyone will say that’s pretty true. It seems with the rise of Facebook, all the social media platforms, influencing elections, all that fun stuff, software and code in particular has just expanded to be the primary export and area of development for the US and the greater world. So, Jerry, what are some of the challenges that have created this whole situation in the last 10 years?
JERRY GAMBLIN I think what Marc missed there, or what he implied, was that software that individual companies wrote would take over the world. In reality, that isn’t what’s happening in most organizations today, or for anybody who’s writing code. One of my favorite examples of this is that a super popular JSON formatter in Rails has over 30,000 lines of code when you import the gem. So you import one gem to make sure that your JSON is validated, and you bring 30,000 lines of code that someone else has written into your code base to do that. I think that people used to stay up with nightmares of somebody hacking into their firewall. I think that now, most CISOs and security people stay up wondering what all their dependencies look like. I think we’ll talk about an article and a report later that says that the average developer supports 100 times more code than they did when Marc wrote that quote. And it’s not code that they’re writing.
DAN That’s interesting. So, you’re referencing, and we’ll have a link to all of these sources as well, an Ars Technica article, and they were talking about a survey done by a company called Sourcegraph. And so I think it was 51% of the respondents who said that they were managing at least 100 times more code. And I believe half of those guys even said 500 times more code than they did in 2010. So we will definitely link to that. And that’s just a staggering number. And what you’re saying, I think the point is that they’re not necessarily writing all this code. This is a lot of stuff that’s freely available, open source. It’s used to kind of cobble together a solution over time, right, and people have to manage this. So could you just back up a little bit and explain what’s a dependency in software?
JERRY GAMBLIN A dependency is something that software needs to run. It’s better to think of these as kind of like toolkits. If you go to, I think it’s gems.ruby.com [the gem index is at rubygems.org], you can type in, it’s their gem search engine. Gems are little add-ins that do certain things. You can say, I need something to validate a date, and you’ll have a date validation gem or a JSON validation gem. Or a gem that can talk to AWS. And all of these gems have a ton of code that is hard for a company to audit before they put it in. Like, you’re just trying to go fast. So, I go and I grab this gem and I throw it in my software stack. And now, I’ve inherited all the code that’s written in there, plus all the dependencies that that person used to build their gem, down, down, down. So that’s how you go from something that should be super simple to being 20,000 lines of code.
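To make the “down, down, down” point concrete, here is a minimal Python sketch of how one direct dependency pulls in others. The package names and the graph are invented for illustration and are not real gems; a real project would read this information from its lockfile rather than a hand-written dictionary.

```python
# Hypothetical dependency graph: package name -> its direct dependencies.
# Every name here is made up for illustration.
DEPENDENCIES = {
    "my_app": ["json_validator", "date_validator"],
    "json_validator": ["string_utils", "schema_core"],
    "date_validator": ["string_utils"],
    "schema_core": ["tiny_parser"],
    "string_utils": [],
    "tiny_parser": [],
}


def transitive_dependencies(package, graph):
    """Return every package reachable from `package`, excluding itself."""
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already counted via another path
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


direct = DEPENDENCIES["my_app"]
everything = transitive_dependencies("my_app", DEPENDENCIES)
print(f"direct: {len(direct)}, total inherited: {len(everything)}")
```

In this toy graph the application asks for two packages and ends up inheriting five, which is the same effect, at a much smaller scale, as the one described above.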
DAN So, it’s essentially the concept of not needing to recreate the wheel, right? You need to accomplish X thing. Someone’s likely already done that somewhere within the code language that you’re looking at, and you should be able to pull that in theory for most things. So it’s a tool to achieve a specific goal.
JERRY GAMBLIN It’s a tool that you’re paying nothing for and that becomes a key component of your company and how they operate. So we’re talking about security here, but it’s also just as pertinent to mention performance and logging and stuff. So these might be slow, and the problem might be, like that XKCD comic shows, that the slowness is in something three layers down in your stack that somebody in Nebraska manages in their free time, and it’s slowing down your billion dollar company. But you don’t know that software. So you hope that this person, after they get done farming or whatever they do in Nebraska, comes in and makes that update that you need so that your company can perform.
DAN And then you also, I guess, have to deal with some of the risks that could be inherent to some of that code that is three generations down, built on top of different pieces as well.
JERRY GAMBLIN There is a political aspect to it, but earlier this year, somebody pulled down a bunch of very popular Chef open source repos that they had owned, that they worked on. Chef, the company, did a deal with the Department of Homeland Security or something, and they didn’t like it. So they pulled down these repos that had millions of pulls and that everybody was using to make Chef work, and it broke a bunch of people right away. And that’s something to think about when you use open source software. If this disappeared tomorrow, what would you do? And that was a big kind of eye-opener on what it means to use open source software in your environment that you might not have a copy of or control over.
DAN Gotcha. Well, that same Ars article also says that modern development, and particularly web development, is generally an amalgamation of different platforms, libraries, and dependencies. And then the developers that Sourcegraph surveyed reported that they had increased the number by a significant amount. If you had to guesstimate, for most companies, how much of their software do you think is built on open source code or libraries, things that weren’t actually hand-coded by the company’s devs?
JERRY GAMBLIN Yeah. A good number is that most apps are 95% third-party code. Like on average, 95% is background code, gems that they brought in to make it run.
DAN That is crazy to me. And that’s why I brought up the surprising in my intro because I had no idea. It makes sense when you really think about it, but 95%, that seems like a significantly high number overall.
JERRY GAMBLIN It is, but you want people to move fast, and when they move fast, they will take what’s available for free and what’s popular versus writing it themselves. And when open source software has a good support system and [inaudible] around it, it works fine. It’s the edge cases, where you’re using a not popular gem that somebody sneaks a back door into, that always end up biting people. So, everybody is going to use open source libraries. You just really have to, as a security professional or as a software development manager, come up with some hard rules on what makes a good open source package for your company. Does it look maintained? When was the last update? If I’m writing software and I find a gem that does 100% of what I want it to do, but it doesn’t look like it’s been updated in like 12 months or two years, versus one that does 80% of what I need and is constantly updating, you should use the one that’s still getting updates, as it’s probably better long-term for your company and may not be abandoned.
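The rule of thumb above (prefer a maintained package that covers most of what you need over a stale one that covers all of it) can be written down as a small policy check. This is only a sketch under stated assumptions: the `Candidate` fields, the 365-day threshold, and the package names are hypothetical, and a real policy would weigh more signals than the last release date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    name: str            # hypothetical package name
    last_release: date   # date of the most recent release
    feature_fit: float   # fraction of needed features covered (0.0 to 1.0)


MAX_AGE_DAYS = 365  # assumed policy threshold, not an industry standard


def is_maintained(candidate, today, max_age_days=MAX_AGE_DAYS):
    """Treat a package as maintained if it was released recently enough."""
    return (today - candidate.last_release).days <= max_age_days


def pick(candidates, today):
    """Pick the best-fitting candidate among those that look maintained."""
    maintained = [c for c in candidates if is_maintained(c, today)]
    if not maintained:
        return None  # nothing passes the policy; escalate to a human review
    return max(maintained, key=lambda c: c.feature_fit)


today = date(2020, 10, 1)
candidates = [
    Candidate("full_fit_but_stale", date(2018, 6, 1), 1.0),
    Candidate("partial_fit_active", date(2020, 8, 15), 0.8),
]
choice = pick(candidates, today)
print(choice.name if choice else "no acceptable candidate")
```

The stale package is filtered out before feature fit is even considered, so the 80% match wins, which mirrors the trade-off described in the conversation.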
DAN That makes a ton of sense as well. But it seems like this has mostly risen over the course of the last 10 to 20 years. So I don’t know if that’s accurate. Have we seen a shift over time in the use of these kinds of open source libraries over the course of web development, for example?
JERRY GAMBLIN Oh yeah. Open source libraries, it’s a different development world. I’m not a professional developer, but when I was in college and you took introduction to programming or C++, you would include libraries that were built into the C library. There was no going out and grabbing this library from this website. So, everything was pretty well self-contained, but that’s changed so much these days, where it’s go pull this library. You want to do machine learning, go get the TensorFlow library. And there’s no way that you’re going to write enough code to offset the code that you’re using from TensorFlow. We’re all kind of part of this giant open source community and most people don’t stop to think about it. I’d be remiss to do one of these podcasts and not talk a little bit about car hacking, but next time you get into your car, look at your infotainment center. Somewhere in there will be a license that shows all the open source software that Tesla or Chrysler or Ford uses. And it’s usually a big, long thing that you have to scroll through, because all those companies did the same thing. Instead of going and writing their own software, they went and pulled off-the-shelf utilities, probably a normal Linux distribution, and used that in the car.
DAN Oh man, licensing. I didn’t even think about that. That sounds like a nightmare, just maintaining, like, the number of different licenses you probably need to share, right?
JERRY GAMBLIN That is also a big deal because of some of the licenses, and I’m not a licensing expert. There are about 15 or 20 different major kinds of open source licenses. And they range all the way from the do whatever you want with it, and that’s fine. But some of them say, if you make changes to this library, you have to push the changes back into the open source. How many people are respecting that license? They make a change to do what they want but they don’t open [inaudible], right?
DAN Yeah.
JERRY GAMBLIN It’s a big deal. And on the legal and compliance side, they want to know that too. That’s a common question if you’re a startup: give me a list of all the software you use to build your software, so that we can make sure you don’t have anything in here that says this company now owns 10% of it because you use their intelligence or whatever, their software, in your stack.
DAN Oh man. Almost all of them too just require, to your point about the infotainment systems, that at a bare minimum you’re listing them out. Hey, we use this, so they’re getting credit for the work that they’ve done, which they should. Managing that kind of complexity, especially from a legal and compliance standpoint, just sounds like a nightmare that I don’t ever want to have to deal with.
JERRY GAMBLIN I’ll let you know next time we go through an audit and I’ll let you sit through that with us.
DAN I just literally asked you not to do that, please.
JERRY GAMBLIN I know. But everybody needs to see how painful it is at least once in their life.
DAN Absolutely. Well, that’s why we got you on the line so you can share that with the world. Oh man. Well, jumping over too, I’m curious because you shared a piece on scrum.org, which I thought was really interesting as well, which talked about the development teams, how they’re structured, and how that may play into some of the complexities and the challenges, because some teams may be only backend. They rely on X team, the API team, to provide API hooks, and those may have different dependencies that they have no idea exist because they only have that connection point via their workflow. So, any takeaways on how we got here, how the org structure influences some of this, and how you can think about it?
JERRY GAMBLIN The best way is to start over, but nobody can do that, right?
DAN It’s like zero trust, just break it all down, build it back up from the bottom, keeping this in mind.
JERRY GAMBLIN You really have to start from the very bottom to make this work. And going back and retooling it is nearly impossible, because like you said, once you get technical debt built in and libraries that people like to use or whatever, removing those libraries is super hard. It’s kind of like replacing the foundation on a house. You can do it, but it’s not fun and you’ll do nearly anything you can to not have to do that.
DAN Yeah, yeah. And it has a high chance of breaking things, right?
JERRY GAMBLIN It’s going to break something and it’s not going to be right. So, that’s why, if at all possible, you want to start early and have a good firm foundation on what you allow to be brought in from open source, from outside, right? But the agile and the scrum methodologies don’t give a lot of credit to people kind of stopping and thinking about this. They’re like, how fast can we get this out? And I admit it too, I’m not a developer, but when I want to find something, I go and look for a gem or software somebody else has already written that I can pull in as a library and use, just because it’s super handy and super fast. Very rarely does anybody think about: What does this library do? Who wrote the code? Is it maintained well? Those questions that you should ask that aren’t great. There are companies out there who are trying to help with like an open source library scorecard kind of thing. Help these companies know, okay, this library is allowed, this library isn’t allowed.
DAN So some of the benefits, it seems to me, are speed, you don’t have to rewrite code, and you can pull in these kinds of specialized toolkits to achieve a specific task that you may or may not have experience in. I think with the TensorFlow piece, most people are possibly not very familiar with coding for machine learning or AI. And so you bring in these highly specialized skill sets that you may not necessarily have on staff. What are some of the ways to, I guess, combat some of the challenges with using these kinds of code bases?
JERRY GAMBLIN You really have to just make sure you understand what it does and how it’s supported. And who it was written for is also a big thing to understand. What was the use case when this was designed? A great way to do that is, if you’re looking for open source software, look for startups that have written it. Look, if at all possible, for somebody in your vertical or kind of the same size, because they probably have some of the same kind of constraints and want the same kind of performance you do. What you don’t want to do is need file processing software and find something that the University of Missouri wrote and open sourced, but it only does a million lines an hour, which is fine for the University of Missouri because they’re academic and they don’t need it. Speed isn’t super important there. But you need to do at least five million lines an hour in your processing. So you need to go and try to find somebody who has that same use case for the software as you do, or better yet, if you can’t, when you rewrite it or you tweak that software, release it back out and say, hey, here was our use case, we needed more performance so we switched out these libraries. And that’s how open source software gets better.
DAN Yeah. And contribute back to the code base.
JERRY GAMBLIN Yeah. Nice.
DAN What are some of the other considerations? I know you’ve covered them throughout the course, but if you were going to go pull in some kind of external code and you aren’t pushed for time, you want to do your due diligence, what are the steps you’d take when you’re looking at the code you want to pull in?
JERRY GAMBLIN I would see if there are any open vulnerabilities. I always am a big fan of going in and looking at the GitHub issues and kind of seeing the kind of people you’re working with, seeing how they respond in the issues and in the PRs, because at the end of the day, these people become your “coworkers.” You’re using their software in your software. So if you’re having an issue or something breaks and you need to ask them a question in either an issue or a GitHub PR, you want to have somebody who at least looks semi-friendly and not combative or dismissive in those PRs.
DAN So that guy in Nebraska is very responsive to the code he’s been maintaining for fun since ’03.
JERRY GAMBLIN He’s probably a super nice Nebraska guy. They’re like, oh, you need this to do this. Okay. He stays up all night and he writes it for a year. Versus a tech bro who’s like, dude, that’s not even what this is used for. Well, get off my site.
DAN Oh man, I totally just saw a prototypical tech bro when you were doing that. What about the source of the code, from like different languages? Does that pose some challenges? If the guy’s Croatian, for example, and you’re trying to chat with him, and he’s not a native English speaker, something like that?
JERRY GAMBLIN For sure. You need to make sure, yeah, that you can have those conversations. Even where the code comes from. Croatia is not on any list but you can’t use code from Iran. If there’s somebody really smart in Iran that has some great code on GitHub, it’s probably not legal for a US company to put that in their stack because of import export laws, right?
DAN Yeah. Okay.
JERRY GAMBLIN It’s stuff like that that makes lawyers rich and security guys pull their hair out. What do you mean we can’t use this library because it comes from this country that we don’t like this week? So now I have to go have that fight with my dev team.
DAN The layers of complexity to this challenge are just staggering. I guess that’s a good segue. What are some of the solutions to managing this issue? I know there’s some built-in stuff for GitHub, for example, and you have some really good advice for doing some homework, seeing who worked on this, are they responsive, and can you get ahold of them? Are there any other capabilities that are kind of built in or that you would see as a necessary baseline?
JERRY GAMBLIN I think having a good policy, a good written-out policy that states what you need to have before you allow a new gem or a new dependency to be brought into your stack, is the minimum. Some of the best companies I know have kind of a library review committee, which sounds boring, and it probably is, but it’s four or five of their senior developers, and somebody comes and says, hey, I need to use this library to do this. It’s not in our stack yet. We want to use it. And they can spend time and use the wisdom of the crowd to come to a consensus: we need to use it. Or, hey, no, this is built in and you just need to do that, because a lot of times it is. You pull in a library because you’re not as good of a developer as you should be, maybe. I’ve been guilty of that, bringing in a library to do something simple that four or five lines of code that I just couldn’t write could do the same thing. But in terms of speed and stuff, bringing in 20,000 lines of someone else’s code was the way I picked.
DAN Interesting. So the Jedi council of external code review. That sounds like a good process, but are there any kind of capabilities you think are necessary or built in, like GitHub, I know they’ve been pushing on this a lot. Java libraries, stuff like that.
JERRY GAMBLIN You need to run a software composition analysis tool, an SCA. It just tells you everything that’s in your system and if it’s up to date and if it has known vulnerabilities. It’s very, very important to be able to keep on top of what makes up your stack.
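As a rough illustration of the core check such a tool performs, the Python sketch below compares an inventory of installed packages against a list of advisories. Everything in it is hypothetical: the package names, versions, and advisory IDs are invented, the version parser only handles simple dotted numbers, and a real SCA tool would read the inventory from lockfiles and the advisories from a maintained vulnerability database.

```python
def parse_version(text):
    """Turn a simple dotted version such as '1.2.10' into (1, 2, 10)."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical inventory of what is installed in the stack.
INSTALLED = {
    "json_validator": "1.2.0",
    "string_utils": "3.0.1",
    "tiny_parser": "0.9.4",
}

# Hypothetical advisories: versions below `fixed_in` are considered affected.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "tiny_parser", "fixed_in": "0.9.7"},
    {"id": "EXAMPLE-0002", "package": "string_utils", "fixed_in": "2.5.0"},
]


def find_vulnerable(installed, advisories):
    """Return (package, installed_version, advisory_id) for affected packages."""
    findings = []
    for advisory in advisories:
        version = installed.get(advisory["package"])
        if version is None:
            continue  # the package is not part of this stack
        if parse_version(version) < parse_version(advisory["fixed_in"]):
            findings.append((advisory["package"], version, advisory["id"]))
    return findings


findings = find_vulnerable(INSTALLED, ADVISORIES)
for package, version, advisory_id in findings:
    print(f"{package} {version} is affected by {advisory_id}")
```

Comparing versions as tuples of integers rather than as strings matters here, because a plain string comparison would rank "0.9.10" below "0.9.7".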
DAN And then how do you go about addressing vulnerabilities after you run that SCA process?
JERRY GAMBLIN It’s not easy, and your QA team has to test. It’s regression testing most of the time, because you have to say, here’s what this library did, here’s what it changes. It shouldn’t break anything. But as we all know, you don’t know until you know. So you’ve got to update it. If everything looks good, then you have to try some more testing. And then you can roll it out. It’s not as easy as just updating everything and going. Right, that’s how you know the difference between somebody who’s spent some time in software development and someone who’s come from traditional VM management, because the traditional VM is like, oh, there’s a patch out, just install it. What’s the worst that can happen? Roll back. And they expect a patch to be deployed in 48 hours, 96 hours or whatever. But when you get down to software development, updating one vulnerability can take a pair, like two developers, a full two-week sprint to walk through and do. And when that’s 90% of your code, most companies have full-time teams just devoted to keeping their dependencies and their software up to date.
DAN Yeah, that makes sense, and why things like Apache Struts and stuff like that can sit unpatched for so long, right?
JERRY GAMBLIN Yeah.
DAN Interesting. Just to summarize real quick, it sounds like a good process is to, one, identify some guidelines at the outset for your devs, so they can understand what generally is appropriate, what to look for that would be a bonus for pulling stuff in, red flags to look at, and why you might not want to consider pulling specific external libraries into your code. And then from there, possibly having a nice review team that can sit down, and their job is to look at, hey, do we need this, can we do it internally, do we need to do this externally? Provide a little bit of counsel from the high level. Hopefully not bringing legal teams into the code review too often. And then a test and QA check on the backend for the vulnerability management and patching side of things. That sound about right?
JERRY GAMBLIN That’s it. Look, you’re a professional now. You can go and start a job as a software manager.
DAN Oh please, no. Please, no. I got the Jerry Gamblin crash course.
JERRY GAMBLIN It’s a hard problem to solve. We just touched on what it looks like in a normal day. All kinds of stuff. What happens if somebody puts a back door into a super popular platform? That’s one of the things that people worry about. There are all these gems and this stuff where one day you wake up and you come to find out that your favorite bad actor put a back door in the most popular Ruby gem. And it’s been there for six months. And now every company that runs Ruby in the world is owned, because nobody’s spent the time to dig through all that code and make sure that it was all legit.
DAN Wow. How often does something like that happen?
JERRY GAMBLIN We have not seen it on a top tier, like a top 10% gem. But we see it all the time on lower level gems that have 100, 150 installs. Typosquatting is a big thing too. Devise is the big Rails authentication gem. Somebody set up a fake one called Revise. One little typo brings in code that works 99.9% the same as Devise-
DAN Except with.
JERRY GAMBLIN A backdoor.
DAN An added bonus of insecurity. So it’s kind of like URL spoofing almost in a way.
JERRY GAMBLIN Exactly.
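One common way to catch this kind of look-alike name is to compare each requested package name against a list of well-known names and flag near misses. The Python sketch below does that with a plain edit-distance function. The list of popular names and the distance threshold of one are assumptions made for illustration, and a check like this only flags names for a human to review; it cannot tell whether a package is actually malicious.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


# Assumed short list of well-known package names to protect.
POPULAR = ["devise", "rails", "rake"]


def possible_typosquats(name, popular, max_distance=1):
    """Return popular names that `name` is suspiciously close to."""
    return [
        known for known in popular
        if known != name and edit_distance(name, known) <= max_distance
    ]


print(possible_typosquats("revise", POPULAR))  # ['devise']
print(possible_typosquats("devise", POPULAR))  # []
```

A name that exactly matches a popular package is not flagged, so the real Devise passes while the one-letter variant is surfaced for review.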
DAN Interesting. And just while we’re starting to think of some of the future dev stuff, container, serverless, stuff like that, does that change this process or what to think about?
JERRY GAMBLIN It changes who owns it. At the end of the day, when you’re talking about serverless, you’re talking about one of the big four cloud companies owning that infrastructure and building those servers. So you hope at some point that, and you know they are, they’re taking this very seriously, making sure they’re not bringing bad software in. And then it does get down to more what you saw in 2010, where developers might write 50% of the code they’re running because all the serverless software is being handled by someone else.
DAN Sounds like that might be a good way to go in the future as things start to develop out. Maybe.
JERRY GAMBLIN We’ll see.
DAN Depends how much you want to tie yourself to say an Amazon, Google, Microsoft, etc.
JERRY GAMBLIN Exactly.
DAN Cool. All right. That seems like a really good overview. Are there any other final thoughts you wanted to give on the podcast?
JERRY GAMBLIN No, no, thanks for having me. This has been great as always.
DAN Always. Okay, cool. Well, just a quick reminder that I will link all of the articles, particularly the XKCD piece, but also the Ars Technica piece and the scrum.org piece. And then we also have a couple of links for things like GitHub, where they talk about some of the dependencies and how to manage those within that environment. But in the interim, thanks, Jerry, for joining us, and everyone have a nice day.