What Is Software Composition Analysis?
Fundamentally a SCA solution has two main parts: a catalog of known software packages and a scanner that combs through your build artifacts looking for items from its catalog. The catalog contains more than just the product names for each package. It also knows about published versions, vulnerabilities in each version, and the kind of license required, such as Apache 2.0, MIT, or GPL. When you run a SCA solution it examines your files and produces a list of the third-party products you are using. For each product the scanner reports the product provider, name, version, license type, and a list of vulnerabilities that have been discovered in the product.
Why You Need SCA
SCA solutions should recognize most third-party components, whether commercial or open source, but generally SCA solutions have become popular because they address the risks involved with open source software. Most companies use at least some open source software, and most applications contain at least some vulnerabilities in the open source software they use. Furthermore, many open source products form part of a complex ecosystem composed of multiple distinct components or packages. Companies using Linux, Docker, Apache, or Node.js, for example, very likely have hundreds or even thousands of distinct packages, each with its own licensing requirements.
Even though the consequences of using vulnerable software or the wrong kind of license can be drastic, many companies don’t even know what third-party software they have.
While it is technically possible to track all those components manually, keeping the catalog up to date quickly becomes prohibitively tedious. Tracking every new package you include is hard enough in a large code base. It gets worse when you try to add the version and license for each package to your inventory. Packages often come with their own sub-dependencies, which extends the list. Tracking known vulnerabilities is harder still. You could certainly find vulnerabilities in the public CVE database at MITRE, but that means manually searching and evaluating results for every item in your inventory. And searching once isn’t enough. What if someone discovers a new vulnerability in old software? You’ll have to perform the search again periodically to see if newly found vulnerabilities might affect you.
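To see how quickly sub-dependencies extend an inventory, here is a minimal Python sketch. The package names and the `DEPENDENCIES` map are invented for illustration; a real inventory would more likely be read from lock files or package manifests.

```python
from collections import deque

# Hypothetical dependency map: package -> packages it depends on directly.
DEPENDENCIES = {
    "my-app": ["web-framework", "json-parser"],
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-lib"],
    "json-parser": [],
    "template-engine": [],
    "tls-lib": [],
}


def full_inventory(root, dependencies):
    """Return every package reachable from root, excluding root itself."""
    seen = {root}  # seeding with root guards against dependency cycles
    queue = deque(dependencies.get(root, []))
    while queue:
        package = queue.popleft()
        if package in seen:
            continue  # already recorded
        seen.add(package)
        queue.extend(dependencies.get(package, []))
    return sorted(seen - {root})


print(full_inventory("my-app", DEPENDENCIES))
```

The application declares only two direct dependencies, yet the inventory it has to track contains five packages.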
This work can obviously be automated. And vendors who specialize in providing SCA services can get better results than you might on your own. They might, for example, supplement their lists of known vulnerabilities with data from proprietary sources or research.
How SCA Works
To run a SCA scanner you point it at your build files. They might be on a developer desktop or a staging server, but most often SCA scanners read from a build directory in your CI/CD pipeline.
SCA solutions recognize files in your code base that come from third-party products. Scanners may employ multiple tactics for identification. For example, they may have a list of hashes pre-computed from files in known software products. A cryptographic hash acts as a fingerprint: for practical purposes, two different files never produce the same value. When the scanner runs, it calculates hashes for all the files in your software and matches them against its list. When the hashes match, the scanner knows what product and what version of that product you have. Additionally, many scanners can parse source files to find third-party code snippets incorporated within your own code.
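The hash-matching tactic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular product works: the catalog here is a plain dictionary keyed by SHA-256 digest, and `example-lib` and the file names are made up.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_components(build_dir, catalog):
    """Map each recognized file (path relative to build_dir) to (product, version)."""
    build_dir = Path(build_dir)
    matches = {}
    for path in sorted(build_dir.rglob("*")):
        if path.is_file():
            component = catalog.get(sha256_of(path))
            if component is not None:
                matches[path.relative_to(build_dir).as_posix()] = component
    return matches


def demo():
    """Scan a throwaway directory holding one cataloged file and one unknown file."""
    known_bytes = b"function pad() { /* vendored library code */ }\n"
    # Hypothetical catalog entry: hash of a known file -> (product, version).
    catalog = {hashlib.sha256(known_bytes).hexdigest(): ("example-lib", "1.2.0")}
    with tempfile.TemporaryDirectory() as build_dir:
        vendor = Path(build_dir, "vendor")
        vendor.mkdir()
        (vendor / "example-lib.js").write_bytes(known_bytes)
        Path(build_dir, "app.js").write_bytes(b"console.log('our own code');\n")
        return identify_components(build_dir, catalog)


print(demo())
```

Only the vendored file is reported; `app.js` has no entry in the catalog. A hash matches only when a file is byte-for-byte identical to the cataloged one, which is one reason scanners combine this tactic with others.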
Even if your code never changes, you may get different results each time you run the scanner. That’s because SCA scanners frequently update their list of known vulnerabilities. People continue to find flaws in software for years after release, and a good scanner updates its list of known vulnerabilities frequently.
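A small Python sketch shows why an unchanged build can yield new findings. The inventory, the two snapshots of vulnerability data, and the advisory identifiers are all placeholders invented for this example.

```python
def find_vulnerabilities(inventory, vulnerability_db):
    """Return (name, version, advisory_id) for every known advisory that applies."""
    findings = []
    for name, version in inventory:
        for advisory_id in vulnerability_db.get((name, version), []):
            findings.append((name, version, advisory_id))
    return findings


# The same build inventory is scanned twice; the code has not changed.
INVENTORY = [("example-lib", "1.2.0"), ("other-lib", "4.0.1")]

# Hypothetical snapshots of the scanner's vulnerability data at two points in time.
DB_EARLIER = {("example-lib", "1.2.0"): ["EXAMPLE-0001"]}
DB_LATER = {
    ("example-lib", "1.2.0"): ["EXAMPLE-0001"],
    ("other-lib", "4.0.1"): ["EXAMPLE-0002"],  # discovered after the first scan
}

print(find_vulnerabilities(INVENTORY, DB_EARLIER))
print(find_vulnerabilities(INVENTORY, DB_LATER))
```

The second scan reports an extra finding for `other-lib` even though the inventory is identical, because only the vulnerability data changed.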
What Does SCA Find?
SCA solutions produce several different kinds of output that serve different needs and may be consumed by different audiences. Typical SCA reports include:
Bill of Materials (BoM): An inventory of third-party packages found. Knowing what software you have is a first step to security, and providing a list is sometimes a compliance requirement.
List of Licenses: An inventory of software licenses associated with third-party components in your software. Some open source licenses are highly restrictive and can create business risk. Legal departments often set policies about what licenses a particular company must avoid.
Known Vulnerabilities: A list of potentially dangerous flaws in your third-party software components. At a minimum such lists show the type and severity of each vulnerability as well as which files have the vulnerability.
Like most testing tools, a SCA scanner helps you find problems. Responding to what it finds may present some common challenges.
Assessing Actual Risk
Knowing that a particular library in your back end has a certain high-priority vulnerability is not always enough for development teams to choose the right response. Which team owns that library? What parts of the product depend on it? How much regression testing will be needed if it’s replaced? What exactly is the vulnerability? Is the vulnerable code path something your product ever executes? Can you safely ignore the vulnerability? These questions are not always easy to answer. When choosing a SCA solution, look carefully at the vulnerability reports. Which one gives the best information?
You’ll need to decide who is allowed to determine that a particular finding does not need to be fixed. Requiring approval from the security team may produce reliably objective decisions, but it creates a roadblock for developers. Allowing developers to accept risk speeds code development but introduces a need to monitor or audit developer overrides.
Addressing Technical Debt
If you have a large code base and have not been tracking your third-party software, your first SCA scan will likely reveal an alarming technical debt. You may have hundreds of items in your vulnerability list. Prioritizing the work will be essential. Which vulnerabilities are most severe? Which components actually process sensitive data? Which have the most vulnerabilities? Which are easiest or safest to upgrade? Divide the work into prioritized chunks. Allocate a set amount of time in each sprint to address the findings. Celebrate a new victory after each chunk is fixed.
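One way to turn those questions into an ordered backlog is sketched below in Python. The `Finding` fields, the sample data, and the ordering rule (severity first, then sensitive-data exposure, then upgrade effort) are assumptions made for illustration; your own priorities may weigh these factors differently.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    component: str
    severity: str
    handles_sensitive_data: bool
    upgrade_effort_days: int


def prioritize(findings):
    """Most severe first, then sensitive-data components, then cheapest fixes."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f.severity],
            not f.handles_sensitive_data,  # False sorts before True
            f.upgrade_effort_days,
        ),
    )


def chunk(items, size):
    """Split the ordered backlog into sprint-sized chunks."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical findings from a first scan.
FINDINGS = [
    Finding("logging-lib", "medium", False, 1),
    Finding("auth-lib", "high", True, 3),
    Finding("image-lib", "high", False, 1),
    Finding("xml-parser", "critical", False, 5),
]

for sprint, batch in enumerate(chunk(prioritize(FINDINGS), 2), start=1):
    print(sprint, [finding.component for finding in batch])
```

With this sample data the first chunk contains `xml-parser` and `auth-lib`, and the second contains `image-lib` and `logging-lib`.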
Knowing What Was Missed
SCA scanners may not identify every third-party component in your product. They may fail to recognize specialized libraries purchased from smaller vendors or open source files that are not widely adopted. Some amount of manual tracking may still be necessary.
Popular SCA Solutions
Many companies produce Software Composition Analysis scanners. Here are some of the larger or more prominent contenders, along with a word or two about each to suggest how they differ.
Black Duck (by Synopsys) boasts a knowledge base of over 4 million software components.
GitHub’s dependency review feature and GitLab’s “Dependency Scanning” both generate alerts for out-of-date dependencies in public repositories and create pull requests to update your code. GitHub’s dependency review is in beta as of this writing (4/27/21). GitHub provides a feature comparison here.
JFrog’s Xray is available in cloud or self-hosted versions and integrates particularly well with JFrog’s Artifactory product.
Snyk Open Source scans Docker images as well as other build artifacts. It also benefits from a proprietary database of software vulnerabilities maintained by Snyk research teams.
Sonatype’s Nexus Intelligence emphasizes reducing false positives by providing more precise analysis of build artifacts.
Veracode Software Composition Analysis builds call graphs to help you identify which open source libraries your application actually uses.
WhiteSource supports over 200 programming languages.
How to Choose a Solution
First gather some basic facts about what you need. What languages does your code base use? What tools appear in your CI/CD pipeline? Who at your company wants to see SCA reports—developers? Security? Legal? Compliance? What problems do you need it to solve? The answers will help you decide which of these features matter most:
Comprehensive Component Identification: The more third-party components the scanner recognizes the better. How many does it know? How often is the list updated?
Comprehensive Vulnerability Identification: Again, the more the better. Does the scanner rely entirely on public sources such as MITRE for vulnerability data? Some companies supplement public lists with proprietary research.
Integration Capabilities: Where in your software development process do you want the scan to occur? If you want the scans to run automatically, you’ll need to understand how the scanner integrates with your build system.
Speed of Developer Feedback: How quickly can developers find out what the scanner reports? Is the scanner fast enough to run on every build? Can developers see results in their own IDE? Does the scanner generate alerts when new vulnerabilities are found in products that you have not rebuilt recently?
Suitability of Reports: Reporting capabilities vary. Developers may have a hard time reading reports designed for security personnel. Some scanners do more than others to help you understand what the vulnerability is and what parts of your code rely on the vulnerable component.
Policy Enforcement Features: Many scanners let you define flexible, nuanced policies that will block builds containing unacceptable licenses or severe vulnerabilities.
False Positives: SCA scanners generally don’t have as many false positives as DAST scanners, but they do arise. A proof-of-concept exercise using your own code base is the best way to evaluate a scanner’s signal-to-noise ratio.
These best practices will help you succeed with any SCA solution:
Security wins when developers do it. Incorporate SCA early in your SDLC. Empower development teams to understand the risks and make responsible decisions.
Automate SCA scanning in your CI/CD pipeline. Fail builds that introduce severe vulnerabilities or prohibited licenses. When all high-severity vulnerabilities have been addressed, consider blocking the build for medium-severity vulnerabilities as well.
Consult your Legal department to determine which licenses are not acceptable for your business and set policies in your SCA solution to enforce those decisions.
Replace any component that is no longer supported by its maker. Retaining software that will never receive security updates is dangerous.
Ensure developers have a way to suppress scanner findings that investigation shows pose no actual risk in your environment. Review the set of suppressed findings periodically.
Ensure your SCA scanner updates its component and vulnerability data frequently.
Establish a screening process for adopting new third-party components. Ensure the benefit of using a component justifies whatever risk it entails. The process should consider risk indicators such as manufacturer reliability, frequency of updates, vulnerability history, and effort required to patch.
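Several of these practices (failing builds, enforcing license decisions, suppressing reviewed findings) come together in a build gate. The Python sketch below is a simplified illustration, not the policy format of any real scanner: the `Component` and `Policy` types, the component names, and the advisory identifiers are hypothetical, and the prohibited license is only an example of a decision your Legal department might make.

```python
from dataclasses import dataclass, replace

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Component:
    name: str
    license_id: str
    vulnerabilities: tuple = ()  # tuple of (advisory_id, severity) pairs


@dataclass(frozen=True)
class Policy:
    prohibited_licenses: frozenset
    fail_at_severity: str
    suppressed: frozenset = frozenset()  # advisories reviewed and accepted


def violations(components, policy):
    """List every reason the build should fail under the given policy."""
    threshold = SEVERITY_ORDER.index(policy.fail_at_severity)
    problems = []
    for component in components:
        if component.license_id in policy.prohibited_licenses:
            problems.append(f"{component.name}: prohibited license {component.license_id}")
        for advisory_id, severity in component.vulnerabilities:
            if advisory_id in policy.suppressed:
                continue  # investigated and judged to pose no actual risk
            if SEVERITY_ORDER.index(severity) >= threshold:
                problems.append(f"{component.name}: {severity} vulnerability {advisory_id}")
    return problems


# Hypothetical scan results and policy.
COMPONENTS = [
    Component("example-lib", "MIT", (("EXAMPLE-0001", "high"),)),
    Component("copyleft-lib", "GPL-3.0-only"),
    Component("other-lib", "Apache-2.0", (("EXAMPLE-0002", "medium"), ("EXAMPLE-0003", "critical"))),
]
POLICY = Policy(
    prohibited_licenses=frozenset({"GPL-3.0-only"}),
    fail_at_severity="high",
    suppressed=frozenset({"EXAMPLE-0003"}),
)

problems = violations(COMPONENTS, POLICY)
exit_code = 1 if problems else 0  # a CI job would exit with this status
print(exit_code, problems)

# Later, once high-severity findings are cleared, tighten the threshold.
STRICTER = replace(POLICY, fail_at_severity="medium")
print(len(violations(COMPONENTS, STRICTER)))
```

Keeping suppressions in the policy itself, rather than scattered through scripts, makes the periodic review of suppressed findings a matter of reading one list.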
OWASP—always a great source of information for application security—has an article on Component Analysis that lists more SCA scanners and points to other resources. OWASP also sponsors two open source projects of its own aimed at managing risk from third-party components.
Software Composition Analysis is a standard, fundamental part of any secure development lifecycle. A robust CI/CD pipeline should include SCA along with SAST and DAST scanners. Finding problems early reduces costs, increases agility, makes software safer, and helps developers learn to incorporate security in their planning and design decisions.