StackHawk recently released arm64 packaged executables and Docker images for the StackHawk scanning engine. This support is now a standard part of our scanner pipeline and prepares our software build process for future computer architectures. Cross-compiling our CLI and Docker images for multiple architectures as part of our regular continuous integration demonstrates how seriously we take automation.
This blog post describes how we added arm64 builds of our scanner product and outlines the details we had to address along the way. With this support, our users can run the scanner on the arm64 architecture. We hope you find this guide useful in implementing multi-architecture builds of your software.
What is the difference between amd64 and arm64 architectures?
Software compiled into an executable binary is built to run on a given architecture. The architecture defines the instruction set, that is, the operations the CPU performs when executing machine code. Code compiled for the arm64 architecture cannot run natively on an Intel-based (amd64) CPU, for example. The ARM family of architectures became popular with mobile devices. arm64 is the 64-bit execution state introduced with the armv8 standard, and armv8 processors can also run in an AArch32 state, which is effectively compatible with software compiled for armv7. This blurs the lines between applications designed for mobile operating systems and those designed for servers, and is proving to be advantageous for running high-performance software at scale.
The many names of arm64
Arm64 and AArch64 are two names for the same architecture. Docker architecture names were not standardized until after several variations of each name had emerged, so Docker normalizes these variations into the specific architectures it recognizes. AArch64, linux/arm64 and armv8 are all acceptable names and designate the container as built for the arm64 architecture. More information on the architectures supported by Docker can be found here.
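To make the idea of name normalization concrete, here is a minimal Python sketch. The alias table and the normalize_arch helper are hypothetical and written only for illustration. They are not Docker's actual normalization code, and the set of aliases Docker accepts may differ.

```python
# Illustrative alias table (an assumption for this sketch, not Docker's own list).
ARCH_ALIASES = {
    "arm64": "arm64",
    "aarch64": "arm64",
    "armv8": "arm64",
    "arm64/v8": "arm64",
    "amd64": "amd64",
    "x86_64": "amd64",
    "x86-64": "amd64",
}


def normalize_arch(name: str) -> str:
    """Map an architecture alias to a canonical name (hypothetical helper)."""
    key = name.strip().lower()
    # Accept platform strings such as "linux/arm64" by dropping the OS prefix.
    if key.startswith("linux/"):
        key = key[len("linux/"):]
    try:
        return ARCH_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown architecture name: {name!r}") from None


if __name__ == "__main__":
    for alias in ("AArch64", "linux/arm64", "armv8", "x86_64"):
        print(alias, "->", normalize_arch(alias))
```

Raising an error for unknown names, rather than guessing, keeps a typo in a build script from silently producing an image for the wrong architecture.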
Building the StackHawk CLI
The StackHawk CLI is written in Kotlin with Clikt and uses a custom fork of the ZAP core project. HawkScan uses Gradle as its build tool and separates its build process into multiple modules. These modules are packaged into a JAR that is run from a shell script, which executes the JAR with the Java virtual machine installed on the host machine. This is where the Java slogan “write once, run anywhere” comes true: HawkScan ran on arm64 on day one with our CLI because it is packaged as Java bytecode, and Java 17 (as well as a Microsoft backport of Java 11) supports running on the arm64 architecture.
While our CLI worked seamlessly on the arm64 architecture, more effort was required to get our HawkScan Docker image and platform services working on different architectures. Much of our work intimately involved Docker as a dependency and ensuring that Docker images could be built to be compatible with the arm64 architecture.
Building multi-architecture software with Docker
For those that may not be familiar, Docker is a wonderful collection of software tools, applications and cloud platform services to enable the development of software in containers. Containers run from defined software images, which are housed in container registries.
On macOS and Windows, Docker Desktop is a GUI application with additional capabilities beyond what the Docker API or CLI typically provide. These utilities help provide a rich experience when developing on these platforms, but create a disconnect when attempting to automate the building of software in Linux or CI environments.
Docker buildx is a plugin that extends the Docker command to use the BuildKit toolkit and allows for building multi-platform images with Docker Desktop. This is done using the --platform flag to specify the architecture that an image should be built for. Docker Desktop provides build roles for compiling into multiple architectures, but the Linux Docker engine does not have these same capabilities by default.
In order to build for multiple architectures in a Linux environment in CI, you must first start the cross-platform emulator with specific Docker images. binfmt_misc with QEMU emulator support allows for building images for multiple architectures. Docker Desktop includes this emulation support by default, but building outside of the desktop version of Docker will require using the tonistiigi/binfmt image. Additionally, when communicating with the Docker API, the environment variable DOCKER_BUILDKIT=1 must be specified as a property in the API request. This would normally be set automatically by Docker Desktop. More information on building images with BuildKit can be found here.
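As a rough illustration of the two steps described above on a Linux CI host, the Python sketch below assembles the Docker CLI invocations without running them. The helper names, the image tag and the build context are assumptions made for this example and do not come from our pipeline.

```python
import shlex


def binfmt_install_command(arches):
    """Command that registers QEMU emulators via the tonistiigi/binfmt image."""
    return ["docker", "run", "--privileged", "--rm",
            "tonistiigi/binfmt", "--install", ",".join(arches)]


def buildx_build_command(tag, platforms, context="."):
    """Command that builds and pushes one tag for several platforms with buildx."""
    return ["docker", "buildx", "build",
            "--platform", ",".join(platforms),
            "--tag", tag, "--push", context]


if __name__ == "__main__":
    # "example/hawkscan:latest" is a placeholder tag, not a real repository.
    steps = [
        binfmt_install_command(["arm64"]),
        buildx_build_command("example/hawkscan:latest",
                             ["linux/amd64", "linux/arm64"]),
    ]
    for step in steps:
        # Print each command; pass the list to subprocess.run to execute it.
        print(shlex.join(step))
```

Depending on how the Docker engine is configured, a multi-platform build may also need a dedicated builder, for example one created with docker buildx create --use, before the build step succeeds.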
Registries vs. repositories and images vs. manifests
A registry is a directory of listed container repositories and serves as the location of stored images. A registry may be either local to an individual user’s computer or public in a location such as Docker Hub. Google Container Registry, Amazon ECR and Docker Hub are registries. A repository is a specific location in a registry and exists as a collection of images with associated tags to identify different builds or container contents.
A Docker image is the packaged template that containers run from, and it is built for a specific architecture. Registries can also host manifests (often called manifest lists), which associate a common image name with the images built for each specific architecture. They allow anyone to pull the manifest as if it were an image and to receive the correct image based on their client architecture.
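The lookup a client performs against a manifest can be sketched in a few lines of Python. The dictionary below follows the general shape of a Docker manifest list, but the digests are placeholders and select_image is a hypothetical helper. A real client does more, such as matching architecture variants.

```python
# Simplified manifest list; the digests are placeholders, not real images.
MANIFEST_LIST = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {"digest": "sha256:<amd64-digest>",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:<arm64-digest>",
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}


def select_image(manifest_list, os_name, architecture):
    """Return the digest of the image matching the client's OS and architecture."""
    for entry in manifest_list["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == architecture:
            return entry["digest"]
    raise LookupError(f"no image for {os_name}/{architecture}")


if __name__ == "__main__":
    print(select_image(MANIFEST_LIST, "linux", "arm64"))
```

Because the selection happens at pull time, users on amd64 and arm64 machines can reference the same repository and tag and still receive different images.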
Using Gradle tasks to build multi-architecture images in CI/CD
To automate the build of our HawkScan Docker image, we used Gradle tasks to trigger the creation of multi-architecture Docker images in our CI/CD pipeline. Once the task is triggered, images are built for the amd64 and arm64 architectures, the images are tagged locally and pushed to Amazon ECR, and then the manifest is generated. The Java API client for Docker was extremely helpful in building out these tasks. We extended this functionality to our platform services, which we can now develop on arm64 machines. This enables us to easily add new architectures to run our software on in the future, and gives our developers the tools to build StackHawk on Apple’s M1 silicon.
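The sequence of build, tag, push and manifest steps can be approximated with plain Docker CLI commands. The Python sketch below only generates those commands. It is not our Gradle task or the Java API client code, and the repository name, tag scheme and multi_arch_commands helper are assumptions for illustration.

```python
import shlex


def multi_arch_commands(repository, tag, arches=("amd64", "arm64"), context="."):
    """List the Docker CLI steps: per-architecture build and push, then manifest."""
    commands = []
    arch_refs = []
    for arch in arches:
        # Tag each architecture-specific image with an architecture suffix.
        ref = f"{repository}:{tag}-{arch}"
        arch_refs.append(ref)
        commands.append(["docker", "build", "--platform", f"linux/{arch}",
                         "--tag", ref, context])
        commands.append(["docker", "push", ref])
    # Group the pushed images under one shared tag and publish the manifest.
    manifest_ref = f"{repository}:{tag}"
    commands.append(["docker", "manifest", "create", manifest_ref, *arch_refs])
    commands.append(["docker", "manifest", "push", manifest_ref])
    return commands


if __name__ == "__main__":
    # "example.registry/hawkscan" is a placeholder, not a real ECR repository.
    for command in multi_arch_commands("example.registry/hawkscan", "1.0.0"):
        print(shlex.join(command))
```

Pushing the architecture-specific images before creating the manifest matters here, since the manifest refers to images that must already exist in the registry.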
Significant research went into learning how to build arm64-compatible images of HawkScan and our platform services. Learning the nuances of how Docker handles multi-architecture builds in different contexts proved instrumental in developing images with arm64 compatibility. Undergoing this process has yielded great benefits as we are ready to build our products and services on existing architectures and on new architectures to come. We hope that sharing these findings will enable you to more smoothly move in the direction of building software compatible with multiple platforms.