Most Docker images are built on full Linux distributions that carry around a hundred megabytes of unnecessary complexity, which increases the time and cost of data transfer, widens compliance scope, and weakens the security profile. This overhead may seem like a necessary evil, but by using multi-stage builds and Google’s “distroless” base images we can build runtime images that are the next best thing to FROM scratch.
Small is beautiful
There’s a certain elegance to minimal Docker images. A kind of Zen simplicity that goes beyond just code golf for containers. But beyond being beautiful (to me), the effort of keeping your images as small as possible has practical implications as well:
- Decreased attack surface. Out-of-the-box, containers provide a fair bit of security just by isolating applications both from their host and from one another. With a little effort they can also make it easier to follow the principle of least privilege, making an application more secure by restricting its capabilities to the bare minimum it needs to do its job.
- Less data to move around. Each time you pull an image, you’re incurring an expense in the form of time and data transfer costs. It may not be that big of a deal if you’re on your personal workstation, but if we’re talking about a 50-node Kubernetes cluster things can get pretty expensive.
- Less to keep track of. Everything in your image is another thing whose provenance needs to be tracked and which needs to be scanned for known vulnerabilities, pointlessly adding to management overhead and build-scan times.
How do you get there from here?
In the real world, though, minimal images aren’t all that common. Walking away from the convenience of installing your app into a full distribution with all the power of a package manager isn’t trivial, and most images are bloated with all kinds of unnecessary things that their creators may not even know are there.
The obvious answer is multi-stage builds, in which the first-stage build produces an executable artifact, which is added to the runtime image in the second-stage build. Sure, this works great for dropping statically linked binaries written in Go or C++ into scratch images, but what about languages like Python or Java that need relatively large language runtimes?
What about Alpine, which is essentially BusyBox with a package manager? At 2MB compressed, it’s certainly small enough, and you can use the package manager to install a JRE, for example. But once you have, will you ever need that package manager again? Probably not, and then it just sits there as a vestigial security risk. What’s more, Alpine uses musl rather than glibc as its C library, potentially trading size for compatibility, particularly around DNS.
This is where Google’s “distroless” images come in. Rather than taking the top-down approach of trimming a distro to the smallest possible size, the distroless images are built from the bottom up, adding only the dependencies and language runtimes absolutely necessary to run your application. In the words of Google’s Matt Moore, “this is FROM scratch for the rest of us”.
The means used to construct these images is pretty fascinating in its own right, taking advantage of some of the finer features of Bazel. A worthy treatment is beyond the scope of this article, but more information is available in the repo and in Matt Moore’s talk on the subject.
Multi-stage “distroless” builds
Multi-stage builds were introduced in Docker 17.05 to do exactly what we’re trying to do here: separate the construction of an artifact from its deployment. The idea is simple and powerful: one or more “build environment” stages are run, using images that contain the build machinery needed to generate the application artifacts. The artifacts are then copied into a completely separate “deployment environment” image, which in our case is one of the “distroless” images.
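Before moving on to the distroless examples, it may help to see the pattern in its simplest form: the statically linked Go binary dropped into a scratch image, as mentioned earlier. This is only a minimal sketch. It assumes a Go module with a main package at the root of the build context, and the golang:1.21 tag and the /bin/app path are illustrative choices, not anything prescribed by the distroless project.

```dockerfile
# Build stage: a full Go toolchain image (the tag is an assumption; use whatever fits).
FROM golang:1.21 AS build-env
WORKDIR /src
COPY . .
# CGO_ENABLED=0 avoids linking against libc, so the binary can run in an empty image.
RUN CGO_ENABLED=0 go build -o /bin/app .

# Runtime stage: an empty image that contains nothing but the binary.
FROM scratch
COPY --from=build-env /bin/app /app
ENTRYPOINT ["/app"]
```

Only the final stage ends up in the image you ship. The Go toolchain and the source tree stay behind in the build stage.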
Bring on the code!
At the time of this writing, “distroless” images are available for Python, Node.js, Java, C/C++ (the cc image), and .NET. See the GoogleCloudPlatform/distroless repo for more information.
The following examples were borrowed from the repo. All credit goes to the original authors.
Example 1: Multistage Java
In the first example, we’re simply using javac from the openjdk:8-jdk-slim image to generate a JAR file, main.jar, which we then COPY into the distroless image. We could just as easily have used Maven or any other build tool. Note that the CMD names only the JAR: the distroless Java image’s entrypoint already invokes java -jar.
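# Build stage: compile the sources and package them into a runnable JAR.
FROM openjdk:8-jdk-slim AS build-env
COPY . /app/examples
WORKDIR /app
RUN javac examples/*.java
RUN jar cfe main.jar examples.HelloJava examples/*.class

# Runtime stage: copy the build output into the distroless Java image.
FROM gcr.io/distroless/java
COPY --from=build-env /app /app
WORKDIR /app
CMD ["main.jar"]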
Example 2: Multistage .NET
This example follows the same pattern as the Java example above: a binary is built in an SDK image and then copied into the distroless image.
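# Build stage: restore dependencies and publish a Release build for linux-x64.
FROM microsoft/dotnet:2.0.0-sdk AS build-env
COPY . /app
WORKDIR /app
RUN dotnet restore -r linux-x64
RUN dotnet publish -c Release -r linux-x64

# Runtime stage: copy the build output into the distroless .NET image.
FROM gcr.io/distroless/dotnet
WORKDIR /app
COPY --from=build-env /app /app/
CMD ["bin/Release/netcoreapp2.0/linux-x64/publish/hello"]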
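The same pattern should carry over to the interpreted languages in the list above. Here is a minimal sketch for Python, under a few assumptions: hello.py is a hypothetical single-file script with no third-party dependencies, the runtime image is gcr.io/distroless/python3, and that image’s entrypoint is the Python interpreter.

```dockerfile
# Build stage: a full Python image, where any build-time tooling would live.
FROM python:3-slim AS build-env
COPY . /app
WORKDIR /app

# Runtime stage: only the interpreter and the application files.
FROM gcr.io/distroless/python3
COPY --from=build-env /app /app
WORKDIR /app
# Assumes the image's entrypoint is the Python interpreter, so CMD names only the script.
CMD ["hello.py"]
```

With nothing to compile, the build stage does little here. It earns its keep once the application needs build-time tooling that has no place in the runtime image.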
Last modified on 2018-02-27