Introduction

While learning the basics of digital forensics, I often became frustrated by the lack of support for common tools on various Unices, specifically Linux. One of the most powerful features of Unix-like operating systems is the ability to chain arbitrary commands, such that the output of one command becomes the input of the next. This allows an investigator to process arbitrary output, a feature that is especially useful when confronted with the seemingly endless walls of text that digital forensics tools tend to generate.
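
As a minimal sketch of the kind of pipeline this enables (the image file name and search term are hypothetical; it assumes The Sleuth Kit’s fls utility is installed), deleted entries in an image can be listed, filtered, and counted in one line:

$ fls -r -d evidence.dd | grep -i '\.docx' | wc -l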

While many of these tools, especially open-source programs such as The Sleuth Kit and its enhanced graphical frontend, Autopsy, should be able to run on macOS and various Linux distributions, the official documentation for doing so is often severely lacking. As another example of this unfortunate trend, the excellent RegRipper2.8, despite being written in Perl, is surprisingly difficult to install on Linux.

These examples are not meant to impugn the dedication or contributions of the volunteers behind RegRipper or Autopsy, but rather to highlight a difficult problem for digital forensics in an increasingly cross-platform environment. As more computing moves to the cloud, and more data is stored, or at least copied, into a vast network of off-site storage devices, we need solutions that can run on a variety of host environments and be deployed quickly, without a traditional bespoke installation process. That many of these tools are developed and tested largely on Windows hosts likely reflects the resource limitations their developers face. For those of us who have benefited from their hard work but have a distinct use case, it becomes our responsibility to contribute our modifications and discoveries back to the community where we can.

For this exercise, I selected five commonly used examples of digital forensics software that I have come to rely upon for fundamental aspects of a forensics investigation: creating an image, parsing Windows Registry hives, performing memory forensics, examining an image for artifacts of interest, and analyzing network traffic. Four of these programs are open source; the fifth, FTK Imager (command line version), is commercially produced by AccessData and is freely available for download on their website.

The following table lists the chosen tools, which I have successfully gotten to work in a Linux environment:

Name          Version   License     Use
FTK Imager    3.1.1     Commercial  Create bit-stream images of devices
Autopsy       4.10.0    Apache 2.0  Graphical frontend to The Sleuth Kit forensic suite
Volatility    2.6.1     GPL 2.0     Memory forensics
RegRipper2.8  21090128  MIT         Parse Windows Registry hives
Wireshark     3.0.1     GPL 2.0     GUI network analyzer

I started this project by attempting to figure out how to install these applications on my host machine, a Frankenstein’s Linux hacked together for the various needs of a cybersecurity graduate student. While I am sure it would be possible to configure all of these applications to run natively on my host, I wanted something that would not be fragile or dependent on a particular operating system state, which changes with the requirements of each assignment. I also wanted something deterministic that would be useful to others.

Containers and Docker

This is where the concept of containerization becomes useful. Containers are a type of virtualization technology that differs from a traditional virtual machine (VM) in several key ways. Typically, containers provide the experience of an isolated filesystem but share the kernel with the host operating system (OS). The most well-known container implementation is Docker, which leverages Linux kernel features to provide an abstraction model for running arbitrary applications inside an isolated virtual environment. To use Docker on a Linux machine, the host kernel must be compiled with a specific set of features which, like much of the Linux kernel, can be built as modules that expose particular functionality. The Docker installation on my host provides an example script that can be run to check for the required kernel configuration.

A truncated example of the output of running the /usr/share/docker/contrib/check-config.sh script, showing some of the core kernel features, is below:

$ bash /usr/share/docker/contrib/check-config.sh
warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-custom_kernel_01 ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
...
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
...

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
...
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
...
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
- Storage Drivers:
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)

From this truncated output, it is possible to get an idea of the kernel features that the container daemon (dockerd) relies on to run arbitrary applications inside containers. OS kernels provide an interface to the underlying hardware, as well as access to other low-level functionality, such as filesystems and the Linux network stack. The output above shows that the EXT4 filesystem is configured with support for POSIX Access Control Lists (ACLs). The host kernel is also configured to support namespaces and cgroups, which are crucial to the way Docker isolates processes and filesystems from the host. Finally, CONFIG_OVERLAY_FS is enabled, providing a modern union filesystem that isolates the container filesystem from the host through temporary mount points managed by the kernel while the container is running. This means that, by default, a running container does not litter the host filesystem with logging artifacts, installation leftovers, or even its own data. However, it is possible to expose certain host mount points to the container to facilitate the exchange of data with the host OS. To clarify things a bit, here is a brief overview of the underlying terminology:

  • Docker: Docker is a “container engine”: the underlying application that provides a container runtime and abstracts access to the various Linux kernel features required for container functionality.
  • image: An image is a stateless “ordered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime.”
  • container: A container is an instance of an image. While images are stateless, containers are stateful, but also ephemeral by default. This means that while a container is “running,” it is possible to make changes to it, often using familiar tools provided by the underlying image. For example, in an Ubuntu or Debian container it is possible to add or remove packages using the apt package manager. However, when the container is shut down, none of those modifications will persist, because the container’s initial state is determined by its image (see the short demonstration after this list).
  • dockerd: This is the actual daemon runtime command, available as a package in many different Linux distributions. To launch a container from an image, it is necessary for dockerd to be running, as it provides the interface to the host’s running kernel.
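
As a quick demonstration of the ephemerality mentioned above (a hypothetical session; it assumes dockerd is running and that the official ubuntu image can be pulled), a package installed in a running container disappears once that container is removed:

$ docker run -it --name demo ubuntu:18.04 bash
root@<container-id>:/# apt-get update && apt-get install -y tree
root@<container-id>:/# exit
$ docker rm demo
$ docker run --rm -it ubuntu:18.04 bash
root@<container-id>:/# tree --version
bash: tree: command not found

The second container starts from the unmodified image, so the package installed in the first container is gone.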

Forensic applications as a service

There are several oft-cited benefits of deploying applications as containers. Typically, Docker images are built to replace traditional monolithic server infrastructure with microservices. Instead of a dedicated computer or virtual machine running an OS with various applications installed, such as a webserver, database engine, and so on, each of these discrete roles is packaged as its own image and launched as a container that communicates with the other containers. This allows each image to be tested and deployed independently, because the dependencies between them are loosely coupled. Consider a typical LAMP stack: the version of Apache is tightly coupled to the version of PHP and to the version of MySQL/MariaDB, and the availability of each depends on the repositories of whatever flavor of Linux is running underneath. In this scenario, upgrading one component is likely to require updates to the others as well. This has security implications: if a vulnerability is discovered in one part of the stack, other components may need to be upgraded just to keep the environment functioning. Containers, as part of a microservice architecture, reduce some of this maintenance burden. The database backend can be isolated from the webserver, avoiding the problem of shared libraries and their associated dependency issues, as sketched in the example below.
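
As a minimal sketch of that decoupling (the container names, network name, and image tags below are illustrative, not taken from any particular deployment), the database and web tiers can run as separate containers on a shared user-defined network:

$ docker network create lamp-net
$ docker run -d --name db --network lamp-net \
    -e MYSQL_ROOT_PASSWORD=example \
    mariadb:10.3
$ docker run -d --name web --network lamp-net \
    -p 8080:80 \
    php:7.3-apache

Either tier can now be upgraded by swapping its image tag without touching the other container.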

Another benefit of containers is reproducibility. This makes Docker a convenient tool for developers who want to collaborate on a shared code base without worrying about the idiosyncrasies of their individual environments. Instead of spending time configuring a fleet of heterogeneous developer workstations with the necessary software versions and dependency trees, all members of a team can run an identical development environment by pulling a preconfigured image. Other tools solve this problem, such as Vagrant, but they often rely on provisioning fully-fledged virtual machines. One of the best aspects of containers, however, is their disposability: an image can be defined in a simple Dockerfile, a sort of image recipe written with shell-like syntax. The Docker daemon uses this recipe to build the image to the defined specification, including the exposed network ports, installed packages, and the user running the process.

Below is an example of a simple Dockerfile used to build the pwndbg gdb wrapper, using the official Docker Ubuntu Cosmic image as a starting point with the FROM directive:

FROM ubuntu:cosmic
LABEL \
    maintainer="djds djds@ccs.neu.edu" \
    description="pwndbg wrapper for gdb in Docker"

# Set DEBIAN_FRONTEND before the apt layers so the noninteractive setting
# actually suppresses prompts during package installation.
ENV DEBIAN_FRONTEND="noninteractive"

RUN apt-get update && apt-get dist-upgrade -y && apt-get install -y \
    git locales && locale-gen "en_US.UTF-8"

ENV LANGUAGE="en_US:en"
ENV LC_ALL="en_US.UTF-8"
ENV LANG="en_US.UTF-8"
ENV PYTHONIOENCODING="UTF-8"

RUN git clone https://github.com/pwndbg/pwndbg \
    && cd pwndbg \
    && ./setup.sh

RUN rm -rf /var/lib/apt/lists/*

WORKDIR /data

ENTRYPOINT ["/usr/bin/gdb"]

This file defines the commands that are run to build the specified Docker image, which becomes the base state of any instance of that image, i.e. a container. If, for example, I wish to debug a certain binary using an instance of this image, I can specify the location of the binary and mount it inside the container. The container will not have access to any other part of the host filesystem, but it will be able to see the contents of the defined mount point. It is even possible to mount an arbitrary directory or file read-only inside the container, which is useful for both malware analysis and forensics applications, where it is crucial not to alter the data being analyzed.

The image can be built with the following incantation:

$ docker build -t "${REGISTRY}/${user}/${repo}:${tag}" .

An example of running a container named pwndbg as an instance of this image is below:

#!/bin/bash

docker run --rm -it \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v "/tmp/src:/src:ro" \
    -v "$(pwd):/data" \
    --name="pwndbg" \
    "${REGISTRY}/${user}/${repo}:${tag}" "${@}"

By passing the -v "/tmp/src:/src:ro" option to the docker run command, the directory /tmp/src is mounted inside the container as a read-only directory at /src. The line -v "$(pwd):/data" mounts the current working directory at /data inside the container, which provides a mapped volume for sharing data generated by the container with the host. The choice of volume mappings here is arbitrary, but the example gives an idea of the type of isolation that is possible with single-application containers.

Graphical applications Dockerized

While Docker is often run in a headless environment, it is also possible to run arbitrary graphical apps inside a container and view the display on the host by passing environment variables such as ${DISPLAY} and the host’s /tmp/.X11-unix UNIX domain socket through to the guest. This has the advantage of avoiding slow rendering methods, often used with virtual machines, such as the VNC remote framebuffer or SSH X11 forwarding. The best introduction to this relatively new way of running desktop applications is provided by Jessie Frazelle in her 2015 blog post, Docker Containers on the Desktop.
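
As a minimal sketch of this approach (the image name wireshark is illustrative and reuses the placeholder registry variables from the earlier examples), the host display and X11 socket can be passed to a containerized GUI application like so:

$ xhost +local:
$ docker run --rm -it \
    -e DISPLAY="${DISPLAY}" \
    -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    --name="wireshark" \
    "${REGISTRY}/${user}/wireshark:${tag}"

The xhost call relaxes X server access control for local connections; a tighter alternative is to share a per-container X authority file instead.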

A benefit of this approach is that opinionated applications can be contained without running amok all over the host. Particular dependency sets can be confined to their own namespace, which is especially useful for applications that require an older version of a language or runtime, as is the case with Autopsy’s hard dependency on Oracle Java SE 8. Another benefit is that once a working Dockerfile is written, deploying the application on a given host is as simple as building the image from that Dockerfile or, alternatively, pulling the image from a registry, which functions as an image repository and can be configured as part of a continuous integration pipeline. Once this is set up, any change to the image recipe, such as a version bump, is automatically built and tested against a predefined set of rules, which is one way to automate a portion of application testing for deployment.
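
As a sketch of that registry workflow (reusing the same placeholder variables as the build command above), deploying a prebuilt image amounts to authenticating against the registry and pulling the image instead of building it locally:

$ docker login "${REGISTRY}"
$ docker pull "${REGISTRY}/${user}/${repo}:${tag}"
$ docker run --rm -it "${REGISTRY}/${user}/${repo}:${tag}"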

As a result of my frustration with the current state of affairs in packaging standard forensic applications for Linux, I decided to containerize some of the applications that I use most frequently: FTK Imager (CLI), Autopsy, Volatility, RegRipper2.8, and Wireshark. The following sections outline how to build and run these tools using the Docker Engine. A git repository with all of the build recipes is located at gitlab.com/bghost/docker-forensics, and a container registry, updated via continuous integration on each push to that repository, is located at gitlab.com/bghost/docker-forensics/container_registry.