How ASLR Helps Enable Exploits (CVE-2013-2028)

The other day I was playing around with CVE-2013-2028 along with my peer Hong Hu when we came across something odd: CVE-2013-2028 is only exploitable on 64-bit GNU/Linux when ASLR is enabled. After confirming this observation multiple times, we were left very surprised. How could ASLR possibly worsen the security of an application? Driven by curiosity, we decided to find the root cause of this result. Ultimately, we had to go all the way to the Linux kernel code to find our answer. What we found was a kernel quirk that can't really be called a bug from the kernel's perspective, but does go against the expectations of the user. So without further ado, allow me to share how ASLR can enable the exploitation of applications.

For those unfamiliar with CVE-2013-2028, all you need to know is that it's an exploitable vulnerability in older versions of nginx: a stack buffer overflow that can be triggered by specially crafted HTTP requests. The bug occurs because an integer supplied to nginx by the user, which is intended to be an unsigned value, is temporarily cast to a signed value. If an attacker passes a sufficiently large value, the worker process handling the request will copy too much data from its network socket into a fixed-size buffer, smashing the stack. For the curious reader, a more in-depth analysis is available here and a repository for reproducing it is available here.

So why is this bug only exploitable when ASLR is turned on? We can find the user space answer with a simple strace. If we make a chunked HTTP request and claim the total size is going to be 0xaaaaaaaaaaaaaaaa, nginx's worker will make a recvfrom() system call for 0xaaaaaaaaaaaaaab0 bytes from the network socket. When ASLR is turned on, the Linux kernel will copy our request (which is not actually 0xaaaaaaaaaaaaaaaa bytes long) into the worker's buffer, smashing the stack. However, when ASLR is turned off, the kernel will return -EFAULT and the worker will safely report the error and close the session.

We could stop here, but Hong and I were not satisfied. Why is the kernel returning -EFAULT when ASLR is disabled but not when it is enabled? The space allocated for the stack is the same in both cases, so that can't be the problem. The only obvious difference is that ASLR moves the stack's address range to randomize it. When ASLR is disabled, the stack's highest address is placed at the boundary between user and kernel space, which is 0x7fffffffffff in Linux kernels compiled for x86_64. However, 0xaaaaaaaaaaaaaab0 is such a large number that it shouldn't matter where the stack is placed: a range that long cannot fit into the memory segment and will cross the boundary either way. So what's really happening in the kernel when it handles a recvfrom() system call?

Taking a look at Linux's implementation of recvfrom(), we see the following code:

SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
                unsigned int, flags, struct sockaddr __user *, addr,
                int __user *, addr_len)
{
    struct socket *sock;
    struct iovec iov;
    struct msghdr msg;
    struct sockaddr_storage address;
    int err, err2;
    int fput_needed;

    err = import_single_range(READ, ubuf, size, &iov, &msg.msg_iter);
    if (unlikely(err))
        return err;
    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (!sock)
        goto out;

    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    /* Save some cycles and don't copy the address if not needed */
    msg.msg_name = addr ? (struct sockaddr *)&address : NULL;
    /* We assume all kernel code knows the size of sockaddr_storage */
    msg.msg_namelen = 0;
    msg.msg_iocb = NULL;
    if (sock->file->f_flags & O_NONBLOCK)
        flags |= MSG_DONTWAIT;
    err = sock_recvmsg(sock, &msg, flags);

    if (err >= 0 && addr != NULL) {
        err2 = move_addr_to_user(&address,
                                 msg.msg_namelen, addr, addr_len);
        if (err2 < 0)
            err = err2;
    }

    fput_light(sock->file, fput_needed);
out:
    return err;
}

This code performs two relevant checks. The first occurs in:

err = import_single_range(READ, ubuf, size, &iov, &msg.msg_iter);

And the second occurs in:

err2 = move_addr_to_user(&address, msg.msg_namelen, addr, addr_len);

However, we can rule out move_addr_to_user() because it's passed the number of bytes actually fetched from the socket, which is the same in our attack regardless of ASLR. This leaves import_single_range(), which is implemented as follows:

int import_single_range(int rw, void __user *buf, size_t len,
                        struct iovec *iov, struct iov_iter *i)
{
    if (len > MAX_RW_COUNT)
        len = MAX_RW_COUNT;
    if (unlikely(!access_ok(!rw, buf, len)))
        return -EFAULT;

    iov->iov_base = buf;
    iov->iov_len = len;
    iov_iter_init(i, rw, iov, 1, len);
    return 0;
}

In this function, a sanity check is performed via access_ok() to make sure the number of bytes requested by the caller cannot cause a write that would cross into kernel space. But as we pointed out before, the value nginx's worker is passing here is 0xaaaaaaaaaaaaaab0, which should easily cross the boundary regardless of ASLR. The type size_t is defined as an unsigned 64-bit integer in our case, so access_ok() should be passed 0xaaaaaaaaaaaaaab0, right? Actually, if we look more closely, we can see the following lines enforce a limit on len:

if (len > MAX_RW_COUNT)
    len = MAX_RW_COUNT;

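If we look up MAX_RW_COUNT, we can see it equals (INT_MAX & PAGE_MASK), which is 0x7ffff000 with 4 KiB pages: a value that fits comfortably in 32 bits. In other words, even though recvfrom() accepts 64-bit unsigned lengths on x86_64, import_single_range() silently caps them at just under 2 GiB! As a result, access_ok() is never asked about 0xaaaaaaaaaaaaaab0 bytes, only about 0x7ffff000. With ASLR disabled, the stack sits right at the top of user space, so even a range of that size starting at the worker's buffer crosses the boundary and the call fails with -EFAULT. With ASLR enabled, the stack is usually shifted far enough down that the capped range fits entirely below the boundary. On a 64-bit processor, this cap combined with ASLR's relocation of the stack allows our attack to pass the access_ok() check and smash nginx's stack.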

Technically, this isn't a bug from the kernel's perspective because import_single_range() also calls iov_iter_init() with the capped length. This means recvfrom() can receive at most that many bytes from the socket, and therefore passing the capped value to access_ok() is safe.

That said, it's a really odd way of implementing this system call. From the caller's perspective, it's not made clear that even though it can pass a 64-bit length, anything above MAX_RW_COUNT is silently capped. Also, recvfrom() treats the length as a 64-bit value all the way through its own logic, so it's not immediately obvious that the length is being capped inside import_single_range(). Additionally, as Hong and I discovered, there is a security consequence to this choice. Performing the access_ok() check on the capped length allows network attacks that rely on integer overflow and underflow to succeed where they would otherwise more likely be blocked by the kernel through a failed system call. We find this to be an interesting consequence since it results from seemingly unrelated design decisions. It is hard to recommend that the Linux kernel developers revise import_single_range() given that the real problem is a bug in nginx and not the Linux kernel itself, but we find this discovery fascinating regardless.