Fuzzers Suck: New 0-Day Shows We Need To Do Better


Fuzz testing (more commonly known as "fuzzing") has become a predominate technique for bug hunting because it's easy to deploy and yields results. Academic security research is now flooded with papers on the topic — USENIX Security alone accepted 7 papers in the 2020 Fall submission cycle — many of which propose incremental improvements that'll be obsolete by next year.

Meanwhile, plenty of serious bugs are slipping through our nets. Take a look at this 0-day my team found in a piece of software from 2016, which is readily available in major Linux distros like Debian, Ubuntu, and Kali:

:::text
$ ./dmitry "%p %p %p %p %p %p"

Deepmagic Information Gathering Tool
"There be some deep magic going on"

ERROR: Unable to locate Host IP addr. for %p %p %p %p %p %p
Continuing with limited modules
HostIP:
HostName:%p %p %p %p %p %p

Gathered Inic-whois information for 0x5598e89e9b47 (nil) (nil) 0x7ffc2f4878e0 0x7f721845de80 (nil)

[...]

This is a textbook example of an externally-controlled format string vulnerability (CWE-134), in a program that's trivial to fuzz, that others have already found vulnerabilities in. How did such a simple problem go unreported?

The problem is that fuzzing assumes vulnerabilities will easily trigger crashes. Unfortunately, this 0-day demonstrates an entire class of bugs where that isn't the case. If you know what you're looking for, it's trivial:

:::text
$ ./dmitry -w %n
Deepmagic Information Gathering Tool
"There be some deep magic going on"

ERROR: Unable to locate Host IP addr. for %n
Continuing with limited modules
HostIP:
HostName:%n

Segmentation fault

But for a fuzzer using typical mutation strategies, it's actually quite difficult:

:::text
                       american fuzzy lop 2.52b (wrapper)

┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│        run time : 0 days, 0 hrs, 10 min, 32 sec        cycles done : 0      │
│   last new path : 0 days, 0 hrs, 0 min, 23 sec         total paths : 31     │
│ last uniq crash : 0 days, 0 hrs, 1 min, 12 sec        uniq crashes : 2      │
│  last uniq hang : 0 days, 0 hrs, 2 min, 56 sec          uniq hangs : 4      │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│  now processing : 3 (9.68%)             map density : 0.08% / 0.24%         │
│ paths timed out : 0 (0.00%)          count coverage : 2.13 bits/tuple       │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│  now trying : havoc                  favored paths : 13 (41.94%)            │
│ stage execs : 6540/24.6k (26.61%)     new edges on : 17 (54.84%)            │
│ total execs : 45.9k                  total crashes : 2 (2 unique)           │
│  exec speed : 585.4/sec               total tmouts : 14 (4 unique)          │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│   bit flips : 0/152, 0/148, 1/140                       levels : 3          │
│  byte flips : 0/19, 0/15, 0/7                          pending : 28         │
│ arithmetics : 5/1062, 0/75, 0/0                       pend fav : 11         │
│  known ints : 1/96, 1/420, 0/308                     own finds : 30         │
│  dictionary : 0/0, 0/0, 0/0                           imported : n/a        │
│       havoc : 18/36.6k, 0/0                          stability : 76.28%     │
│        trim : 5.00%/4, 0.00%                        ├────────────────────────┘
└─────────────────────────────────────────────────────┘          [cpu001: 38%]

Side Note: For repeatability, this is the wrapper code I hacked together to restrict AFL to fuzzing a single command line argument:

:::c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    char *prog = "./dmitry";
    char *flag = "-w";
    char buf[1024];
    char *cmd[] = {prog, flag, buf, NULL};
    FILE *input;

    if (argc != 2)
        return 0;

    input = fopen(argv[1], "r");
    fgets(buf, sizeof(buf), input);
    fclose(input);

    execv(prog, cmd);

    return 0;
}

The crashes it did find pertain to CVE-2017-7938 — this program is also prone to stack overflows (oops). Also, because most fuzzers rely on stack traces to triage bugs, and these cases completely obliterate the stack, AFL can't tell that they're the same bug (double oops).

Okay, that last paragraph was a bit hostile, so allow me to take a few steps back. I'm not saying fuzzers are completely useless — I'm sure AFL could find the format string 0-day in a few hours — but I do think fuzzing is overrated. These tools rely on being very fast, at the cost of also being very dumb, and we're already seeing the diminishing returns, even as academia turns fuzzer optimization into an Olympic sport. I suspect these tools will never "become smart" because smart usually means slow, which is the antithesis of fuzzing. By calling attention to this trend (thank you for reading this far), I hope researchers will consider other avenues towards smart bug hunting, as opposed to just making fuzzing 5% faster, or adding better scaffolding to support kernels and other niche applications. Those tasks are useful in their own ways, but they leave trivial 0-days like this on the table.

To that end, I'm happy to report that the tool I've been working on for the past year, based on symbolic execution, is what actually led me to this 0-day, and it did so in a mere 16 seconds. I look forward to open sourcing the code, but I'm trying to get a paper published first and it's currently locked in a "major revision" cycle at a top tier security conference. With any luck, the release will be coming at the beginning of 2021. No one said research is fast.

Thanks for reading and happy hacking.