Differentiating ARCUS (USENIX'21) and Bunkerbuster (CCS'21)


I've received several questions recently about two papers I published in 2021, one at USENIX (which I'll refer to here as ARCUS) and another at CCS (which I'll refer to as Bunkerbuster). You can find these papers at the USENIX and CCS conference websites, respectively.

The question people ask is what distinguishes these two publications from each other. In short, how do I justify ARCUS and Bunkerbuster being two separate publications? This question is motivated by both papers being published in the same year, with aesthetically similar architecture diagrams, and by my decision to ultimately merge the codebases into a single repository. Did I pull the wool over the eyes of the reviewers (and my dissertation defense committee)? Were my advisors asleep at the wheel? These are legitimate questions to ask, so let me address them here.

For busy readers who want to get straight down to the technical brass tacks, here is the commit where ARCUS and Bunkerbuster were merged, so the research community can interrogate the roughly 5,600 additions that Bunkerbuster contributed to the prior ARCUS codebase. To summarize the difference in one sentence: ARCUS (the first prototype) is a reactive system that requires traces containing exploits to be cherry-picked by preexisting defenses, whereas Bunkerbuster (the second prototype) is a proactive bug hunting system that can sift through 10 times more data, none of which contains exploits, to find and help fix bugs, hopefully before attackers can find and exploit them in the wild.

For those who wish to read on, I'll break down my high-level explanation into 3 sections: the story of how ARCUS was created, how the findings and unsolved problems in ARCUS then inspired the creation of Bunkerbuster, and the timeline and hidden logistics surrounding their publications.

ARCUS' Story

At its essence, ARCUS (the earlier of the two prototypes) tackles a simpler problem than Bunkerbuster. Specifically, in the ARCUS work, my collaborators and I wrestle with how to diagnose alerts generated by host-based defenses like intrusion detection systems or control flow integrity monitors. ARCUS is a purely reactive system that relies on other defenses to accurately feed it execution traces (captured using Intel Processor Trace) that contain low-level binary exploits. In short, ARCUS has no exploration capabilities and its performance is directly dependent on the false negative rate of whichever preexisting defenses ARCUS is integrated with. If an exploit goes undetected, ARCUS offers no assistance.

This is a strong prerequisite, but it was necessary in order to simplify the research problem enough to focus on the core technical contribution of inventing and demonstrating binary symbolic root cause analysis. To be clear, no research occurs in a vacuum. The idea of binary symbolic analysis builds on a wealth of prior knowledge, such as the Mayhem work published in 2012. However, accurately following real-world Intel PT recordings symbolically, defining and implementing reliable detection techniques for multiple classes of memory safety bug (buffer/integer overflow, use-after-free, double free, format string), and then figuring out how to leverage the recovered constraints to classify the root cause and propose a preliminary patch, convinced my peers and reviewers that ARCUS was a publication-worthy work, despite its narrow scope.
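To give a feel for what "detecting a class of memory safety bug while following a trace" means, here is a deliberately simplified sketch. It replays a concrete, made-up log of heap events and flags double frees and use-after-frees. This is not ARCUS' implementation: ARCUS reasons symbolically over Intel PT recordings, whereas this toy works on concrete pointers, and the event format and the name find_heap_bugs are hypothetical and exist only for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical event format: (operation, pointer).
# Operations are "malloc", "free", or "use".
Event = Tuple[str, int]


def find_heap_bugs(events: Iterable[Event]) -> List[Tuple[int, str, int]]:
    """Replay heap events in order and report (index, bug class, pointer)."""
    live: Set[int] = set()   # pointers currently allocated
    freed: Set[int] = set()  # pointers freed and not yet reallocated
    findings: List[Tuple[int, str, int]] = []

    for index, (op, ptr) in enumerate(events):
        if op == "malloc":
            live.add(ptr)
            freed.discard(ptr)  # the allocator may hand the address out again
        elif op == "free":
            if ptr in freed:
                findings.append((index, "double-free", ptr))
            elif ptr in live:
                live.remove(ptr)
                freed.add(ptr)
        elif op == "use":
            if ptr in freed:
                findings.append((index, "use-after-free", ptr))
    return findings


if __name__ == "__main__":
    log = [("malloc", 0x10), ("free", 0x10), ("use", 0x10), ("free", 0x10)]
    for index, kind, ptr in find_heap_bugs(log):
        print(f"event {index}: {kind} on {ptr:#x}")
```

The real difficulty, and the part the paper addresses, is doing this kind of bookkeeping over symbolic state while staying faithful to a real-world recording, which a concrete replay like this sidesteps entirely.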

One other curious finding came out of the ARCUS work: it discovered 4 new vulnerabilities during the evaluation, even though it was analyzing traces containing known exploits. This caught us by surprise and became the kindling for a follow-up work: Bunkerbuster.

Bunkerbuster's Story

The goal of Bunkerbuster was to address all the shortcomings left behind by ARCUS. Gone is the prerequisite of there being preexisting defenses on the monitored system. In fact, we even removed the assumption that only 1 system is being monitored. Instead, we aimed to make Bunkerbuster a proactive, end-to-end bug hunting system, driven by benign recordings of production and end-user workloads, to find and fix bugs before an attacker could even attempt exploitation in the wild. ARCUS was handed the needle (exploit) by a deus ex machina (preexisting defenses). Bunkerbuster would have to search the haystack.

Flowery analogies aside, what does that really mean at a technical level? First, in Bunkerbuster, we could no longer tiptoe around exploration. Unless Bunkerbuster could branch out into states not reached by the input traces, the chances of finding bugs from recordings of benign executions would be significantly reduced. To accomplish this, we created a new plugin class for guiding state exploration (while controlling state explosion) and designed and implemented search heuristics based on our domain expertise in memory safety bugs.
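The general idea of guided exploration under a budget can be sketched in a few lines. The toy below starts from the blocks a benign trace actually reached, branches into untraced successors, and always expands the most promising candidate first until a fixed budget runs out. Everything here is an assumption made for illustration: the control flow graph as a plain dictionary, the scoring function, and the name explore are hypothetical, and Bunkerbuster's real plugin operates on symbolic states with heuristics described in the CCS paper.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple


def explore(
    cfg: Dict[str, List[str]],
    traced: List[str],
    score: Callable[[str], int],
    budget: int,
) -> List[str]:
    """Visit up to `budget` untraced blocks, highest score first.

    `cfg` maps a block name to its successors, `traced` lists the blocks
    the recording reached, and `score` is a hypothetical heuristic that
    rates how interesting a block looks.
    """
    seen: Set[str] = set(traced)
    frontier: List[Tuple[int, str]] = []  # min-heap keyed on negated score

    def push_successors(block: str) -> None:
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                heapq.heappush(frontier, (-score(succ), succ))

    # Seed the frontier with states one step beyond the recorded trace.
    for block in traced:
        push_successors(block)

    visited: List[str] = []
    # The budget is what keeps state explosion in check in this toy.
    while frontier and len(visited) < budget:
        _, block = heapq.heappop(frontier)
        visited.append(block)
        push_successors(block)
    return visited


if __name__ == "__main__":
    graph = {
        "main": ["parse", "log"],
        "parse": ["memcpy_loop", "exit"],
        "log": ["exit"],
        "memcpy_loop": ["exit"],
    }
    rank = lambda b: 10 if "memcpy" in b else (1 if b == "parse" else 0)
    print(explore(graph, ["main", "log"], rank, budget=2))
```

In this example the trace only covered main and log, yet the search reaches the copy loop behind parse, which is the kind of untraced code where a memory safety bug might hide.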

Second, Bunkerbuster had to sift through significantly more Intel PT data than ARCUS ever did because it was looking at benign executions rather than malicious ones cherry-picked by a host-based defense. In empirical terms, the dataset Bunkerbuster was evaluated on is 10 times larger than the ARCUS dataset, creating a noisy digital haystack. This challenge would cripple ARCUS. Bunkerbuster overcomes it using an on-the-fly hashing and snapshotting technique that we describe in the CCS paper.
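To illustrate why hashing helps with a haystack of benign data, here is a minimal sketch of the deduplication half of the idea: each trace segment is hashed as it streams in, and any segment whose digest has already been seen is skipped rather than analyzed again. This is an assumption-laden toy rather than the technique from the CCS paper. The segment format (an entry-point name plus a list of basic block addresses) and the names segment_digest and novel_segments are hypothetical, and snapshotting is not modeled at all.

```python
import hashlib
from typing import Iterable, Iterator, List, Set, Tuple

# Hypothetical segment format: (entry-point name, basic block addresses).
Segment = Tuple[str, List[int]]


def segment_digest(entry: str, blocks: Iterable[int]) -> str:
    """Hash a segment incrementally, one basic block address at a time."""
    h = hashlib.sha256()
    h.update(entry.encode("utf-8"))
    h.update(b"\x00")  # separator between the name and the addresses
    for addr in blocks:
        h.update(addr.to_bytes(8, "little"))
    return h.hexdigest()


def novel_segments(segments: Iterable[Segment]) -> Iterator[Segment]:
    """Yield only segments whose control flow has not been seen before."""
    seen: Set[str] = set()
    for entry, blocks in segments:
        digest = segment_digest(entry, blocks)
        if digest in seen:
            continue  # identical behavior was already queued for analysis
        seen.add(digest)
        yield entry, blocks


if __name__ == "__main__":
    stream = [
        ("parse", [0x1000, 0x1010]),
        ("parse", [0x1000, 0x1010]),  # exact repeat, skipped
        ("parse", [0x1000, 0x1020]),  # same entry point, different path
    ]
    for entry, blocks in novel_segments(stream):
        print(entry, [hex(b) for b in blocks])
```

Benign workloads tend to repeat the same paths over and over, so even a filter this crude discards a large share of the input before any expensive analysis has to run.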

Timeline and Publication Logistics

The time to develop ARCUS (minus evaluation and paper writing) was about 1.5 years. Developing Bunkerbuster took an additional 1 year. So why did the papers appear to be published so close together? The answer is conference review cycles.

ARCUS was originally submitted to USENIX 2020, and then was resubmitted to USENIX 2021 as a major revision. This significantly inflated the time to publication, even after all the technical work on ARCUS was completed.

Conversely, because my collaborators and I had already gone through the hardships of publishing ARCUS, when we wrote the Bunkerbuster paper, we were able to avoid many similar pitfalls and address likely reviewer critiques upfront. The result was that Bunkerbuster was accepted to CCS 2021 without major revision, which significantly reduced its time to publication compared to ARCUS.

In other words, when I gave my talk at USENIX 2021 on ARCUS, I was describing a system that had been finished over 1 year prior. For what it's worth, I'm currently involved in 2 papers that were submitted to USENIX 2022, but ended up in major revisions that were resubmitted to USENIX 2023. Should the reviewers accept these revisions, by the time my collaborators and I give our conference talks, we will be describing systems we built when I was still a PhD student, except I will now be 1 full year into my assistant professorship at The Ohio State University. I point this out not to raise a fuss about USENIX's review process, but just to emphasize that ARCUS' slow publication timeline is not a one-time anomaly. It's just the reality of large-scale peer review.

Conclusion

Hopefully this clarifies the question surrounding the publication of ARCUS and Bunkerbuster in 2021. I'm happy to answer additional questions that the community may have, and lastly I'd like to thank the readers who made it to the end of this text.

Happy hacking!