- Wed 28 October 2015
- Theory
- #theory, #filesystems, #crypto

As I promised in my previous blog post, I will try to explain the difference between digital and analog sanitization using an analogy better suited to the task. If you have no clue what I'm talking about, I recommend that you go and read that post. If you were hoping for the other follow-up I promised, a more practical guide to data protection, this is not that post. This is going to be another conceptual piece.

## The Analogy

The analogy I'm going to use to illustrate the difference between digital and analog sanitization might seem somewhat contrived, but it's the simplest one I could come up with, so here it goes.

Imagine you have a collection of buckets which can each hold 4 cups of water. You are able to add water to the buckets and remove it whenever you like; however, you must always try to add or remove 3 cups' worth of water at a time. So if a bucket is empty and you add water to it, the bucket ends up with 3 cups of water in it. If you try to add water again to the same bucket, some water overflows and is lost, so the bucket ends up with 4 cups of water in it. Similarly, if the bucket only has 1 cup of water in it and you try to remove water, you end up removing all the remaining water in the bucket.
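If it helps to see the rules written down, here is a minimal sketch of the bucket model. The names `CAPACITY`, `STEP`, `add_water`, and `remove_water` are my own, made up for illustration.

```python
CAPACITY = 4  # cups a bucket can hold
STEP = 3      # cups we try to add or remove each time


def add_water(level: int) -> int:
    """Try to add STEP cups; anything past CAPACITY overflows and is lost."""
    return min(level + STEP, CAPACITY)


def remove_water(level: int) -> int:
    """Try to remove STEP cups; a bucket cannot hold less than nothing."""
    return max(level - STEP, 0)


print(add_water(0))     # 3: an empty bucket gains 3 cups
print(add_water(3))     # 4: one cup fits, the rest overflows
print(remove_water(1))  # 0: only 1 cup was there to remove
```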

Now here's the last piece of how this analogy works. Assume that we only care about whether a particular bucket is more full or more empty. Therefore, if the bucket contains more than 2 cups of water, we will call it mostly full. Likewise, if it has less than 2 cups of water, we will consider it mostly empty. We don't have to worry about any bucket having exactly 2 cups of water in it given how this model works. In fact, assuming all the buckets start empty, it is only possible for any particular bucket to contain 0, 1, 3, or 4 cups of water.
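The claim that only 0, 1, 3, and 4 cups are possible can be checked by brute force. The sketch below uses hypothetical helper names of my own (`reachable_levels`, `is_mostly_full`) and simply tries every add and remove starting from an empty bucket.

```python
def add_water(level: int) -> int:
    return min(level + 3, 4)  # add 3 cups, capped at the 4-cup capacity


def remove_water(level: int) -> int:
    return max(level - 3, 0)  # remove 3 cups, never below empty


def reachable_levels(start: int = 0) -> list[int]:
    """Explore every water level reachable from `start` by adding or removing."""
    seen = {start}
    frontier = [start]
    while frontier:
        level = frontier.pop()
        for nxt in (add_water(level), remove_water(level)):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)


def is_mostly_full(level: int) -> bool:
    """The 'digital' reading: more than 2 cups counts as mostly full."""
    return level > 2


print(reachable_levels())                               # [0, 1, 3, 4]
print([is_mostly_full(l) for l in reachable_levels()])  # [False, False, True, True]
```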

## Emptying the Buckets

So we have our buckets, some time has gone by, and now our buckets contain different amounts of water. Some are mostly full while others are mostly empty. Now let's assume that we want to "wipe" all of our buckets. In other words, we want all our buckets to be mostly empty. The process is pretty easy: we can just go to every bucket and remove water from it. Simple, right?

While it is true that doing this will cause all of our buckets to be mostly empty, take another look at how much water is actually in each bucket. The buckets which originally had 3 or fewer cups of water will now be empty, but the buckets which originally had 4 cups of water will still have 1 cup of water remaining. This means that if we inspect how much water is in each bucket, we can determine which ones previously held 4 cups of water. We can reconstruct a portion of the past state of the buckets from the current state of the buckets!
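Here is a small sketch of that single-pass wipe, run over one bucket at each possible level. The function name `wipe_once` and the sample list are assumptions of mine, chosen only to make the leak visible.

```python
def remove_water(level: int) -> int:
    return max(level - 3, 0)  # remove 3 cups, never below empty


def wipe_once(levels: list[int]) -> list[int]:
    """Remove water from every bucket exactly one time."""
    return [remove_water(level) for level in levels]


before = [0, 1, 3, 4]     # one bucket at each possible level
after = wipe_once(before)

print(after)                             # [0, 0, 0, 1]
print([level > 2 for level in after])    # [False, False, False, False]: all "mostly empty"

# Any bucket with water left over must have held 4 cups before the wipe.
leaked = [i for i, level in enumerate(after) if level > 0]
print(leaked)                            # [3]
```

Read digitally, every bucket looks wiped. Read as actual water levels, the last bucket gives away its past.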

## What Went Wrong?

In a nutshell, we were able to figure out the past state of some of the buckets because there is a correlation between the last state the bucket was in and its current state. This correlation occurred because when we emptied the buckets, we only considered the 2 possible digital states for a particular bucket and ignored the 4 possible analog states. If we had considered the analog states, we would have realized that we needed to empty each bucket twice in order to make it impossible to determine any bucket's previous state. This is an analogy for the difference between digital and analog sanitization.
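To see why two passes are enough in this toy model, the sketch below repeats the removal a given number of times and counts how many distinct water levels survive. The helper name `wipe` is hypothetical.

```python
def remove_water(level: int) -> int:
    return max(level - 3, 0)  # remove 3 cups, never below empty


def wipe(levels: list[int], passes: int) -> list[int]:
    """Remove water from every bucket `passes` times."""
    for _ in range(passes):
        levels = [remove_water(level) for level in levels]
    return levels


start = [0, 1, 3, 4]

print(wipe(start, passes=1))  # [0, 0, 0, 1]: two distinct levels, history leaks
print(wipe(start, passes=2))  # [0, 0, 0, 0]: one level, nothing left to tell apart
```

Once every bucket ends at the same analog level, the current state no longer depends on the previous one, so there is nothing to reconstruct.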

## Back to Reality

Although our simple analogy used buckets of water, this concept actually applies to electronic storage devices. Those of us with experience in the logical side of computing, such as any computer scientist who might happen to read this, have a tendency to abstract away the gritty details of how the underlying hardware of a computer works. While this layer of abstraction simplifies our problems and makes them approachable, abstractions such as these can cause us to forget that our idealized binary zeros and ones are actually being stored in physical materials. As every electrical engineer knows, these physical materials don't behave in accordance with our perfectly abstracted models. Instead, we have to take their infinite possible analog states and group them into the "zero" set and the "one" set in order to fit them to our models. Never forget, however, that at the end of the day the storage device is indeed physical and consequently analog. Correlations between states can exist, but they become hidden underneath the veil of our abstractions.

So to end on a slightly more practical note, how do professionals deal with these analog correlations? What is the takeaway from all this? The short answer is that if you want to wipe an electronic storage device, but you can't simply destroy it, don't settle for "zeroing" out the data in 1 pass. Overwrite all the data with random values and do so multiple times. The more passes and the more entropy, the less likely it is that a correlation will remain between the original data and the current state of the device. How many passes are necessary depends on the particulars of the device, so if you need to wipe a computer, smartphone, or other electronic device, I recommend doing some research and selecting a professional tool to do the work for you. DBAN, for example, is a good one to consider.
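Purely to illustrate the idea of several random passes, here is a rough sketch that overwrites a single file in place. The function name `overwrite_file` and its default `passes` and `chunk_size` values are my own assumptions, not a recommendation. It also assumes that writing to the same file offsets reaches the same physical storage, which the filesystem or the drive's controller may not honor, so treat it as a teaching aid rather than a substitute for a proper wiping tool.

```python
import os


def overwrite_file(path: str, passes: int = 3, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file's contents in place with random bytes, `passes` times.

    Assumption: in-place writes land on the same physical cells as the
    original data. That is not guaranteed on every filesystem or device.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))  # fresh random bytes for every chunk
                remaining -= n
            f.flush()
            os.fsync(f.fileno())        # ask the OS to push this pass to the device
```

Calling `overwrite_file("secret.bin", passes=3)` on a hypothetical file would rewrite its bytes three times while leaving its size unchanged.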