Is your data really gone? Explaining the challenges of data wiping.


Every now and again you hear on the news about a police investigation making a breakthrough by recovering deleted data from an electronic device belonging to a suspect. You may also hear about professional criminals who recover sensitive information, such as credit card numbers, by sifting through the hard drives of old discarded computers.

How does this happen? How is it that data turns out to still be on the device even when the user consciously takes actions to delete it?

In this article, I'm going to cover a conceptual topic which forms one of the cornerstones of a field known as digital forensics. Namely, what does it mean for data on an electronic device to be deleted, and what does it take to restore this supposedly destroyed information?

Terminology

The first step to grasping what it means for data to be deleted is to understand that "deletion" can have multiple technical meanings which differ from the definition we use in everyday speech. Basically, data can be deleted from an electronic device to different extents, and the extent to which we delete it determines how difficult it will be to recover.

Many publications already categorize the degrees of data sanitization. For example, NIST 800-88 (see Table 5.1) offers very general definitions of these degrees. However, since I want this article to be approachable to the layperson, I'm going to use an alternative categorization that also appears in many publications. Namely, I'm going to cover logical sanitization, cryptographical sanitization, digital sanitization, and analog sanitization. If you're curious about how the two categorizations relate, the one I'm using in this article fits into the Clear and Purge categories of NIST 800-88.

Logical Sanitization

Logical sanitization is the weakest form of sanitization and the easiest to explain, so I'll cover it first using an analogy:

Imagine a particular file on your electronic device as being a house. In order to visit the house, you have to know how to get there. Luckily, the streets have signs at every intersection. By following the signs, you're able to find the house.

Now imagine that I take down all the signs. The house still exists, but now if you want to visit it, you'll have to search every street. This is what it means to logically sanitize data.

You can probably already see the shortcoming of this form of sanitization. Just because I take down the signs pointing to a piece of data doesn't mean that someone can't still find that data with enough effort.

Despite this fault, this is what actually happens in your electronic device when you delete a file normally. Your device doesn't actually delete the file itself; it just deletes its pointers to that file. This means that until a new file is written into the space the old file occupied, the old file can still be recovered. And since storage these days is large and there is no guarantee that new data will land on that particular spot any time soon, the old file can remain there for a very long time.
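To make the street-sign idea concrete, here is a minimal sketch of a toy storage device. Everything in it (the `ToyDisk` class, its `table` of pointers, and the `scan` method) is a hypothetical illustration invented for this example, not how any real file system is implemented. It only shows that removing the pointer leaves the bytes in place.

```python
class ToyDisk:
    """A toy storage device: raw bytes plus a table of pointers to files."""

    def __init__(self, size):
        self.blocks = bytearray(size)  # the raw storage (the "houses")
        self.table = {}                # name -> (offset, length), the "street signs"
        self.next_free = 0             # where the next file will be written

    def write(self, name, data):
        offset = self.next_free
        self.blocks[offset:offset + len(data)] = data
        self.table[name] = (offset, len(data))
        self.next_free = offset + len(data)

    def delete(self, name):
        # Logical sanitization: only the pointer is removed, not the bytes.
        del self.table[name]

    def scan(self, needle):
        # Recovery: ignore the table and search the raw storage directly.
        return self.blocks.find(needle)


disk = ToyDisk(64)
disk.write("card.txt", b"card number 1234")
disk.delete("card.txt")

print("card.txt" in disk.table)   # False: the sign is gone

offset = disk.scan(b"card number")
recovered = bytes(disk.blocks[offset:offset + 16])
print(recovered)                  # b'card number 1234': the house is still there
```

Searching every street is exactly what `scan` does here: it walks the raw bytes without needing any of the pointers.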

So why do electronic devices delete data this way? Frankly, because it's the fastest method. Most users are more concerned about speed than security, so system developers design their systems to delete data in the fastest way possible.

Cryptographical Sanitization

Cryptographical sanitization is an alternative version of logical sanitization which offers a bit more protection against data recovery.

To explain the difference, consider the house analogy again, only this time, the house also has a gated wall surrounding it. Luckily, you have a piece of paper in your hand which contains the password to open the gate. Thanks to this paper, you're able to visit the house without problem.

Now if I want to prevent you from visiting the house, I don't need to take down all the street signs. I just need to destroy the piece of paper which contains the password to the gate. Destroying this piece of paper is analogous to cryptographical sanitization.

However, this too has its shortcomings. For example, what if you memorized the password or otherwise made a copy of the paper? Alternatively, what if the password is so short that you could simply guess it? These are two serious challenges for cryptographical sanitization.
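The gate and the guessable password can be sketched in a few lines. This is only an illustration under stated assumptions: `keystream_xor` is a toy cipher built for this example (it is not a secure or standard algorithm), `guess_pin` is a hypothetical attacker routine, and the attacker is assumed to know how the protected data begins so that a correct guess can be recognized.

```python
import hashlib
import secrets


def keystream_xor(key, data):
    """Toy cipher for illustration only: XOR data with a SHA-256 keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def guess_pin(ciphertext, known_prefix):
    """Try every 4-digit PIN until the decrypted data starts as expected."""
    for n in range(10_000):
        pin = f"{n:04d}".encode()
        if keystream_xor(pin, ciphertext).startswith(known_prefix):
            return pin
    return None


secret = b"the house behind the gate"

# Strong case: a long random key. Once it is gone, only the ciphertext remains.
key = secrets.token_bytes(32)
ciphertext = keystream_xor(key, secret)
key = None  # "destroying the paper"

# Weak case: the key is a 4-digit PIN, so there are only 10,000 to try.
weak_ciphertext = keystream_xor(b"4921", secret)
print(guess_pin(weak_ciphertext, b"the house"))  # b'4921'
```

Destroying the long random key leaves nothing practical to try, while the short PIN falls to a simple loop. The other weakness, a surviving copy of the key, needs no code at all: anyone holding a copy can call `keystream_xor` again.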

Additionally, cryptographical sanitization is weakened by the fact that it is circular. For example, what if I destroyed the paper containing the password by logically sanitizing it? You could just recover the paper, as mentioned in the previous section, and then access the house. In other words, cryptographical sanitization is only as strong as the sanitization applied to the password. If we apply cryptographical sanitization to that as well, then as the philosophers would say, it's turtles all the way down.

Digital Sanitization

Now we reach the stronger techniques for sanitizing data. Both remaining techniques resort to destroying the house, but telling the two apart can be difficult for the non-technical reader. For this reason, I'm going to keep my explanation brief and stick to the house analogy I've been using up to this point, even though doing so will introduce some vagueness. If you're interested in really understanding the difference between digital and analog sanitization, I plan to write a later article dedicated to this distinction using a different analogy better suited to the task.

Continuing along with the house analogy, digital sanitization is comparable to me taking a bulldozer, leveling the house, and then throwing the pieces into a dumpster and taking that dumpster with me. Now you cannot visit the house because the house simply doesn't exist.

Or can you?

As it turns out, there is still information about the house left behind! For example, you might study the depression in the ground left behind after the house was removed. Based on its size and depth, you might be able to approximate the house's dimensions, even though the house no longer exists. In fact, and this is where the house analogy breaks down, if you know enough about the construction of houses similar to the one that was destroyed, it is actually possible to reconstruct a perfect copy of the original house!
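In software terms, leveling the house is usually understood as overwriting the stored bytes themselves rather than just removing the pointers to them. Assuming that reading of digital sanitization, a minimal sketch might look like the following. The function name `overwrite_with_zeros` is hypothetical, and the sketch only guarantees that the file's visible contents are replaced; on SSDs or on journaling and copy-on-write file systems, the zeros may be written to a different physical location than the original data.

```python
import os
import tempfile


def overwrite_with_zeros(path, chunk_size=1024 * 1024):
    """Overwrite a file's contents in place with zero bytes."""
    remaining = os.path.getsize(path)
    with open(path, "r+b") as f:          # open for in-place writing, no truncation
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(b"\x00" * n)
            remaining -= n
        f.flush()
        os.fsync(f.fileno())              # ask the OS to push the zeros to the device


# Demonstration on a throwaway temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"sensitive data")

overwrite_with_zeros(path)

with open(path, "rb") as f:
    after = f.read()

print(after == b"\x00" * 14)  # True: the original bytes are no longer readable
os.remove(path)               # finally, remove the pointer as well
```

Note that the file keeps its original length after the overwrite. That leftover size is a small example of the kind of "depression in the ground" described above.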

Analog Sanitization

Finally, at least in the scope of this article, we get to analog sanitization. As you've probably anticipated by now, this is the strongest of the four sanitization techniques covered in this article. Using the house analogy, this would be comparable to me not only destroying the house, but then also digging up the dirt on the property, replacing it with new dirt, and leveling it. Now there are no remaining indicators that a house ever existed in that spot, so there is nothing left from which to reconstruct a replica. The data is truly gone at this point, so long as I have indeed destroyed every trace of the original house and the impact it had on its environment. That's a pretty big conditional claim I just made, but I'll leave it at that.

Afterword

This article ended up becoming more conceptual than I originally intended, but I hope the analogy made the concepts I covered approachable. As mentioned, I hope at some point to write a follow-up article using another analogy which can better explain the difference between digital and analog sanitization. I also hope to write a practical counterpart to this article for those who would like to know more about how to securely delete the data from their electronic devices.