Facebook has no need for deleting data

Niall Kennedy has written an interesting post about Facebook’s data storage. They’ve written a proprietary filesystem to store photos in order to cut costs (up to now they’ve apparently been adding a $2 million NetApp storage system every week).

It turns out they’ve decided they don’t need all the features you’d find in a traditional file system (emphasis mine):

Traditional file systems are governed by the POSIX standard governing metadata and access methods for each file. These file systems are designed for access control and accountability within a shared system. An Internet storage system written once and never deleted, with access granted to the world, has little need for such overhead.

It would be nice if someone from Facebook could confirm that they do, in fact, have the ability to physically delete a photo or other items of data, and that this does, in fact, happen on the back end if you ask it to.

From what we understand of Facebook’s architecture, physical deletion probably doesn’t happen. When you post something, it gets copied and broadcast to your friends’ feeds; the data is out there forever. Even when you delete an account, your details aren’t fully removed. Surely, if nothing else, this is a legal minefield for the company?

2 responses to “Facebook has no need for deleting data”

  1. When dealing with lots of data rows, deleting can be very resource intensive. Even in an Oracle database, the slot you delete data from isn’t really erased until the space is needed again, much like files on disk.
    If files only really need to be deleted on legal request, then it is much better to write a fast filesystem/database with a slow search-and-destroy delete function than to delete constantly and instantly.

  2. Facebook has said photo removal results in deletion from the photo index but not from the backend store. There is no longer any reference to what lies between disk markers 400 and 822, for example, but the data is still there. Their past presentations mention that deletes are a rare occurrence and therefore sequential writes make the most sense.
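
To make the two comments above concrete, here is a minimal Python sketch of an append-only store in which a delete only drops the index entry, with a separate slow compaction pass that actually discards the bytes. It is a toy illustration under stated assumptions, not Facebook's implementation: the class `AppendOnlyStore` and its methods are hypothetical names, and an in-memory `bytearray` stands in for a large file on disk.

```python
class AppendOnlyStore:
    """Toy append-only blob store (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = bytearray()  # stands in for one large sequential file
        self.index = {}         # photo_id -> (offset, length)

    def put(self, photo_id, data):
        # Sequential write: always append, never overwrite in place.
        offset = len(self.log)
        self.log += data
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        offset, length = self.index[photo_id]
        return bytes(self.log[offset:offset + length])

    def delete(self, photo_id):
        # Fast "delete": remove the index entry only.
        # The bytes remain in the log, just unreferenced.
        del self.index[photo_id]

    def compact(self):
        # Slow "search-and-destroy": rewrite the log, keeping only
        # blobs that are still referenced by the index.
        new_log = bytearray()
        new_index = {}
        for photo_id, (offset, length) in self.index.items():
            new_index[photo_id] = (len(new_log), length)
            new_log += self.log[offset:offset + length]
        self.log, self.index = new_log, new_index
```

In this sketch, calling `delete` makes a photo unreachable through `get`, yet its bytes can still be found in `log` until `compact` runs. That gap between logical and physical deletion is exactly what the post is asking Facebook to clarify.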
