Finding a needle in Haystack: Facebook’s photo storage

Users upload a billion photos (about 60 terabytes) each week, and Facebook serves over one million images per second at peak. Haystack is the system behind all of this. Its workload is one where data is written once, read frequently, never modified, and rarely deleted.
The paper discusses the problems with the original NFS-based design in detail. In summary, the issue was the number of disk operations required to read a single photo. The authors reduced that number from over 10 to 3 and still found it limiting.

Although it doesn’t generalize, the discussion of when to build a custom storage solution is insightful. The goal is a system with a better price/performance ratio than the existing solution.

Some of the main goals Haystack strives for are:

High throughput and low latency: Haystack aims for at most one disk operation per read. It stores multiple photos in a single large file and shrinks the per-photo filesystem metadata so that all of it can be kept in main memory. Its main storage component, called the Store, uses XFS for two reasons: the block allocation maps for large contiguous files are small enough to be held in memory, and XFS provides efficient file preallocation. Together these choices avoid disk operations for retrieving metadata when reading a photo.
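The idea of one disk operation per read can be illustrated with a minimal sketch. This is not the paper's on-disk format: the class name `NeedleStoreSketch`, its methods, and the bare (offset, size) index are hypothetical simplifications. It only shows the core idea of appending many photos to one file and resolving a photo's location from an in-memory index.

```python
import os
import tempfile


class NeedleStoreSketch:
    """Toy append-only store: many photos in one file, index kept in memory.

    Hypothetical illustration only; the real Store format carries more
    fields (flags, checksums, and so on) that are omitted here.
    """

    def __init__(self, path):
        # One long-lived handle, so a read does not need to reopen the file.
        self.f = open(path, "a+b")
        self.index = {}  # photo_id -> (offset, size), held in main memory

    def put(self, photo_id, data):
        # Writes always go to the end of the file; existing bytes are never modified.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(data)
        self.f.flush()
        self.index[photo_id] = (offset, len(data))

    def get(self, photo_id):
        # The location comes from memory; only the data read touches the file.
        offset, size = self.index[photo_id]
        self.f.seek(offset)
        return self.f.read(size)

    def close(self):
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = NeedleStoreSketch(os.path.join(d, "volume.dat"))
        store.put(1, b"first photo bytes")
        store.put(2, b"second photo bytes")
        print(store.get(2))  # b'second photo bytes'
        store.close()
```

Because the index maps each photo directly to a byte range, a read never walks a directory tree or loads per-file metadata from disk, which is the cost the NFS-based design paid.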

Fault-tolerant: The paper says that Haystack replicates each photo in geographically distinct locations. For failure detection, a background task periodically checks the health of Store machines. The paper could have given more detail on this aspect of the system.

Cost-effective: The discussion of this aspect is good. Savings are quantified along two dimensions: Haystack’s cost per terabyte of usable storage and Haystack’s read rate normalized for each terabyte of usable storage. In Haystack, each usable terabyte costs 28% less and processes 4x more reads per second than an equivalent terabyte on a NAS appliance.
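The two figures above can be combined into a single reads-per-cost ratio. The short calculation below is a derivation from the quoted numbers, not a figure taken from the paper, and it assumes both numbers are normalized to the same usable terabyte. The NAS baseline values of 1.0 are arbitrary units chosen for illustration.

```python
# Baseline: NAS appliance, in arbitrary units per usable terabyte (assumed).
nas_cost_per_tb = 1.0
nas_reads_per_tb = 1.0

# Figures quoted above: 28% lower cost and 4x the reads per usable terabyte.
haystack_cost_per_tb = nas_cost_per_tb * (1 - 0.28)
haystack_reads_per_tb = nas_reads_per_tb * 4


def reads_per_cost(reads, cost):
    """Reads per second obtained per unit of storage cost."""
    return reads / cost


improvement = (
    reads_per_cost(haystack_reads_per_tb, haystack_cost_per_tb)
    / reads_per_cost(nas_reads_per_tb, nas_cost_per_tb)
)
print(round(improvement, 2))  # 5.56
```

Under these assumptions, each unit of storage spending serves roughly 5.5 times as many reads as on the NAS appliance.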

Abstract: This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (60 terabytes) each week and Facebook serves over one million images per second at peak. Haystack provides a less expensive and higher performing solution than our previous approach, which leveraged network attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.

Paper: http://www.usenix.org/event/osdi10/tech/full_papers/Beaver.pdf