Paper (USENIX ATC '13, Nathan B. )

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/5a01ca7b-2526-4152-8bc3-be0421a67289/Untitled.png

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/cb61138e-9eb5-40fe-925e-e58bc076e8de/Untitled.png

Intro

Facebook has more than a billion active users who record their relationships, share their interests, upload text, images and video, and curate semantic information about their data.

TAO: A read-optimized graph data store

TAO can sustain a billion reads per second on a changing data set of many petabytes.

Background

Facebook was originally built by storing the social graph in MySQL, querting it from PHP and caching results in memcache

Motication

Motivated by encapsulation failures in the PHP API, by the opportunity to access the graph easily from non-PHP services, and by several fundamental problems with the lookaside cache architecture

What are some problems with previous Memcached implementation?

  1. Inefficient edge lists. A key-value cache is not a good semantic fir for lists of edges.
  2. Distributed control logic. The control logic is run on clients that don't communicate with each other. This makes it difficult to avoid thundering herds. They want to move the control logic into the cache itself.
  3. Expensive read-after-write consistency: Facebook memcache uses remote markers to track keys that are known to be stale, following reads for those keys to the master region

TAO Data Model and API

Objects and associatations

objects are typed nodes, associations are typed directed edges between objects

Both objects and associations may contain data as key → value pairs

Objects: (id) -> (otype, (key -> value)*)
Assoc: (id1, atype, id2) -> (time, (key -> value)*)