An Algorithm for Fast, Robust In-Network ComputationToday, companies such as eBay, Amazon, Google, and IBM routinely operate clusters with more than 10,000 servers located in a data centers around the world. Developing applications that efficiently use the resources of such a distributed cluster is a challenging task. This task is made somewhat easier by a middleware layer that provides application programmers with the illusion of dynamically updated 'global state'. Maintaining global state can be viewed as repeatedly computing an arbitrary function over a set of time-varying values, where the values are held at each node, and every node needs to know the resultant function. Many well-known problems in distributed systems, such as load balancing, leader election, barrier synchronisation, distributed file system maintenance, and coordinated intrusion detection can be cast in this form. We present and rigorously analyze an algorithm that uses a a tree of virtual nodes to compute nearly arbitrary functions of global state. Our scheme is fast: running time is <math>\Theta\(\ln n)<math>. It is efficient: Nodes exchange <math>\mathit{O}(n ln n)<math> messages. Most of these messages are within a data center and therefore are relatively cheap. It is accurate: the computed value does not have inherent errors either due to double-counting, as with standard gossip, or due to stochastic counting, as in the Flajolet-Martin approach. Finally, it fault tolerant: The algorithm fails with probability <math>\left(\frac{1}{n^{c(1+\rho)}}\right)<math> where $c$ and $rho$ are small constants. We therefore believe that our work can serve as the basis for building robust and efficient distributed systems in a variety of problem domains. For details, please refer to:
|
![]() |
