The Limitation of MapReduce: A Probing Case and a Lightweight Solution

While we see plenty of papers on applications of the MapReduce programming model, this one, for a change, tries to address the limitations of the MR model. It argues that MR only allows a program to ...

Keyword Searching and Browsing in Databases using BANKS

BANKS is a system that enables keyword-based search on a relational database. Published 10 years ago at ICDE 2002, the paper won this year's ICDE award for the most influential paper of the past decade. ...

HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment

Hadapt and HadoopDB have been generating buzz for a while now, as HadoopDB is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and MapReduce, to address ...

Spark: Cluster Computing with Working Sets

One thing you can’t miss, even as you just begin reading this paper, is the strong scent of functional programming in Spark’s design. FP idioms are used widely across the architecture of ...

Kafka: a Distributed Messaging System for Log Processing

Kafka, a system developed at LinkedIn, is essentially a messaging system designed to aggregate high-throughput log messages arriving from different applications. Why would a traditional messaging system not be a good fit for log processing? Typical enterprise ...

Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency

Windows Azure Storage is a key component of the Windows Azure Cloud platform that offers an infinite disk in the cloud. It’s been in production since November 2008 and is used heavily within Microsoft in addition to being available as ...

Thialfi: A Client Notification Service for Internet-Scale Applications

Scandinavian mythology regards Thialfi, a swift runner, as the attendant of Thor, the god of thunder. That swiftness is perhaps why the folks at Google gave his name to their message delivery system (it delivers notifications at ...

Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming

This paper from folks at Spotify primarily focuses on how they use P2P techniques in their platform. The service is not web-based, but instead uses a proprietary client and protocol. At the heart of the system is this custom music streaming ...

Tenzing: A SQL Implementation On The MapReduce Framework

This paper, which appeared at this year’s VLDB, describes the internals of the SQL query engine built atop Google’s MapReduce framework. It’s currently used by over 1000 people at Google and serves over 10,000 queries each day that span across ...

HipG: Parallel Processing of Large-Scale Graphs

Abstract: Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined ...

Browsing all articles in Distributed File Systems

Hadoop Distributed File System

The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By ...

TidyFS: A Simple and Small Distributed Filesystem

In recent years, there has been an explosion of interest in computing using clusters of commodity, shared-nothing computers. In this paper, we describe the design of TidyFS, a simple and small distributed file system that provides the abstractions necessary for data parallel computations on ...

Frangipani: A Scalable Distributed File System

The ideal distributed file system would provide all its users with coherent, shared access to the same set of files, yet would be arbitrarily scalable to provide more storage space and higher performance to a growing user community. It would be highly available in spite of ...

The Google File System

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of ...