The Limitation of MapReduce: A Probing Case and a Lightweight Solution

While we see plenty of papers on applications of the MapReduce programming model, this one, for a change, addresses the limitations of the MR model. It argues that MR only allows a program to ...

Spark: Cluster Computing with Working Sets

One of the aspects you can’t miss even as you just begin reading this paper is the strong scent of functional programming that the design of Spark bears. The use of FP idioms is quite widespread across the architecture of ...

Thialfi: A Client Notification Service for Internet-Scale Applications

In Scandinavian mythology, Thialfi, a swift runner, is the attendant of Thor, the god of thunder. That swiftness is perhaps why the folks at Google chose the name for their message delivery system (it delivers notifications at ...

Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming

This paper from folks at Spotify primarily focuses on how they use P2P techniques in their platform. The service is not web-based, but instead uses a proprietary client and protocol. At the heart of the system is this custom music streaming ...

HipG: Parallel Processing of Large-Scale Graphs

Abstract: Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined ...

Browsing all articles in Distributed Storage

Keyword Searching and Browsing in Databases using BANKS

BANKS is a system that enables keyword-based searches on a relational database. Published 10 years ago at ICDE 2002, the paper won the most influential paper award for the past decade at this year’s ICDE. Hearty congrats to the team ...

HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment

The buzz about Hadapt and HadoopDB has been around for a while now as it is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and map-reduce, to address the problem of large scale ...

Kafka: a Distributed Messaging System for Log Processing

Kafka, a system developed at LinkedIn, is essentially a messaging system that is designed to support aggregation of high throughput log messages arriving from different applications. Why would a traditional messaging system not be a good fit for log processing? Typical enterprise messaging systems tend ...

Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency

Windows Azure Storage is a key component of the Windows Azure Cloud platform that offers an infinite disk in the cloud. It’s been in production since November 2008 and is used heavily within Microsoft in addition to being available as a public cloud service. It ...

Tenzing : A SQL Implementation On The MapReduce Framework

This paper, which appeared in this year’s VLDB, describes the internals of the SQL query engine atop Google’s MapReduce framework. It’s currently used by over 1000 people in Google, serving over 10,000 queries each day that span across two data centers, with two ...

PNUTS – Yahoo!’s Hosted Data Serving Platform

PNUTS is a geographically distributed database developed and in use at Yahoo!. PNUTS works off a simple relational model in which data is organized as tables and columns. It allows modifications to the schema without halting queries and updates but the current version does not ...

Finding a needle in Haystack: Facebook’s photo storage

Users upload a billion photos (about 60 terabytes) each week, and Facebook serves over one million images per second at peak; Haystack is the system behind all this. Haystack is a system where data is written once, read frequently, never modified and ...

Comet: An active distributed key-value store

Distributed key-value storage systems are widely used in corporations and across the Internet. Our research seeks to greatly expand the application space for key-value storage systems through application-specific customization. We designed and implemented Comet, an extensible, distributed key-value store. Each Comet node stores a collection ...

Hive – A Warehousing Solution Over a Map-Reduce Framework

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data ...

Bayou: Replicated Database Services for World-wide Applications

The Bayou architecture provides scalability, availability, extensibility, and adaptability features that address database storage needs of world-wide applications. In addition to discussing these features, this paper presents Bayou’s mechanisms for permitting the replicas of a database to vary dynamically without global coordination. Key is the ...
