
Keyword Searching and Browsing in Databases using BANKS

BANKS is a system that enables keyword-based searches on a relational database. The paper was published 10 years ago at ICDE 2002, and it has won the most influential paper award for the past decade at this year's ICDE. Hearty congrats to the team from IIT Bombay's CSE department. Previewing from http://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/BanksICDE2002.pdf


HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment

The buzz about Hadapt and HadoopDB has been around for a while now, as HadoopDB is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and map-reduce, to address the problem of large-scale data storage and analysis. This early paper that introduced HadoopDB crisply summarizes some reasons why parallel database solutions haven't scaled to hundreds of machines. The reasons include ...


Spark: Cluster Computing with Working Sets

One of the aspects you can’t miss even as you just begin reading this paper is the strong scent of functional programming that the design of Spark bears. FP idioms are widespread across the architecture of Spark: the ability to restore a partition by applying a closure block, operations such as reduce and map/collect, distributed accumulators, etc. It would suffice to say that ...
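To make the closure idea concrete, here is a minimal sketch in plain Python. It is not Spark's API: `LazyPartition`, `compute`, and `lose` are hypothetical names invented for illustration. It only assumes that a partition remembers its parent data and the closure that derived it, so a lost result can be rebuilt by applying the closure again.

```python
from functools import reduce


class LazyPartition:
    """Hypothetical stand-in for a partition that remembers how it was derived."""

    def __init__(self, source, fn):
        self.source = source  # parent data (a plain list in this sketch)
        self.fn = fn          # closure applied to every element of the parent
        self.cache = None     # materialized result, if any

    def compute(self):
        # Materialize on demand by applying the closure to the parent data.
        if self.cache is None:
            self.cache = [self.fn(x) for x in self.source]
        return self.cache

    def lose(self):
        # Simulate losing the materialized partition (e.g. a node failure).
        self.cache = None


squares = LazyPartition([1, 2, 3, 4], lambda x: x * x)
first = squares.compute()

squares.lose()
rebuilt = squares.compute()  # recomputed from the closure, not from a replica

# A reduce over the rebuilt partition, in the same functional style.
total = reduce(lambda a, b: a + b, rebuilt, 0)
```

The point of the sketch is that recovery needs only the parent data and the closure, which is why a functional style fits this kind of design.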


Kafka: a Distributed Messaging System for Log Processing

Kafka, a system developed at LinkedIn, is essentially a messaging system designed to support aggregation of high-throughput log messages arriving from different applications. Why would a traditional messaging system not be a good fit for log processing? First, typical enterprise messaging systems tend to offer a rich set of delivery guarantees. Such extensive guarantees are overkill for log processing scenarios. Second, enterprise-class messaging systems typically do not focus ...


Windows Azure Storage : A Highly Available Cloud Storage Service with Strong Consistency

Windows Azure Storage is a key component of the Windows Azure Cloud platform that offers an infinite disk in the cloud. It’s been in production since November 2008 and is used heavily within Microsoft in addition to being available as a public cloud service. It currently handles about 70 PB of raw storage in production and is set to add a few hundred more in the near future. The Architecture WAS is ...


Thialfi: A Client Notification Service for Internet-Scale Applications

Scandinavian mythology regards Thialfi, a swift runner, as the attendant of Thor, the god of thunder. That swiftness is perhaps why the folks at Google named their message delivery system (it delivers notifications at sub-second intervals!) after him. The Case for Another Notification Service: It's quite common for applications to share their data with other application users and devices. These “clients” of the data usually maintain ...


Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming

This paper from folks at Spotify primarily focuses on how they use P2P techniques in their platform. The service is not web-based, but instead uses a proprietary client and protocol. At the heart of the system is this custom music streaming protocol that is optimized for accessing a large library of tracks (and not so much for live broadcasts). The clients are available for several platforms including smart phones. The most ...


Tenzing : A SQL Implementation On The MapReduce Framework

This paper, which appeared in this year’s VLDB, talks about the internals of the SQL query engine atop Google’s MapReduce framework. It's currently used by over 1000 people at Google, serving over 10,000 queries each day that span two data centers, with two thousand cores each, on 1.5 PB of compressed data. The queries are claimed to have latencies on the order of 10 seconds. Looks like the motivation for ...


HipG: Parallel Processing of Large-Scale Graphs

Abstract: Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work on graph nodes. To make the user code high-level, the framework provides a unified interface to executing methods on local and non-local graph nodes and an ...


Semantics of Caching with SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm

This paper describes the essential parts of Yahoo’s video delivery system. The Yahoo! Video Platform has a library of over 20 million video assets. From this library, end users make about 30,000,000 requests per day for over 800,000 unique videos, which creates a low ratio of total requests to unique requests. Also, because videos are large, a typical front-end server can hold only 500 unique videos in memory and 100,000 ...
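As a quick back-of-the-envelope check on the figures quoted above, the ratio of total to unique requests can be worked out directly. The variable names below are made up for illustration, and the "over" and "about" figures are treated as exact, so the results are rough.

```python
# Figures quoted in the excerpt, treated as exact for a rough estimate.
library_size = 20_000_000           # video assets in the library
requests_per_day = 30_000_000       # total requests per day
unique_videos_per_day = 800_000     # distinct videos requested per day

# Average number of requests each requested video receives per day.
avg_requests_per_video = requests_per_day / unique_videos_per_day

# Share of the library that is requested at all on a given day.
fraction_of_library_requested = unique_videos_per_day / library_size

print(f"{avg_requests_per_video:.1f} requests per unique video")
print(f"{fraction_of_library_requested:.0%} of the library requested per day")
```

Under these assumptions each requested video is fetched only about 37.5 times a day on average, and roughly 4% of the library is touched daily.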

