The Limitation of MapReduce: A Probing Case and a Lightweight Solution

We see plenty of papers on applications of the MapReduce programming model; this one, for a change, addresses the limitations of the model. It argues that MR only allows a program to ...

Keyword Searching and Browsing in Databases using BANKS

BANKS is a system that enables keyword-based searches on a relational database. Published 10 years ago at ICDE 2002, the paper won the most influential paper award for the past decade at this year's ICDE. ...

HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment

There has been buzz about Hadapt and HadoopDB for a while now, since HadoopDB is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and MapReduce, to address ...

Spark: Cluster Computing with Working Sets

One thing you can’t miss, even as you begin reading this paper, is the strong scent of functional programming in Spark’s design. The use of FP idioms is widespread across the architecture of ...

Kafka: a Distributed Messaging System for Log Processing

Kafka, a system developed at LinkedIn, is essentially a messaging system designed to aggregate high-throughput log messages arriving from different applications. Why would a traditional messaging system not be a good fit for log processing? Typical enterprise ...

Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency

Windows Azure Storage is a key component of the Windows Azure Cloud platform that offers an infinite disk in the cloud. It’s been in production since November 2008 and is used heavily within Microsoft in addition to being available as ...

Thialfi: A Client Notification Service for Internet-Scale Applications

Scandinavian mythology regards Thialfi, a swift runner, as the attendant of Thor, the god of thunder. The swiftness that characterizes Thialfi is perhaps why the folks at Google named their message delivery system (it delivers notifications at ...

Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming

This paper from folks at Spotify primarily focuses on how they use P2P techniques in their platform. The service is not web-based, but instead uses a proprietary client and protocol. At the heart of the system is this custom music streaming ...

Tenzing: A SQL Implementation On The MapReduce Framework

This paper, which appeared at this year’s VLDB, describes the internals of a SQL query engine built atop Google’s MapReduce framework. It’s currently used by over 1000 people at Google and serves over 10,000 queries each day that span ...

HipG: Parallel Processing of Large-Scale Graphs

Abstract: Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined ...

Browsing all articles in Personalization & Recommendation

The YouTube video recommendation system

We discuss the video recommendation system in use at YouTube, the world’s most popular online video community. The system recommends personalized sets of videos to users based on their activity on the site. We discuss some of the unique challenges that the system faces and ...

Large-Scale Sentiment Analysis for News and Blogs

Abstract: News can be good or bad, but it is seldom neutral. Although full comprehension of natural language text remains well beyond the power of machines, the statistical analysis of relatively simple sentiment cues can provide a surprisingly meaningful sense of how the latest ...

The Roma Personal Metadata Service

People now have available to them a diversity of digital storage facilities, including laptops, cell phone address books, handheld devices, desktop computers and web-based storage services. Unfortunately, as the number of personal data repositories increases, so does the management problem of ensuring that the most ...

FeedTree: Sharing Web micronews with peer-to-peer event notification

Syndication of micronews, frequently-updated content on the Web, is currently accomplished with RSS feeds and client applications that poll those feeds. However, providers of RSS content have recently become concerned about the escalating bandwidth demand of RSS readers. Current efforts to address this problem by ...