<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Systems We Make</title>
	<atom:link href="http://www.systemswemake.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.systemswemake.com</link>
	<description>Curating Complex Systems</description>
	<lastBuildDate>Thu, 15 Mar 2012 19:01:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>The Limitation of MapReduce: A Probing Case and a Lightweight Solution</title>
		<link>http://www.systemswemake.com/papers/mrcc-distributed-compiler</link>
		<comments>http://www.systemswemake.com/papers/mrcc-distributed-compiler#comments</comments>
		<pubDate>Thu, 15 Mar 2012 18:59:00 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Applications]]></category>
		<category><![CDATA[distributed compiler]]></category>
		<category><![CDATA[map reduce]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1151</guid>
		<description><![CDATA[While we usually see enough papers that deal with the applications of the Map Reduce programming model this one for a change tries to address the limitations of the MR model. It argues that MR only allows a program to scale up to process very<span class="readmore-post"><a href="http://www.systemswemake.com/papers/mrcc-distributed-compiler">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>While we usually see plenty of papers that deal with applications of the MapReduce programming model, this one, for a change, addresses the limitations of the MR model. It argues that MR only allows a program to scale up to process very large data sets, but constrains a program’s ability to process smaller data items. This ability or inability (depending on how you see it) is what it terms &#8220;one-way scalability&#8221;. Obviously this &#8220;one-wayness&#8221; was a requirement for Google, but here the authors turn our attention to how it impacts the application of the framework to other forms of computation.</p>
<p>The case they build their argument on is a <strong>distributed compiler</strong>, and their solution is a scaled-&#8220;down&#8221; parallelization framework called MRlite that handles more moderate volumes of data. The workload characteristics of a compiler are a bit different from those of analytical workloads. The primary difference is that compilation workloads deal with much humbler volumes of data, albeit with much greater intertwining amongst the files.</p>
<p><a href="http://fclose.com/p/mrcc/" title="mrcc" target="_blank">mrcc</a>, the distributed compiler, follows a master-slave model. The main mrcc program runs on the master node. The &#8220;map&#8221; component, mrcc-map, runs on the slave nodes.</p>
<p><strong>A Cycle of Distributed Compilation</strong></p>
<p>The compilation cycle starts with the code base of a project being submitted to mrcc, the master program. After scanning the arguments passed to the compiler, the master program forks a &#8220;preprocessor&#8221; process. This preprocessor merges the header file into the source file, in preparation for the next step, which distributes the preprocessed files to different slaves. To keep the preprocessed files accessible to the slaves, they are kept on a network file system, very similar to what we see in GFS and HDFS.</p>
<p><strong>mrcc</strong> then initiates the remote compilation of the preprocessed files on multiple slave machines. mrcc-map, the program that runs on the slaves, is the one that performs the compilation.<br />
It first parses its arguments to obtain the source file name on the network file system and the compiler arguments. It then retrieves the preprocessed file from the network file system. After that, mrcc-map calls the local gcc compiler and passes the compilation arguments to it. When gcc exits with a successful return value, mrcc-map places the object file into the network file system and returns immediately.</p>
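<p>The slave-side steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual mrcc-map code: the function name <code>mrcc_map</code>, the use of a plain directory as a stand-in for the network file system, and the injectable <code>run</code> parameter are all assumptions made for the example.</p>

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def mrcc_map(shared_dir, source_name, compiler_args, run=subprocess.run):
    """Hypothetical sketch of one mrcc-map compilation task.

    shared_dir stands in for the network file system; run is injectable
    so the sketch can be exercised without a real compiler.
    """
    shared = Path(shared_dir)
    with tempfile.TemporaryDirectory() as work:
        # 1. Retrieve the preprocessed file from the shared file system.
        local_src = Path(work) / source_name
        shutil.copy(shared / source_name, local_src)

        # 2. Call the local compiler with the forwarded arguments.
        local_obj = local_src.with_suffix(".o")
        cmd = ["gcc", *compiler_args, "-c", str(local_src), "-o", str(local_obj)]
        result = run(cmd)
        if result.returncode != 0:
            return None  # compilation failed; nothing to publish

        # 3. Place the object file back on the shared file system.
        target = shared / local_obj.name
        shutil.copy(local_obj, target)
        return target
```

<p>Passing a fake <code>run</code> callable makes it possible to try the file-handling logic on a machine without gcc installed.</p>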
<p><strong>Distributed Compiling using Hadoop</strong></p>
<p>The paper then describes the consequences of performing the above compilation cycle on Hadoop. In summary, it appears that the compilation time using mrcc/Hadoop on 10 nodes is at least twice as long as that on one node (sequential compilation).<br />
The slowness can be attributed to, among other things, a) the overhead of spawning a new process for each compilation batch on the slaves and b) the cost of retrieving files from and writing them back to the NFS server. In a nutshell, they argue that the tasking and data transportation overheads are acceptable only for the class of applications where relatively simple processing logic is applied to a large number of independent units of work.</p>
<p><strong>MRlite for distributed compilation workloads</strong></p>
<p>So MRlite comes to the rescue. It optimizes for large scale parallelism and low latency to provide a more general and flexible parallel execution capability.<br />
Bearing a great deal of similarity to the classic MapReduce framework, MRlite is made up of 1) the MRlite master, 2) the MRlite slaves, 3) an in-memory NFS server and 4) the MRlite client.</p>
<p>The master controls the parallel execution of the tasks. The client submits the job to the master, which in turn submits its tasks to multiple slaves.<br />
Some key aspects of MRlite&#8217;s design include &#8211;<br />
1) Timing control as part of the low-latency execution mechanism<br />
2) The master submits tasks to slaves without sophisticated queueing to maximize the chance of finishing the job within the timeout limit<br />
3) Use of run-time daemons and thread pools to support the operations of the master and the slaves. This reduces the cost of creating a process.<br />
4) An NFS server running on only one node to provide the file system abstraction. This server runs atop a virtual memory file system so that operations are as fast as in-memory operations<br />
5) Reliability through multiple-way replication is not included in MRlite</p>
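<p>Points 1 to 3 can be illustrated with a small sketch: a pool of long-lived workers receives all tasks immediately, and the job as a whole is bounded by a timeout. This is only an analogy under stated assumptions. The function <code>run_job</code> is hypothetical, local threads stand in for remote slave daemons, and the executor's internal work queue is a simplification of whatever MRlite actually does.</p>

```python
from concurrent.futures import ThreadPoolExecutor, wait


def run_job(tasks, pool, timeout_s):
    """Submit every task at once and wait at most timeout_s seconds.

    Returns (results, unfinished_count). A hypothetical stand-in for the
    MRlite master; real slaves would be remote daemons, not local threads.
    """
    futures = [pool.submit(task) for task in tasks]
    done, not_done = wait(futures, timeout=timeout_s)
    results = [f.result() for f in futures if f in done]
    return results, len(not_done)


# The pool is created once and reused, so no process is spawned per task.
pool = ThreadPoolExecutor(max_workers=4)
results, unfinished = run_job([lambda i=i: i * i for i in range(8)], pool, timeout_s=5.0)
pool.shutdown()
```

<p>Reusing the pool across jobs is what avoids the per-task process creation cost that hurt the Hadoop-based run.</p>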
<p><strong>Previewing from <a href="http://www.cse.ust.hk/~zma/publication/ma2010mrlite.pdf" title="http://www.cse.ust.hk/~zma" target="_blank">http://www.cse.ust.hk/~zma</a><br />
</strong><br />
<iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.cse.ust.hk%2F~zma%2Fpublication%2Fma2010mrlite.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/mrcc-distributed-compiler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Keyword Searching and Browsing in Databases using BANKS</title>
		<link>http://www.systemswemake.com/papers/banks</link>
		<comments>http://www.systemswemake.com/papers/banks#comments</comments>
		<pubDate>Sun, 19 Feb 2012 09:15:31 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Storage]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1135</guid>
		<description><![CDATA[BANKS is a system that enables keyword based searches on a relational database. As a paper that was published 10 years ago in ICDE 2002, it has won the most influential paper award for the past decade this year at ICDE. Hearty congrats to the team<span class="readmore-post"><a href="http://www.systemswemake.com/papers/banks">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>BANKS is a system that enables keyword based searches on a relational database. As a paper that was published 10 years ago in ICDE 2002, it has won the most influential paper award for the past decade this year at ICDE. Hearty congrats to the team from IIT Bombay&#8217;s CSE department.</p>
<p><strong>Previewing from <a href="http://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/BanksICDE2002.pdf" title="http://www.cse.iitb.ac.in/~sudarsha/" target="_blank">http://www.cse.iitb.ac.in/~sudarsha/Pubs-dir/BanksICDE2002.pdf</a></strong></p>
<p><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.cse.iitb.ac.in%2F~sudarsha%2FPubs-dir%2FBanksICDE2002.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/banks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HadoopDB: Efficient Processing of Data Warehousing Queries in a Split Execution Environment</title>
		<link>http://www.systemswemake.com/papers/query-proc</link>
		<comments>http://www.systemswemake.com/papers/query-proc#comments</comments>
		<pubDate>Sat, 28 Jan 2012 15:09:30 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Storage]]></category>
		<category><![CDATA[distributed data warehouse]]></category>
		<category><![CDATA[parallel databases]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1129</guid>
		<description><![CDATA[The buzz about Hadapt and HadoopDB has been around for a while now as it is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and map-reduce, to address the problem of large scale<span class="readmore-post"><a href="http://www.systemswemake.com/papers/query-proc">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>The buzz about Hadapt and HadoopDB has been around for a while now as it is one of the first systems to combine ideas from two different approaches, namely parallel databases based on a shared-nothing architecture and map-reduce, to address the problem of large scale data storage and analysis.</p>
<p>This <a href="http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf" title="http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf" target="_blank">early paper</a> that introduced HadoopDB crisply summarizes some reasons why parallel database solutions haven&#8217;t scaled to hundreds of machines. The reasons include -<br />
  1. As the number of nodes in a system increases, failures become more common.<br />
  2. Parallel databases usually assume a homogeneous array of machines, which becomes impractical as the number of machines rises.<br />
  3. They have not been tested at larger scales, as applications haven&#8217;t demanded more than tens of nodes for performance until recently.</p>
<p>Based on this reasoning the paper outlines some key characteristics that systems meant to handle analytical workloads have to demonstrate. These are -<br />
  1. Performance: the number of compute units required to perform an analysis task should be low.<br />
  2. Fault tolerance: in the context of analytical workloads, a query distributed for processing should not have to be restarted even when one of the nodes involved in processing it fails.<br />
  3. Ability to run in a heterogeneous environment: the system should be sensitive to the actual computational capabilities of the nodes when distributing a query for execution. In the absence of such an ability the time taken to execute a distributed query will be limited by the slowest node.<br />
  4. Flexible query interface: the system should work with tools commonly found in the BI landscape, such as tools for ad-hoc querying and creating dashboards, and should be extensible through user defined functions.</p>
<p><strong>HadoopDB Design &#038; Architecture</strong><br />
The basic idea behind HadoopDB is to have a system that is composed of single-node databases with Hadoop coordinating query execution across multiple nodes. Hadoop itself acts as the coordinator of the query execution task by taking control of scheduling and job tracking. It pushes much of the query heavy lifting to the databases themselves in order to maximize the performance.</p>
<p><strong>Building Blocks</strong><br />
The whole system is engineered as an extension of Hadoop by adding custom components to a) connect and interact with a single-node database b) read metadata about the different database nodes c) repartition and bulk load data d) plan, translate (into MR tasks) and execute query</p>
<p><strong>The Database Connector</strong> &#8211; This component is implemented as a custom InputFormat. An InputFormat encapsulates the mechanism of reading data from a data source and translating it into a key-value format that is palatable to the Mappers.<br />
The connector extends the InputFormat interface, and each MapReduce job supplies a query and other connection parameters to it. Using these, the connector executes the query against a single-node database instance and translates the results into a set of key-value pairs.<br />
The initial version of the connector only supported accessing a group of co-partitioned tables in a single database schema. Later the ability to access tables across multiple schemas in a single Map job was added, followed by the ability to consume inputs from both database tables and HDFS files. All query execution beyond the Map phase is carried out within Hadoop.</p>
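<p>The idea of turning query results into key-value pairs for mappers can be sketched as below. This is an analogy in Python, not Hadoop's InputFormat API or HadoopDB's code: the function <code>read_split</code>, the <code>visits</code> table and the choice of the first column as key are assumptions for illustration, and an in-memory SQLite database plays the role of a single-node instance.</p>

```python
import sqlite3


def read_split(conn, query, key_column=0):
    """Run a query on one single-node database and yield (key, value) pairs.

    Hypothetical stand-in for what a database-backed InputFormat hands to a
    mapper: the key is one column, the value is the whole row.
    """
    for row in conn.execute(query):
        yield row[key_column], row


# One in-memory SQLite database plays the role of a single-node instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (url TEXT, hits INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", [("a.com", 3), ("b.com", 5)])

pairs = list(read_split(conn, "SELECT url, hits FROM visits ORDER BY url"))
```

<p>Because the query runs inside the database, filtering and aggregation can be pushed down to it before any pair reaches the map phase.</p>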
<p><strong>The Catalog</strong> &#8211; This plays a role very similar to what the name node plays within HDFS. It contains information on the different single-node instances, data access statistics, partitioning properties etc. It is implemented as an XML file on HDFS and is accessed by the Job and Task trackers.</p>
<p><strong>The Data Loader</strong> &#8211; It is largely responsible for bringing data into the system and for distributing it amongst the databases in the cluster according to the partitioning policy.</p>
<p><strong>The Query Interface</strong> &#8211; It is the front end to the database that allows users to interact with the database using SQL. Its primary responsibility is to translate the query into a series of MR jobs, execute them and return the results. In the case of HadoopDB much of this layer is designed by adapting <a href="http://www.systemswemake.com/papers/hive" title="Hive – A Warehousing Solution Over a Map-Reduce Framework" target="_blank">Hive&#8217;s</a> query interface layer.</p>
<p>The rest of this paper discusses the techniques that are used for the optimization and execution of warehouse queries split across Hadoop and the single node database instances.</p>
<p><strong>Previewing from <a href="http://cs-www.cs.yale.edu/homes/dna/papers/split-execution-hadoopdb.pdf" target="_blank">http://cs-www.cs.yale.edu</a><br />
</strong></p>
<p><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fcs-www.cs.yale.edu%2Fhomes%2Fdna%2Fpapers%2Fsplit-execution-hadoopdb.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/query-proc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spark: Cluster Computing with Working Sets</title>
		<link>http://www.systemswemake.com/papers/spark</link>
		<comments>http://www.systemswemake.com/papers/spark#comments</comments>
		<pubDate>Fri, 06 Jan 2012 20:19:04 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Programming]]></category>
		<category><![CDATA[data-parallel programming]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1083</guid>
		<description><![CDATA[One of the aspects you can’t miss even as you just begin reading this paper is the strong scent of functional programming that the design of Spark bears. The use of FP idioms is quite widespread across the architecture of Spark such as the ability<span class="readmore-post"><a href="http://www.systemswemake.com/papers/spark">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>One of the aspects you can’t miss even as you just begin reading this paper is the strong scent of functional programming that the design of Spark bears. FP idioms are widespread across the architecture of Spark: the ability to restore a partition by applying a closure, operations such as reduce and map/collect, distributed accumulators etc. Suffice it to say that it is a very functional system. Pun intended!</p>
<p>Spark is written in Scala and is well suited for the class of applications <strong>that reuse a working set of data across multiple parallel operations</strong>. It claims to outperform Hadoop by 10x in iterative machine learning jobs, and has been used to interactively query a 39 GB dataset with sub-second response time!<br />
It is built on top of <a href="http://www.systemswemake.com/blog/16/mesos/" title="Mesos" target="_blank">Mesos</a>, a resource management infrastructure that lets multiple parallel applications share a cluster in a fine-grained manner and provides an API for applications to launch tasks on a cluster.</p>
<p>Developers write a driver program that orchestrates various parallel operations. Spark’s programming model provides two abstractions to work with large datasets: resilient distributed datasets and parallel operations. In addition it supports two kinds of shared variables.</p>
<p><strong>Resilient Distributed Datasets</strong><br />
A resilient distributed dataset (RDD) is a read-only collection of objects partitioned across multiple machines. A unique characteristic of this partitioned collection is that it can be rebuilt if a partition is lost. It is able to offer this because the RDD as a whole contains enough information to compute the elements of the collection, starting from data in some reliable storage.</p>
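<p>The recovery idea can be shown with a toy model: a dataset that keeps the recipe for each partition instead of relying on the materialized data. The sketch is in Python rather than Scala and is not Spark's implementation; the class <code>LineageDataset</code> and its methods are hypothetical names chosen for the example.</p>

```python
class LineageDataset:
    """Toy model of a dataset that remembers how each partition is derived."""

    def __init__(self, load_partition, num_partitions, transform=None):
        self.load_partition = load_partition  # reads from reliable storage
        self.num_partitions = num_partitions
        self.transform = transform or (lambda x: x)
        self.cache = {}

    def map(self, fn):
        prev = self.transform
        return LineageDataset(self.load_partition, self.num_partitions,
                              lambda x: fn(prev(x)))

    def partition(self, i):
        # Recompute from the source whenever the cached copy is missing.
        if i not in self.cache:
            self.cache[i] = [self.transform(x) for x in self.load_partition(i)]
        return self.cache[i]


storage = {0: [1, 2], 1: [3, 4]}          # stands in for reliable storage
doubled = LineageDataset(storage.__getitem__, 2).map(lambda x: x * 2)
before = doubled.partition(1)
doubled.cache.pop(1)                      # simulate losing a partition
after = doubled.partition(1)
```

<p>Only the lost partition is recomputed, and only from its own source data, which is why no replication of the derived data is needed.</p>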
<p><strong>Parallel Operations</strong><br />
The parallel operations that can be performed include map/collect, combine/reduce and iterating over the dataset. These operations are invoked by passing closures to Spark, i.e. code is passed around as data. In order to provide a “regular” functional experience, any variables accessed by these code blocks are copied over to the worker nodes where the computation is performed. Since Scala closures are Java objects that can be serialized, this mechanism is used for transferring the code over to the workers.</p>
<p>Over and above this basic feature it also offers two types of variables with special semantics called <strong>broadcast variables</strong> and <strong>accumulators</strong>.<br />
Broadcast variables are created when a large piece of read-only data needs to be made available to all the worker nodes repeatedly. It is effectively an optimization that avoids redundant copy/transfer operations of the same piece of data.</p>
<p>Accumulators have the same semantics as we know them in other languages/programming models. They are variables that represent a sum over a collection. Any datatype that you can define an “add” operation on can leverage accumulators.<br />
These variables are implemented as custom classes which in turn have a specific serialization algorithm. Details of their implementation are explained in section 4 of the paper. The paper also has a good collection of parallel programs based on this programming model. Although it is still evolving it has found takers in the industry. Check out the description of <a href="http://www.conviva.com/blog/engineering/using-spark-and-hive-to-process-bigdata-at-conviva" target="_blank">Conviva&#8217;s data processing system</a> which uses Spark.</p>
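<p>To make the “add” requirement concrete, here is a minimal single-process sketch of an accumulator. It is written in Python and ignores distribution and serialization entirely; the <code>Accumulator</code> class and its constructor arguments are hypothetical and do not mirror Spark's API.</p>

```python
class Accumulator:
    """Toy accumulator: workers may only add, the driver reads the total."""

    def __init__(self, zero, add):
        self.value = zero
        self._add = add

    def add(self, term):
        self.value = self._add(self.value, term)


# Any type with an "add" operation works; here, element-wise vector addition.
vec_sum = Accumulator([0, 0], lambda a, b: [x + y for x, y in zip(a, b)])
for point in [[1, 2], [3, 4], [5, 6]]:
    vec_sum.add(point)
```

<p>Because addition of this kind is associative, partial sums computed on different workers could be combined in any order, which is what makes the abstraction easy to distribute.</p>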
<p><strong>Previewing from <a href="http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf" title="http://www.cs.berkeley.edu/" target="_blank">http://www.cs.berkeley.edu/~matei/papers/2010/hotcloud_spark.pdf</a><br />
</strong><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.cs.berkeley.edu%2F~matei%2Fpapers%2F2010%2Fhotcloud_spark.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/spark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Kafka: a Distributed Messaging System for Log Processing</title>
		<link>http://www.systemswemake.com/papers/kafka</link>
		<comments>http://www.systemswemake.com/papers/kafka#comments</comments>
		<pubDate>Tue, 20 Dec 2011 18:27:55 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Storage]]></category>
		<category><![CDATA[distributed log processing]]></category>
		<category><![CDATA[distributed messaging]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1025</guid>
		<description><![CDATA[Kafka, a system developed at LinkedIn, is essentially a messaging system that is designed to support aggregation of high throughput log messages arriving from different applications. Why would a traditional messaging system not be a good fit for log processing? Typical enterprise messaging systems tend<span class="readmore-post"><a href="http://www.systemswemake.com/papers/kafka">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>Kafka, a system developed at LinkedIn, is essentially a messaging system that is designed to support aggregation of high throughput log messages arriving from different applications.</p>
<p><strong>Why would a traditional messaging system not be a good fit for log processing?</strong></p>
<ul>
<li>Typical enterprise messaging systems tend to offer a rich set of delivery guarantees. Such extensive guarantees are overkill for log processing scenarios.
</li>
<li>Secondly, enterprise class messaging systems typically do not focus on maximizing throughput as the primary design constraint.
</li>
<li>Third, enterprise systems are weak in terms of support for distribution of messages. You cannot configure them to partition and store messages on different machines.
</li>
<li>Finally, their performance degrades when messages are allowed to remain in the queue for extended periods of time.</li>
</ul>
<p>Looks like these were some of the reasons why they were motivated to build a different messaging system with a primary design goal of enabling very high throughput of messages.</p>
<p><strong>Internals &#038; Architecture</strong></p>
<p>In Kafka a topic is the container with which messages are associated. Producers publish messages to a topic and consumers consume these messages by pulling from the topics they subscribe to. The published messages are stored on servers referred to as brokers.<br />
Since Kafka is designed to support distribution of messages across machines, a typical cluster consists of multiple brokers, with each broker storing only a portion of the messages of a topic.<br />
A characteristic feature of Kafka is that it supports a “pull” model for message consumption. This lets the consuming application control the rate at which it consumes messages, unlike the typical “push” model, which could flood the consumer.</p>
<p><strong>Storage Layout</strong></p>
<p>Kafka follows a simple storage layout with each partition of a topic corresponding to a logical log. Physically the log is further subdivided into segment files of 1 GB each.<br />
When a producer publishes a message the payload is appended to the current segment. The messages are flushed to the disk either after a specific number of messages have accumulated or a certain time period has elapsed. The consumer sees the message only after the message has been flushed to the disk.</p>
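<p>The append, flush and pull behaviour described above can be modelled in a few lines. This is a toy in-memory sketch, not Kafka's implementation: the <code>PartitionLog</code> class is hypothetical, offsets are simple message indexes, segment files are not modelled, and only the message-count flush trigger is shown (the time-based trigger is left out).</p>

```python
class PartitionLog:
    """Toy in-memory model of one topic partition (not Kafka's real code)."""

    def __init__(self, flush_every=3):
        self.flush_every = flush_every  # assumed message-count threshold
        self.pending = []               # appended but not yet flushed
        self.flushed = []               # visible to consumers

    def append(self, message):
        self.pending.append(message)
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        self.flushed.extend(self.pending)
        self.pending.clear()

    def pull(self, offset, max_messages):
        """Consumer-driven fetch: return flushed messages from offset on."""
        batch = self.flushed[offset:offset + max_messages]
        return batch, offset + len(batch)


log = PartitionLog(flush_every=3)
for m in ["a", "b"]:
    log.append(m)
early, _ = log.pull(0, 10)      # nothing flushed yet
log.append("c")                 # third message triggers the flush
batch, next_offset = log.pull(0, 2)
```

<p>Since the consumer supplies the offset on every pull, the log itself keeps no per-consumer state, which is in the spirit of the stateless broker mentioned below.</p>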
<p><strong>Speeding up Data Transfers, Achieving Higher Throughput</strong></p>
<p>Data transfer is accelerated by -<br />
1) Enabling the producer to send a batch of messages in one go. Once a message is sent the producer does not wait for an acknowledgment from the broker. The idea is to send messages as fast as a broker can handle. This significantly increases the publisher throughput.<br />
2) Enabling the consumer to retrieve messages in batches. Alongside this, a very efficient storage format coupled with a stateless broker results in very high consumption throughput.<br />
3) No caching of messages in the Kafka layer. They rely on the underlying file system cache. This is a bit surprising. Not sure how they decided on going with this approach.<br />
4) Optimizing network access for consumers by using the Linux sendfile API. The sendfile API operates in kernel space and hence is quicker. Nice!</p>
<p>The net result of these tricks is that (in an experiment involving messages of 200 bytes each) Kafka can, on average, publish 50,000 and 400,000 messages per second for batch sizes of 1 and 50, respectively. That is orders of magnitude higher than what RabbitMQ and ActiveMQ demonstrate in the experiments.</p>
<p><strong>Distribution &#038; Coordination</strong></p>
<p>Kafka supports an abstraction called consumer groups. Each message delivered to a consumer group is processed by only one consumer within that group.<br />
In Kafka the smallest unit of parallelism is the partition of a topic. This implies that all messages that belong to a particular partition of a topic will be consumed by a single consumer in a consumer group.<br />
Coordination is decentralized without a permanent “master” node. Specifically, Zookeeper is employed to facilitate coordination between consumers and brokers.</p>
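<p>One way to picture partition-level parallelism is a deterministic assignment that every consumer could compute on its own from the same membership list. The sketch below uses a simple round-robin rule as an assumption; the function <code>assign_partitions</code> and the partition and consumer names are hypothetical, and the rebalancing scheme in the paper may well differ.</p>

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer in the group (round-robin).

    A simplified, hypothetical assignment; the paper's rebalance algorithm
    may differ.
    """
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


owners = assign_partitions(["t-0", "t-1", "t-2", "t-3", "t-4"], ["c1", "c2"])
```

<p>Sorting both lists first means every consumer that sees the same group membership arrives at the same answer without talking to a master.</p>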
<p><strong>Delivery Guarantees</strong></p>
<p>Kafka only guarantees at-least-once delivery. In most cases a message is delivered exactly once to its consumers. It also guarantees that messages from a single partition are delivered to a consumer in order. Across partitions no such guarantee is made.</p>
<p><strong>Previewing from <a href="http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf" target="_blank">http://research.microsoft.com</a><br />
</strong></p>
<p><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fpeople%2Fsrikanth%2Fnetdb11%2Fnetdb11papers%2Fnetdb11-final12.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/kafka/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency</title>
		<link>http://www.systemswemake.com/papers/windows-azure-storage</link>
		<comments>http://www.systemswemake.com/papers/windows-azure-storage#comments</comments>
		<pubDate>Thu, 24 Nov 2011 20:32:00 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Storage]]></category>
		<category><![CDATA[cloud storage]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1028</guid>
		<description><![CDATA[Windows Azure Storage is a key component of the Windows Azure Cloud platform that offers an infinite disk in the cloud. It’s been in production since November 2008 and is used heavily within Microsoft in addition to being available as a public cloud service. It<span class="readmore-post"><a href="http://www.systemswemake.com/papers/windows-azure-storage">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>Windows Azure Storage is a key component of the Windows Azure Cloud platform that offers an infinite disk in the cloud. It’s been in production since November 2008 and is used heavily within Microsoft in addition to being available as a public cloud service. It currently handles about 70 PB of raw storage in production and is set to add a few hundred more in the near future.</p>
<p><strong>The Architecture</strong><br />
WAS is engineered as a service that runs atop another service called the Windows Azure Fabric Controller. The Fabric Controller is a lower level service that provides a lot of cross cutting functionality such as node management, network configuration, health monitoring, starting/stopping of service instances etc.<br />
In some ways WAS takes on responsibilities very similar to what we see in a distributed file system’s name/metadata node such as data placement across disks, replication and load balancing. At a high level it is logically organized into two units called Storage Stamps and the Location Service. It supports three different storage abstractions namely Blobs, Tables and Queues.</p>
<p><strong>Storage Stamps</strong><br />
Each Storage Stamp is a cluster of multiple racks. Typically there are about 10-20 racks with about 18 storage nodes per rack. Currently each storage stamp holds about 2 PB of data but is soon expected to scale up to 30 PB per stamp. The target utilization level of this infrastructure seems to be around 70%. When a stamp reaches this limit the Location Service migrates contents to other stamps through replication.</p>
<p><strong>A Partitioned Namespace for Objects</strong><br />
The objects in this storage system are all part of a single global namespace. This is achieved by using DNS and synthesizing the identification scheme from a customer account name, a partition name and an object name. All objects are accessible via a URI of the form http(s)://AccountName.&lt;service&gt;.core.windows.net/PartitionName/ObjectName<br />
Account Name is the name chosen by the customer. The DNS entry corresponding to this is mapped to the primary storage cluster in the appropriate data center where this data is stored. Partition Name locates the data within the cluster and finally the Object Name identifies the actual stored object.</p>
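<p>As a small illustration of the naming scheme, the snippet below assembles a URI from its three parts. The helper <code>object_uri</code> and the example account, partition and object names are made up, and the sketch does no escaping or validation of the individual components.</p>

```python
def object_uri(account, service, partition, obj, secure=True):
    """Build a storage URI of the form described above."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{account}.{service}.core.windows.net/{partition}/{obj}"


uri = object_uri("myaccount", "blob", "photos", "cat.jpg")
```

<p>The account name ends up in the host part of the URI, which is what lets DNS alone route a request to the right storage stamp.</p>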
<p><strong>The Location Service</strong><br />
The Location Service is responsible for managing the storage stamps. Different accounts are mapped to different stamps by the LS. It also ensures that the data is replicated and load balanced.<br />
WAS storage locations are spread across North America, Europe and Asia. It is built in such a way that each location has one data center and each data center holds multiple storage stamps. New regions, new locations within a region, or new storage stamps within a location can be added anytime.<br />
When an application requests a new account for storing data it is allowed to specify its location affinity. The LS allocates a storage stamp based on this preference and updates DNS to route traffic to that particular Storage Stamp’s Virtual IP.</p>
<p><strong>Anatomy of Storage Stamp</strong><br />
<em>Stream Layer</em> : It is the lowermost layer of a stamp and is responsible for storing the data on disk. It also ensures durability of the stored units by replicating them across storage nodes within the stamp. The unit of storage is known as a stream. This layer does not concern itself with the semantics of the objects that are being stored. Data that is stored here is accessible from the layer above it.<br />
<em>Partition Layer</em> : It is the layer that understands the semantic differences between the storage abstractions. It provides a namespace for them, transaction ordering and strong consistency for the objects, and caching services.<br />
<em>Front End Layer</em> : This layer consists of stateless servers that intercept the request for any object and route it to the appropriate partition server that can serve it. Before doing this they authenticate and authorize the request based on the account details. They also stream large objects directly from the stream layer and cache frequently requested data.</p>
<p><strong>Replication</strong><br />
Contents in a stamp are replicated within the stamp’s storage nodes and across stamps. Intra stamp replication is synchronous and is performed by the stream layer.<br />
Inter stamp replication is asynchronous and is performed by the partition layer. Inter-stamp replication is focused on replicating objects and the transactions applied to those objects, whereas intra-stamp replication is focused on replicating blocks of disk storage that are used to make up the objects. Intra stamp replication safeguards the data from being lost due to machine failure, while inter stamp replication provides geo redundancy and enables disaster recovery.</p>
<p><strong>The Stream Layer and its relationship to Tidy FS</strong><br />
It appears that this layer is essentially a distributed file system, and one that is based on another work from Microsoft Research. Reading about <a href="http://systemswemake.com/papers/tidyfs">TidyFS</a> at this point will probably help in understanding the nomenclature and semantics of the elements of the Stream Layer. A stream is the unit of storage, and only append operations are permitted on it. This feature is now commonplace in distributed file systems.</p>
<p><strong>Some Interesting Lessons Learnt </strong><br />
<em>Scaling computation separately from storage</em><br />
What this means is that the VMs that run the application are not the same machines that store the data owned by the application. This is done intentionally so that demands on compute and storage can be met by scaling each independently. Instead of collocating compute and storage, the system is built to allow computation to access storage via a high bandwidth network. In order to ensure quick access they are also moving to a different data center architecture that flattens the network topology.</p>
<p><em>Throttling/Isolation</em><br />
Due to the large number of accounts it quickly becomes difficult to track every account’s usage profile and throttle based on that profile information. So in order to determine whether an account is well behaved, the system uses a Sample-Hold algorithm to track the top N busiest accounts and partitions. When a server gets overloaded this information is consulted to throttle the traffic.</p>
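<p>The summary above names the algorithm without describing it, so the following is only a minimal Python sketch of the general sample-and-hold idea, not the WAS implementation. The class name, the per-request sampling probability and the <code>top</code> helper are all assumptions made for illustration.</p>

```python
import random


class SampleAndHold:
    """Track request counts only for accounts that get sampled (hypothetical sketch)."""

    def __init__(self, sample_prob, rng=None):
        self.sample_prob = sample_prob   # assumed per-request sampling probability
        self.rng = rng or random.Random()
        self.counts = {}                 # account -> requests counted since it was sampled

    def record(self, account):
        if account in self.counts:
            # "Hold": once an account is in the table, count every request.
            self.counts[account] += 1
        elif self.sample_prob > self.rng.random():
            # "Sample": a new account enters the table with probability sample_prob.
            self.counts[account] = 1

    def top(self, n):
        """Return the n busiest tracked accounts, busiest first."""
        ranked = sorted(self.counts.items(), key=lambda item: item[1], reverse=True)
        return ranked[:n]
```

<p>Busy accounts are very likely to be sampled early and then counted on every request, while quiet accounts mostly never enter the table, which keeps the state small. An overloaded server could consult <code>top(n)</code> to decide whose traffic to throttle.</p>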
<p><em>Append Only System</em><br />
It greatly simplifies the replication protocol and failure handling scenarios. In this model, once data is committed it is never overwritten, and hence consistency across replicas can be enforced on the basis of the commit lengths.</p>
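<p>A rough Python sketch of why commit lengths are enough in an append-only model. The <code>Replica</code> class and both helper functions are hypothetical, and treating the shortest replica as the commit length is an assumption of this sketch rather than a description of the actual WAS protocol.</p>

```python
class Replica:
    """An append-only replica: blocks are added at the end and never overwritten."""

    def __init__(self):
        self.blocks = []

    def append(self, block):
        self.blocks.append(block)
        return len(self.blocks)


def commit_length(replicas):
    """Assumption of this sketch: data counts as committed once every replica holds it."""
    return min(len(replica.blocks) for replica in replicas)


def reconcile(replicas):
    """Truncate every replica to the commit length and return that length."""
    length = commit_length(replicas)
    for replica in replicas:
        # Dropping an uncommitted tail is safe because committed blocks are never rewritten.
        del replica.blocks[length:]
    return length
```

<p>Because nothing before the commit length can change, comparing lengths is sufficient to bring replicas back into agreement after a partial failure. No block-by-block diff is needed.</p>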
<p><em>Multiple Data Abstractions from a Single Stack</em><br />
This system supports three different data abstractions, all based on the same underlying storage stack. All the abstractions use the same intra and inter stamp replication and load balancing mechanisms. Because the performance characteristics of the three abstractions vary a lot, this design allows all of these different workloads to run on the same set of storage nodes, which improves utilization.</p>
<p><em>CAP Theorem</em><br />
Although CAP states that Consistency, Availability and Partition Tolerance cannot all be provided within a distributed system, WAS is a system that provides high availability with strong consistency guarantees. Within a stamp, all three properties are realized. This is achieved through layering and designing around a specific fault model. It is quite interesting to learn how strong consistency is provided despite network partitions. In essence, what this means is that within a storage stamp the system is engineered to behave more like a monolithic system than a distributed one.</p>
<p><strong>Previewing from <a href="http://sigops.org/sosp/sosp11/current/2011-Cascais/printable/11-calder.pdf" target="_blank">http://sigops.org/sosp/sosp11/current/2011-Cascais/printable/11-calder.pdf</a></strong></p>
<p><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fsigops.org%2Fsosp%2Fsosp11%2Fcurrent%2F2011-Cascais%2Fprintable%2F11-calder.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/windows-azure-storage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thialfi: A Client Notification Service for Internet-Scale Applications</title>
		<link>http://www.systemswemake.com/papers/thialfi</link>
		<comments>http://www.systemswemake.com/papers/thialfi#comments</comments>
		<pubDate>Sat, 29 Oct 2011 20:51:06 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Applications]]></category>
		<category><![CDATA[distributed messaging]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=976</guid>
		<description><![CDATA[Scandinavian mythology regards Thialfi, a swift runner, as the attendant of Thor, the god of thunder. The swiftness that characterizes Thialfi is perhaps why the folks at Google named their message delivery system (it delivers notifications with sub-second latency!) after him. The<span class="readmore-post"><a href="http://www.systemswemake.com/papers/thialfi">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>Scandinavian mythology regards Thialfi, a swift runner, as the attendant of Thor, the god of thunder. The swiftness that characterizes Thialfi is perhaps why the folks at Google named their message delivery system (it delivers notifications with sub-second latency!) after him.</p>
<p><strong>The Case for another Notification Service</strong><br />
It is quite common for applications to share their data with other application users and devices. These “clients” of the data usually maintain a local copy of the data for many different reasons. This creates the need to keep the data fresh and in sync across users and devices. For example, if you alter a calendar entry on your mobile phone, you expect this change to be synced across the calendars of all the other attendees almost immediately. This ability forms the core of a notification service.</p>
<p>In the absence of a general notification service, applications settle for custom notification mechanisms. These take the form of either a) frequent polling or b) push notifications. Both come with their own problems. Polling, while conceptually simple and easy to implement, creates a tension between resource consumption and timeliness.<br />
With push notifications, ensuring reliable message delivery is very difficult, especially in a heterogeneous landscape like the internet.</p>
<p>If we were to conceive a generic notification service, then it must offer:<br />
a) The ability to map different clients to different pieces of data based on interest<br />
b) Reliable notifications that do not require low-frequency backup polling as a fallback<br />
c) Assurance that messages make it through the last mile into the client’s address space<br />
d) Delivery to multiple applications written in different languages over multiple communication channels/protocols</p>
<p><strong>Reliable Signaling over Reliable Messaging &#8211; An important tradeoff</strong><br />
The authors of Thialfi make some very good arguments for choosing to go with Reliable Signaling as opposed to what we typically find in messaging systems, which is the idea of Reliable Messaging.</p>
<p><strong>Why doesn’t Reliable Messaging apply here?</strong><br />
Firstly, because it becomes very cumbersome to manage when clients are often unavailable for long durations and in some cases may never return. In such a scenario, reliable messaging would leave messages queued up, consuming more resources and eventually causing message floods at the client end.<br />
Secondly, message delivery tends to be very application specific. Data that is delivered to an application tends to have bespoke security, privacy and content format requirements.<br />
So instead of a one-size-fits-all messaging system, Thialfi is consciously designed as a system that provides <strong>reliable signaling</strong> (it collapses all notifications for an object into a single message). This enables Thialfi to remain more loosely coupled with the applications.</p>
<p><strong>System Architecture</strong><br />
Thialfi models data in terms of object identifiers and their version numbers. The objects themselves are stored within the application. Thialfi maintains no copy of the actual data. Each object is given a name which is unique within the scope of the owner application. These objects are versioned by the owning application. The application allocates a version number to each object and this information is passed along to Thialfi as part of a notification. Version numbers are constrained by the applications to be monotonically increasing as this is necessary for reliable delivery.</p>
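<p>The object-and-version model, and the way notifications for one object collapse into a single signal, can be sketched in a few lines of Python. The <code>SignalTable</code> class and its method names are hypothetical and are not part of Thialfi’s API.</p>

```python
class SignalTable:
    """Keep only the latest known version per object, never the data itself (hypothetical sketch)."""

    def __init__(self):
        self.latest = {}   # object_id -> highest version seen so far

    def publish(self, object_id, version):
        """Record a new version; return True if it replaced what was known before."""
        current = self.latest.get(object_id)
        if current is None or version > current:
            # Versions only move forward, so older signals are simply superseded.
            self.latest[object_id] = version
            return True
        return False

    def pending(self):
        """One entry per object, however many updates were published."""
        return dict(self.latest)
```

<p>Three updates to the same calendar entry leave exactly one pending signal carrying the highest version. This is why the requirement that version numbers increase monotonically matters: without it, the service could not tell which signal supersedes which.</p>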
<p><strong>Thialfi Client</strong><br />
Thialfi has a client library that provides applications the ability to register for updates to any shared object and receive notification callbacks from the server. It speaks the Thialfi protocol.<br />
As part of a notification that is delivered to the client, Thialfi sends across the shared object’s identifier and the latest known version of the object. Using this information, the client then directly contacts the application that owns the object and synchronizes the state of the object. Thialfi does not take on the responsibility of syncing up the state.</p>
<p><strong>The Journey of a Notification on Thialfi’s servers</strong><br />
The Thialfi server is notified of changes as and when an application modifies the state of a shared object. Applications that want to notify Thialfi use a publisher library to do the notification. Thialfi supports channels like XMPP, HTTP and RPC to cater to different applications.</p>
<p>The notification is intercepted first by a set of servers known as Bridge servers. These are stateless, randomly load-balanced tasks that consume a feed of application specific update messages from Google’s infrastructure pub/sub service, translate them into a standard notification format, and assemble them into batches for delivery to another set of servers called Matchers.</p>
<p>Matchers then consume the notification, match it with the set of registered clients, and forward it to the Registrar for reliable delivery to clients. Matchers are partitioned over the set of objects and maintain a view of state indexed by object ID.</p>
<p>Registrars track clients, process registrations, and reliably deliver the notification using a view of state indexed by client ID (they are partitioned by client).</p>
<p><strong>Achieving Reliable Delivery</strong><br />
The authors define the property of reliable delivery as -<br />
<em>	“If a well-behaved client registers for an object X, Thialfi ensures that the client will always eventually learn of the latest version of X.”</em><br />
Thialfi achieves end-to-end reliability by ensuring that state changes in one component eventually propagate to all other relevant components of the system.<br />
Thialfi operates on two kinds of states namely, registration state (i.e., which clients care about which objects) and notification state (the latest known version of each object).</p>
<p><strong>Synchronizing Registration State</strong><br />
Registration state moves through three components: the client, the Registrar and the Matcher. Every message from the client to the Registrar comes along with a digest summarizing the entire registration state. If this digest does not match the state at the Registrar, Thialfi runs a Registration Sync Protocol. If the client or the server detects a discrepancy, the client resends its registrations to the server. If the server detects the problem, it requests that the client resend them. Thus the client and the Registrar are synchronized.<br />
When the Registrar commits a registration state change, a pending work marker is also set atomically. This marker is cleared only after all dependent writes to the Matcher have completed successfully. All writes are retried by the Registrar Propagator if any failure occurs. It is safe to do this as all the operations on the Thialfi server are idempotent by design.</p>
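<p>The digest check between client and Registrar might look roughly like the Python sketch below. The use of SHA-256 over sorted object IDs, the <code>Registrar</code> class and its method names are all assumptions, since the summary above does not say how the digest is computed.</p>

```python
import hashlib


def registration_digest(object_ids):
    """Summarize a set of registrations; sorting makes the result order-independent."""
    hasher = hashlib.sha256()
    for object_id in sorted(object_ids):
        hasher.update(object_id.encode("utf-8"))
        hasher.update(b"\0")   # separator so that ("ab", "c") differs from ("a", "bc")
    return hasher.hexdigest()


class Registrar:
    """Hypothetical server-side view of which client registered for which objects."""

    def __init__(self):
        self.registrations = {}   # client_id -> set of object IDs

    def in_sync(self, client_id, client_digest):
        """Compare the digest sent with a client message against the server's state."""
        known = self.registrations.get(client_id, set())
        return registration_digest(known) == client_digest

    def resync(self, client_id, object_ids):
        """Idempotent: resending the same registrations leaves the same state."""
        self.registrations[client_id] = set(object_ids)
```

<p>When <code>in_sync</code> returns False, the server would ask the client to resend its registrations, and <code>resync</code> can be retried safely because applying it twice has the same effect as applying it once.</p>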
<p><strong>Synchronizing Notification State</strong><br />
Notification state that comes from the publisher moves through four components: the Bridge, the Matcher, the Registrar and the client.<br />
Notifications are removed from the update feed by the Bridge only after they have been successfully written to the Matcher’s persistent store. A periodic task in the Bridge reads the Matcher’s temporary store and resends the notifications to the Matcher when required.<br />
When a notification is written to the Matcher database, a pending work marker is used to ensure eventual propagation, similar to the Registrar to Matcher propagation of registration state.<br />
To secure the last leg of the journey, the Registrar retains a notification for a client until either the client acknowledges it or a subsequent notification supersedes it.<br />
Additionally the Registrar periodically retransmits any outstanding notifications while the client is online, ensuring eventual delivery.</p>
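<p>The retain-until-acknowledged-or-superseded rule can be illustrated with a small Python sketch. The <code>NotificationOutbox</code> class is hypothetical, and it assumes versions are non-negative integers.</p>

```python
class NotificationOutbox:
    """Per-client outstanding notifications (hypothetical sketch, integer versions assumed)."""

    def __init__(self):
        self.outstanding = {}   # object_id -> version awaiting acknowledgement
        self.acked = {}         # object_id -> highest version the client confirmed

    def notify(self, object_id, version):
        floor = max(self.outstanding.get(object_id, -1), self.acked.get(object_id, -1))
        if version > floor:
            # A newer version supersedes whatever was waiting for this object.
            self.outstanding[object_id] = version

    def acknowledge(self, object_id, version):
        # Only an ack for the version currently outstanding clears the entry.
        if self.outstanding.get(object_id) == version:
            del self.outstanding[object_id]
            self.acked[object_id] = version

    def retransmit(self):
        """What a periodic task would resend while the client is online."""
        return sorted(self.outstanding.items())
```

<p>An acknowledgement that arrives late, for a version that has already been superseded, leaves the newer notification in place, so the client still eventually hears about the latest version.</p>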
<p>Viewed together, these mechanisms work in tandem to provide reliable delivery of the notifications.</p>
<p><strong>Previewing from <a href="http://research.google.com/pubs/archive/37474.pdf" target="_blank">http://research.google.com/pubs/archive/37474.pdf</a><br />
</strong><br />
<iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fstatic.googleusercontent.com%2Fexternal_content%2Funtrusted_dlcp%2Fresearch.google.com%2Fen%2F%2Fpubs%2Farchive%2F37474.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/thialfi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming</title>
		<link>http://www.systemswemake.com/papers/low-latency-music</link>
		<comments>http://www.systemswemake.com/papers/low-latency-music#comments</comments>
		<pubDate>Sat, 08 Oct 2011 17:26:56 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Applications]]></category>
		<category><![CDATA[p2p]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=1003</guid>
		<description><![CDATA[This paper from folks at Spotify primarily focuses on how they use P2P techniques in their platform. The service is not web-based, but instead uses a proprietary client and protocol. At the heart of the system is this custom music streaming protocol that is optimized<span class="readmore-post"><a href="http://www.systemswemake.com/papers/low-latency-music">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>This paper from folks at Spotify primarily focuses on how they use P2P techniques in their platform.<br />
The service is not web-based, but instead uses a proprietary client and protocol. At the heart of the system is this custom music streaming protocol that is optimized for accessing a large library of tracks (and not so much for live broadcasts). The clients are available for several platforms including smart phones. The most notable feature of the Spotify client is its low playback latency. The median latency to begin playback of a track is 265 ms.<br />
The audio streams are encoded using Ogg Vorbis format. It uses TCP as the transport protocol instead of UDP. Between a pair of hosts a single TCP connection is used, and the application protocol multiplexes messages over the connection. While a client is running, it keeps a TCP connection to a Spotify server. Application layer messages are buffered, and sorted by priority before being sent to the operating system’s TCP buffers.</p>
<p><strong>Journey of a song</strong><br />
The journey of a song begins when the client makes a request to Spotify’s servers asking for a new track to be played. In this initial request about 15 seconds worth of music is transferred using the already open TCP connection.<br />
While this is in progress, the client simultaneously issues a search in the P2P network for peers who can serve this track. The client interested in a track also sends a priority attribute in the query to its neighbors. This attribute signifies the urgency of the request and can take on three different values (currently streaming track, prefetching next track, and offline synchronization).<br />
A serving client/peer sorts these requests by priority and a few other parameters and offers service to the top 4 peers.<br />
On receiving the content the client caches it for two reasons: 1) users are quite likely to listen to the same song multiple times, and 2) the cached content can then be used by the client to transmit the data to other peers in the overlay network. All cached contents are encrypted and are evicted based on an LRU eviction policy.<br />
The same track can be simultaneously downloaded from the server and several different peers. If a peer is too slow in satisfying a request, the request is resent to another peer or, if getting the data has become too urgent, to the server.</p>
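<p>A minimal Python sketch of how a serving peer might pick which requests to answer. It sorts on priority alone, whereas the text above mentions a few other parameters that are not listed here. The <code>Priority</code> values and the <code>select_requests</code> function are assumptions for illustration.</p>

```python
from enum import IntEnum


class Priority(IntEnum):
    # Lower value means more urgent; the numeric values are an assumption.
    STREAMING = 0
    PREFETCH = 1
    OFFLINE_SYNC = 2


def select_requests(requests, slots=4):
    """requests: list of (peer_id, Priority) pairs. Return up to `slots`, most urgent first."""
    # sorted() is stable, so equally urgent requests keep their arrival order.
    return sorted(requests, key=lambda request: request[1])[:slots]
```

<p>With six competing requests, the two streaming requests win first, then the two prefetch requests, and the offline synchronization requests have to wait or go elsewhere.</p>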
<p><strong>Locating tracks on other peers</strong><br />
Instead of a DHT, which is commonly used in most P2P systems, Spotify uses two different mechanisms to locate a track (primarily for performance reasons) -<br />
a) A tracker which is deployed on the back end servers and  b) by querying the immediate neighbors in the overlay<br />
The tracker maintains a mapping from tracks to the peers who have the entire song cached with them. It keeps a list of 20 such peers for each song. It also knows which peers are currently available. A client usually asks the tracker for peers who have a specific song with them currently and the tracker responds by providing details of about 10 such online peers.<br />
The queries sent by a client to its immediate neighbors are forwarded by them in turn to their immediate neighbors. The neighbors send a response back to the client if they have the song cached with them.</p>
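<p>The tracker’s bookkeeping can be sketched in Python as follows. The limits of 20 stored peers and 10 returned peers come from the description above; the <code>Tracker</code> class, its method names, and the choice to drop the oldest peer when the list is full are assumptions.</p>

```python
from collections import deque


class Tracker:
    """Map each track to a bounded list of peers that cache it (hypothetical sketch)."""

    def __init__(self, max_peers_per_track=20, max_results=10):
        self.max_peers_per_track = max_peers_per_track
        self.max_results = max_results
        self.peers_by_track = {}   # track_id -> deque of peer IDs, oldest first
        self.online = set()

    def announce(self, track_id, peer_id):
        peers = self.peers_by_track.setdefault(
            track_id, deque(maxlen=self.max_peers_per_track)
        )
        if peer_id in peers:
            peers.remove(peer_id)
        peers.append(peer_id)   # a full deque silently drops its oldest entry
        self.online.add(peer_id)

    def disconnect(self, peer_id):
        self.online.discard(peer_id)

    def lookup(self, track_id):
        """Return up to max_results online peers, preferring the most recent ones."""
        peers = self.peers_by_track.get(track_id, ())
        available = [peer for peer in peers if peer in self.online]
        return available[-self.max_results:]
```

<p>Keeping the per-track list short bounds the tracker’s memory per song, at the cost of knowing only a fraction of the peers that actually hold a popular track. The neighbor queries described above cover the rest.</p>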
<p><strong>Playback Variations</strong><br />
There are 2 different ways a track gets played back :<br />
1) The random access case where a fresh track is requested for the first time by a user. This constitutes about 39% of the playbacks in Spotify.<br />
2) The predictable track selection case in which a track gets played either because the previous track ended or the user pressed the forward button. This constitutes about 61% of the playbacks in Spotify.</p>
<p><strong>Abstract:</strong><br />
Spotify is a music streaming service offering low-latency access to a library of over 8 million music tracks.<br />
Streaming is performed by a combination of client-server access and a peer-to-peer protocol. In this paper, we give an overview of the protocol and peer-to-peer architecture used and provide measurements of service performance and user behavior. The service currently has a user base of over 7 million and has been available in six European countries since October 2008. Data collected indicates that the combination of the client-server and peer-to-peer paradigms can be applied to music streaming with good results. In particular, 8.8% of music data played comes from Spotify’s servers while the median playback latency is only 265 ms (including cached tracks). We also discuss the user access patterns observed and how the peer-to-peer network affects the access patterns as they reach the server.</p>
<p><strong>Previewing from <a href="http://www.csc.kth.se/~gkreitz/spotify-p2p10/spotify-p2p10.pdf" target="_blank">http://www.csc.kth.se/~gkreitz/spotify-p2p10/spotify-p2p10.pdf</a></strong><br />
<iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.csc.kth.se%2F~gkreitz%2Fspotify-p2p10%2Fspotify-p2p10.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/low-latency-music/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Tenzing : A SQL Implementation On The MapReduce Framework</title>
		<link>http://www.systemswemake.com/papers/tenzing</link>
		<comments>http://www.systemswemake.com/papers/tenzing#comments</comments>
		<pubDate>Sun, 18 Sep 2011 17:16:02 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Storage]]></category>
		<category><![CDATA[sql over map reduce]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=955</guid>
		<description><![CDATA[This paper, which appeared in this year’s VLDB, talks about the internals of the SQL query engine atop Google’s Map Reduce framework. It is currently used by over 1000 people in Google serving over 10,000 queries each day that span across two data centers, with two<span class="readmore-post"><a href="http://www.systemswemake.com/papers/tenzing">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p>This paper, which appeared in <a target="_blank" href="http://www.vldb.org/2011/?q=node/31">this year’s VLDB</a>, talks about the internals of the SQL query engine atop Google’s Map Reduce framework. It is currently used by over 1000 people in Google serving over 10,000 queries each day that span across two data centers, with two thousand cores each on 1.5 PB of compressed data. The queries claim to have latencies of the order of 10 seconds.<br />
It looks like this engine was motivated by the need to move away from a proprietary database appliance that Google was using as a data warehouse for Google Ads data.</p>
<p><strong>Architecture of the system:</strong><br />
A <strong>distributed worker pool</strong> takes in a query execution plan and executes Map Reduce jobs. The pool consists of master and worker nodes, plus an overall gatekeeper called the master watcher. The workers manipulate the data for all the tables defined in the metadata layer.<br />
A <strong>query server</strong> acts as the first point of interception of a query. It parses the query and then forwards the execution plan to the master node in the worker pool.<br />
The <strong>metadata server</strong> provides an API to store and fetch metadata such as table names and schemas, and pointers to the underlying data. The metadata server is also responsible for storing ACLs (Access Control Lists) and other security related information about the tables. The server uses Bigtable as the persistent backing store.<br />
It also has a few different client interfaces including a CLI and a Web based UI.</p>
<p>The engine supports most of the SQL92 constructs. In addition the query execution engine also embeds a <strong>Sawzall language engine</strong> so users can write <strong>Sawzall</strong> functions which can then be invoked from Tenzing. They seem to have spent a lot of effort in enabling efficient joins across heterogeneous data sources. All variations of joins such as inner, left, right, cross, and full outer joins and equi, semi-equi, non-equi and function based joins are supported by Tenzing. There also seems to be support for DDLs, DMLs and logical views over data.</p>
<p>Since the engine is very tightly coupled with the underlying Map Reduce framework, some enhancements had to be made to the underlying framework, primarily to bring down the latency of the queries. Specifically, the MapReduce and Tenzing teams collaboratively came up with the pool implementation.</p>
<p><strong>Previewing from <a target="_blank" href="http://research.google.com/pubs/archive/37200.pdf">http://research.google.com/pubs/archive/37200.pdf</a></strong></p>
<p><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fstatic.googleusercontent.com%2Fexternal_content%2Funtrusted_dlcp%2Fresearch.google.com%2Fen%2F%2Fpubs%2Farchive%2F37200.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/tenzing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HipG: Parallel Processing of Large-Scale Graphs</title>
		<link>http://www.systemswemake.com/papers/hipg</link>
		<comments>http://www.systemswemake.com/papers/hipg#comments</comments>
		<pubDate>Wed, 31 Aug 2011 10:30:04 +0000</pubDate>
		<dc:creator>Hari</dc:creator>
				<category><![CDATA[Distributed Applications]]></category>
		<category><![CDATA[distributed graph processing]]></category>

		<guid isPermaLink="false">http://www.systemswemake.com/?p=940</guid>
		<description><![CDATA[Abstract Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work<span class="readmore-post"><a href="http://www.systemswemake.com/papers/hipg">Continue reading</a></span>]]></description>
			<content:encoded><![CDATA[<p><strong>Abstract</strong><br />
Distributed processing of real-world graphs is challenging due to their size and the inherent irregular structure of graph computations. We present HipG, a distributed framework that facilitates programming parallel graph algorithms by composing the parallel application automatically from the user-defined pieces of sequential work on graph nodes. To make the user code high-level, the framework provides a unified interface to executing methods on local and non-local graph nodes and an abstraction of exclusive execution. The graph computations are managed by logical objects called synchronizers, which we used, for example, to implement distributed divide-and-conquer decomposition into strongly connected components. The code written in HipG is independent of a particular graph representation, to the point that the graph can be created on-the-fly, i.e. by the algorithm that computes on this graph, which we used to implement a distributed model checker. HipG programs are in general short and elegant; they achieve good portability, memory utilization, and performance.</p>
<p><strong>Previewing from <a href="http://www.cs.vu.nl/~wanf/pubs/osr11.pdf" target="_blank">http://www.cs.vu.nl/~wanf/pubs/osr11.pdf</a></strong></p>
<p><iframe src="http://docs.google.com/viewer?url=http%3A%2F%2Fwww.cs.vu.nl%2F~wanf%2Fpubs%2Fosr11.pdf&#038;embedded=true" width="600" height="780" style="border: none;"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://www.systemswemake.com/papers/hipg/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

