ZHEN LI
Ph.D. Candidate
 
     
 

CometG: A Decentralized Computational Infrastructure for Grid-based Parallel Asynchronous Iterative Applications

Overview:

Comet is a scalable content-based coordination space for wide-area P2P environments. It provides a global virtual shared-space that all peers in the system can access associatively, independent of the physical location of the tuples or the identifiers of the hosts. The Comet coordination model is based on a global virtual shared-space constructed from a semantic information space. The information space is deterministically mapped, using a locality-preserving mapping, to a dynamic set of peer nodes in the system. The resulting peer-to-peer information lookup system maintains content locality and guarantees that content-based information queries, using flexible content descriptors in the form of keywords, partial keywords and wildcards, are delivered with bounded cost. Comet also provides transient spaces that enable applications to explicitly exploit context locality. The current prototype of Comet builds on the JXTA peer-to-peer framework and is deployed on PlanetLab.

Architecture:

Comet is composed of layered abstractions prompted by a fundamental separation of communication and coordination concerns. A schematic overview of the system architecture is shown in the figure below. The communication layer provides scalable content-based messaging and manages system heterogeneity and dynamism. This layer essentially maps the virtual information space in a deterministic way to the dynamic set of currently available peer nodes in the system while maintaining content locality. The coordination layer provides Linda-like associative primitives and supports a shared-space coordination model. This layer can be enhanced with notifications and reactivity.

 

Figure: A schematic overview of the Comet system architecture.

Tuple and tuple distribution:

In Comet, a tuple is a simple XML string. This lightweight format is flexible enough to represent the information for all kinds of applications and has rich matching relationships. Comet employs the Hilbert Space-Filling Curve (SFC) to map tuples from a semantic information space to the linear node index. Each tuple is associated with k keywords selected from its tag and names. These keywords serve as the keys of the tuple in the k-dimensional (kD) information space. If the keys of a tuple include only complete keywords, the tuple is mapped to a single point in the information space and is located on at most one node. If its keys contain partial keywords, wildcards, or ranges, the tuple identifies a region in the information space, corresponding to a set of points in the index space. Each node stores the keys that map to the segment of the curve between itself and its predecessor node.
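The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Comet's actual encoding: it fixes k = 2, uses only the first two letters of each keyword as the coordinate (5 bits per letter), and treats a trailing `*` as a partial keyword. The names `keyword_range` and `tuple_indices` are hypothetical; `hilbert_index` is the standard 2-D Hilbert curve conversion.

```python
from itertools import product

BITS_PER_CHAR = 5                 # assumption: 'a'..'z' -> 1..26, 0 is padding
CHARS = 2                         # assumption: letters of a keyword used per axis
ORDER = BITS_PER_CHAR * CHARS     # grid has 2**ORDER cells per dimension


def hilbert_index(order, x, y):
    """Map the cell (x, y) of a 2**order x 2**order grid to its Hilbert index."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:               # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def keyword_range(keyword):
    """Inclusive (lo, hi) coordinate range for a keyword made of letters.

    A complete keyword yields lo == hi; a trailing '*' (partial keyword or
    bare wildcard) widens the range over the unspecified letters.
    """
    partial = keyword.endswith("*")
    prefix = keyword.rstrip("*").lower()[:CHARS]
    lo = hi = 0
    for i in range(CHARS):
        if i < len(prefix):
            lo_c = hi_c = ord(prefix[i]) - ord("a") + 1
        else:
            lo_c = 0
            hi_c = (1 << BITS_PER_CHAR) - 1 if partial else 0
        lo = (lo << BITS_PER_CHAR) | lo_c
        hi = (hi << BITS_PER_CHAR) | hi_c
    return lo, hi


def tuple_indices(keys):
    """All curve indices covered by a tuple with exactly two keys."""
    (x_lo, x_hi), (y_lo, y_hi) = (keyword_range(k) for k in keys)
    cells = product(range(x_lo, x_hi + 1), range(y_lo, y_hi + 1))
    return sorted(hilbert_index(ORDER, x, y) for x, y in cells)


point = tuple_indices(("contact", "name"))   # complete keywords: one index
region = tuple_indices(("c*", "name"))       # partial keyword: many indices
```

With complete keywords the tuple lands on a single curve index, whereas the partial keyword `c*` expands to every cell whose first coordinate starts with `c`, which is the "region" case in the text.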

 

The communication layer:

The communication layer provides an associative communication service and guarantees that content-based messages, specified using flexible content descriptors, are served with bounded cost. This layer essentially maps the virtual information space in a deterministic way to the dynamic set of currently available peer nodes in the system while maintaining content locality. It includes a content-based routing engine, Squid, and a structured self-organizing overlay. Squid provides a decentralized information discovery and associative messaging service. It implements the Hilbert SFC mapping to map a multi-dimensional information space to a peer index space and then to the current system peer nodes, which form a structured overlay. The overlay is composed of peer nodes, which may be any node in the system (e.g., gateways, access points, message relay nodes, servers, or end-user computers). Peer nodes can join or leave the network at any time. While the Comet architecture is based on a structured overlay, it is not tied to any specific overlay topology.
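To make the index-to-node step concrete, the following sketch assigns each curve index to the first node whose identifier is greater than or equal to it, wrapping around at the end of the index space. A ring layout is assumed here purely for illustration, since the architecture is not tied to a specific overlay topology; the class `Ring` and its methods are hypothetical names, not part of Squid or Comet.

```python
import bisect


class Ring:
    """Toy structured overlay: nodes sit at integer ids on a circular index space."""

    def __init__(self, index_bits):
        self.size = 1 << index_bits
        self.node_ids = []            # kept sorted

    def join(self, node_id):
        if node_id not in self.node_ids:
            bisect.insort(self.node_ids, node_id)

    def leave(self, node_id):
        self.node_ids.remove(node_id)

    def owner(self, index):
        """Node responsible for `index`: the segment (predecessor, node]."""
        if not self.node_ids:
            raise LookupError("no peers in the overlay")
        pos = bisect.bisect_left(self.node_ids, index % self.size)
        return self.node_ids[pos % len(self.node_ids)]   # wrap past the last node


ring = Ring(index_bits=10)
for node_id in (100, 500, 900):
    ring.join(node_id)
```

When a node leaves, its segment is absorbed by the next node on the ring; when a node joins, it takes over part of its successor's segment. Only the neighbouring segments change, which is what lets peers come and go without remapping the whole index space.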

Figure: Tuple distribution and retrieval using the Out/In/Rd operators.

 

The coordination layer:

The coordination layer provides tuple operation primitives to support a shared-space based coordination model and can be enhanced with notifications. This layer defines the following coordination primitives, which retain the Linda semantics: if multiple matching tuples are found, one of them is arbitrarily returned (and, for In, removed); if there is no matching tuple, the operation waits for one to appear.
(i) Out(ts, t): a non-blocking operation that inserts the tuple t into the space ts.
(ii) In(ts, t): a blocking operation that removes a tuple matching the template t from the space ts and returns it.
(iii) Rd(ts, t): a blocking operation that returns a tuple matching the template t from the space ts. The tuple is not removed from the space.
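The semantics of the three primitives can be illustrated with a small single-process tuple space. This is a sketch under simplifying assumptions and not Comet's implementation: tuples are plain Python tuples of strings rather than XML, matching uses shell-style patterns so that `*` and partial keywords such as `con*` work, and the `timeout` argument is an addition for testing that is not part of the Linda primitives. The method is named `in_` because `in` is a reserved word in Python.

```python
import threading
from fnmatch import fnmatchcase


def matches(template, tup):
    """True if every field of `tup` matches the pattern in the same position."""
    return len(template) == len(tup) and all(
        fnmatchcase(value, pattern) for pattern, value in zip(template, tup)
    )


class TupleSpace:
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Non-blocking insert; wakes any waiting In/Rd callers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, template):
        for tup in self._tuples:
            if matches(template, tup):
                return tup
        return None

    def _wait_for_match(self, template, remove, timeout):
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                raise TimeoutError("no matching tuple appeared")
            tup = self._find(template)
            if remove:
                self._tuples.remove(tup)
            return tup

    def in_(self, template, timeout=None):
        """Blocking: remove and return one matching tuple."""
        return self._wait_for_match(template, remove=True, timeout=timeout)

    def rd(self, template, timeout=None):
        """Blocking: return one matching tuple, leaving it in the space."""
        return self._wait_for_match(template, remove=False, timeout=timeout)
```

A consumer thread calling `in_(("job", "*"))` on an empty space blocks until some producer calls `out(("job", "42"))`, at which point exactly one waiting consumer receives and removes the tuple.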

The tuple distribution and exact retrieval processes are illustrated in the accompanying figure.

 

 

Transient spaces:

Comet is naturally suitable for context-transparent applications. To support context-aware applications, which require that context locality be maintained in addition to content locality, Comet provides dynamically constructed transient spaces that have a specific scope definition (e.g., within the same geographical region or the same physical subnet). The structure of a transient space is exactly the same as that of the global space, which is accessible to all peer nodes and acts as the default coordination platform. An application can switch between spaces at runtime and can simultaneously use multiple spaces.
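One way to picture the relationship between the global space and scoped transient spaces is a directory of spaces, each guarded by a scope predicate. The sketch below is hypothetical throughout: `Peer`, `Space`, `SpaceDirectory`, and the subnet-based scope are illustrative names and choices, and each space is reduced to a plain list of tuples.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Peer:
    name: str
    subnet: str


@dataclass
class Space:
    name: str
    scope: Callable[[Peer], bool]          # which peers may use this space
    tuples: List[tuple] = field(default_factory=list)


class SpaceDirectory:
    GLOBAL = "global"

    def __init__(self):
        # The global space admits every peer and is the default.
        self.spaces = {self.GLOBAL: Space(self.GLOBAL, lambda peer: True)}

    def create_transient(self, name, scope):
        self.spaces[name] = Space(name, scope)
        return self.spaces[name]

    def open(self, peer, name=GLOBAL):
        space = self.spaces[name]
        if not space.scope(peer):
            raise PermissionError(f"{peer.name} is outside the scope of {name}")
        return space


directory = SpaceDirectory()
directory.create_transient("lab", lambda peer: peer.subnet == "10.0.1")

alice = Peer("alice", subnet="10.0.1")
bob = Peer("bob", subnet="10.0.2")

# alice uses the global space and the transient space at the same time.
directory.open(alice).tuples.append(("status", "online"))
directory.open(alice, "lab").tuples.append(("sensor", "21.5"))
```

Because both kinds of space share one structure, application code that works against the global space can be pointed at a transient space without change; only the set of admitted peers differs.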