Seine-based MxN Parallel Data Redistribution

 
overview     project     publication    links     people    back 

Overview

Computational method uses large-scale, knowledge-based, data-driven, adaptive and interactive simulations to enable accurate solutions to realistic models of complex phenomena. Usually this type of applications is computational intensive and already highly complex by itself. Recent investigation into multi-physics problem, which couples multiple physics problem together, makes the problem even harder since now we not only have to deal with multiple physics models simultaneously, but also to provide the support for the interactions among multiple physics models. To track down the complexity of those high performance computing applications, CCA (Common Component Architecture) group has proposed the concept of software component, a software object that (i) interact with other components (ii) encapsulating certain functionality or a set of functionalities (iii) have a clearly defined interface (iv) conforms to a prescribed behavior common to all components within an architecture (v) may be composed to build other components. Based on this methodology, increasingly more software components in scientific computing are being composed together to create new large-scale multidisciplinary simulations.

 When multiple software components are composed together, the new system will need “glue” or coupling mechanism to force the system to be a unified entity. Specially, when components are coupled together, they might need to transfer large amount of computational data from one to another between computational phases. Thus, when different components are running on different number of processes, parallel data sets are transferred back and forth between these components. Our objective is to enable such parallel data redistribution among different components in the CCA context.

High performance software is often SPMD parallel, which leads to the MxN problem: connecting components running on differing numbers of processors. More specifically, the problem is to schedule and support the transfer of data between scientific parallel programs with different numbers of processes on each side. In the CCA context, the problem is formulated as to create an MxN component to support parallel data redistribution among different components.

Parallel data redistribution involves several steps:

(1) Describe parallel data: Parallel data structure, Parallel data layout, Processor layout

(2) Compute communication schedules for data transfer

(3) Actually transfer data elements.

As mentioned, MxN parallel data redistribution involves several steps. To construct MxN functionality based on SEINE framework, since the kernel part, or the 2nd step, is already solved in SEINE, the major task is to adapt to CCA MxN component interface. However, for now we can only attempt to be compliant to CCA specifications due to two facts. First, MxN component has not been finalized and formally published. Second, CCA Data Working Group is still working on definition of capability and interface of Distributed Array Descriptor. Consequently, our current work is based on best knowledge of currently available CCA DAD definition and MxN component specification. Further optimization will take place as the CCA working groups finalize their definitions for DAD and MxN component.

The first step, describing parallel data, will be provided by DAD. To date, CCA Data Working Group has proposed an interface for parallel data descriptor for multidimensional array based on HPF (High Performance Fortran). In our project, we will basically follow the CCA approach and our focus is on the next sub problem. The second step, computing communication schedule, is the core of parallel data redistribution. We will deploy and extend the data sharing functionality provided by SEINE, a geometry-based shared space framework, which was originally proposed to realize a scalable virtual shared space, which adopts geometry-based object retrieval semantics to support extremely dynamic and complex coordination and coordination patterns in parallel scientific simulations. The coupling mechanism in SEINE suffices to solve the problem of computing communication schedule herein. The last step, actually transferring data elements, is straightforward when the previous sub problems are solved. Most efficient communication mechanism available should be chosen to ensure the efficiency of this step, especially considering the huge volume of data usually associated with parallel scientific applications.

Project

Seine Overview

SEINE is a high performance shared information space that builds on the tuple spaces model. Communication entities interact with each other by sharing objects in a logically shared space. However conceptual difference exists between SEINE model and the general tuple space model. In the general tuple space model, tuple space spans entire problem domain, is accessible to all nodes in computing environments, and adopts a generic tuple-matching scheme. In contrast, SEINE defines a dynamic shared space that spans a geometric region, which maps to only a part of the entire problem domain, is accessible to a dynamic subset of nodes from the computing environments. Most relevant feature of SEINE to MxN data redistribution capability is that the object shared in SEINE is geometry-based. Geometry-based object, or geometric object, is different from common objects in that it is always associated with a geometric region and a set of data attached to that region. Examples include a line interval, a rectangular region, a grid mesh, and a cube block. All these objects are associated with geometric presence in a coordinate space. It is the role of SEINE that provides a virtual view of such coordinate space to the application such that it can get/put geometric objects from/to the shared space. Because most scientific simulations are based on physically realistic models that are geometry-based, processes interactions in these applications are usually determined by the geometric relationships of their respective computation sub-domains. Consequently, by sharing geometric objects such as the sub-domain or boarders of the sub-domain in SEINE, the application can avoid the task of building the complicated interaction patterns.

SEINE provides a consistent view of the geometry-based shared space to the application, which indicates internally it needs to know of the relationship of geometric objects defined on various regions in the space. To put it in a more intuitive way, when an object A defined on region regA is inserted into the shared space, SEINE needs to update other objects in the shared space whose regions overlap with regA. The updated region is the common area covered by both the object and the newly inserted object. Thus communication schedule needs to be computed within SEINE to reflect the data update to that commonly shared geometric region. When we replace the multidimensional coordinate space with the index space of multidimensional array, MxN data redistribution can be achieved by SEINE. The idea is that instead of sharing on shared geometric regions in coordinate space, the sharing is on shared index regions in the multidimensional index space of data arrays. Each process in the sender and receiver sides is associated with sub-arrays with corresponding index regions from the global data set. These regions should be registered with SEINE so that data mapping from the sender processes to the receiver processes can be calculated by SEINE. The data mapping essentially decides the communication schedule based on which data redistribution will be preceded.

Communication Schedule Computation in SEINE 

Communication schedule in parallel computing context refers to the sequence of message passing required to correctly move data among a set of cooperating processes. In order to correctly move data around, data mapping between the cooperating processes has to be created. Similar to InterComm, SEINE uses linearization of multidimensional index space to an intermediate 1-dimensional index space to create the mapping between source and destination of data redistribution sides. However, other than using a straightforward linearization mechanism as in InterComm, SEINE adopts a more refined scheme, Hilbert SFC, to form the linearization. 

A space-filling curve (SFC) is a continuous mapping from a d-dimensional space to a 1-dimensional space. The d-dimensional space is viewed as a d-dimensional cube, which is mapped onto a line such that the line passes once through each point in the volume of the cube, entering and exiting the cube only once. Using this mapping, a point in the cube can be described by its spatial or d-dimensional coordinates, or by the length along the 1-dimensional index measured from one of its ends. The construction of SFCs is recursive. An important property of SFCs is locality preserving. Points that are close together in the 1-dimensional space are mapped from points that are close together in the d-dimensional space. However, note that its reverse is not true. Not all adjacent sub-cubes in the d-D space are adjacent or even close on the curve. A continuous sub-cube in d-D space will typically be mapped to a collection of 1-D segments on the SFC. These segments are called clusters. The degree to which locality is preserved by a particular SFC is defined by the number of clusters that it maps an arbitrary region in the d-D space to. Hilbert SFC is commonly acknowledged as the SFC that achieves the best clustering. Clustering property is especially desired in computing communication schedules because given such locality preserving property, the data that are close together in the n-D index space have higher probability of being together in the 1-D linearization space, which infers that registration requests need to travel less nodes to find all the overlapping, registered intervals. The data mapping creation process is discussed. Given the above facts, we chose Hilbert HFC to map multidimensional index space to 1-dimensional index space.  

Given the Hilbert SFC-based linearization, SEINE uses interval tree to find out the relationship between geometric regions. When there is overlap between geometric region A and B, SEINE will keep record for such relationship, which is used later for updating data in objects associated with region B when objects associated with region A is inserted into SEINE. Similarly, when applied to MxN redistribution, the correspondence between registered array index regions can be computed based on the linearization and interval tree and thus communication schedule is created.

SEINE Encapsulation into CCA MxN Component

Fig. 1 left part gives a conceptual view of the MxN data redistribution working mechanism and its right shows how MxN data redistribution can be implemented via SEINE. In our proposed approach, basically data objects to be redistributed to different group of processors are inserted into SEINE (high performance shared information space) as shared objects. To share an object in SEINE, the objects need to first be registered with SEINE by each process at both “sender” and “receiver” side. The data mapping relationship can be found out based on the information from the registration. In CCA terms, Communication schedule is available after registration. The sender can then make connection and request transfer of the parallel data object. 

One thing worth noting here is that although SEINE is a shared information space, the way it is designed makes it a good matching for implementing MxN data redistribution. The basic functionality required by MxN is already provided in SEINE. The real task is to make wrap SEINE into CCA compliant component and conform to MxN component interface. However, there does exist a major task in the to-do list. As stated in previous sections, SEINE is a geometry-based shared space model. Its interface requires explicit coordinates from a coordinate space system, which is not always available in parallel applications. We need to span the gap between object registration supported by SEINE and the one required by CCA MxN. The main obstacle herein is that distributed data descriptor is not available yet from CCA data working group. Consequently, we will utilize the best knowledge of available information to adapt our implementation to currently available draft version of Distributed Array Descriptor (DAD) and MxN Component specification. They are, however, not final and will be updated as CCA Working group finalize their definitions for DAD and MxN. Following we will briefly present the draft DAD proposal currently available from CCA Data Working Group. 

             

Fig.1. CCA MxN parallel data redistribution and its implementation via SEINE

Parallel Data Representation*

[*] From CCA MxN group meeting. Detailed information regarding parallel data description can be found here.

CCA Data Working Group has been working on a complete definition of the capabilities and interfaces of Distributed Array Descriptor (DAD). Existing MxN projects and HPF counterparts in Distributed Array Descriptions give certain inspiration to the definition. The draft proposal supports distributed multidimensional rectangular arrays. Mechanism to describe the layouts and a mechanism to link the actual data to the layout will be provided. Herein, layout specifies the number of dimensions of the distributed array, as well as the lower and upper bounds of each of the dimensions. In the draft proposal for DAD, five distribution types are defined, including

Collapsed, the case when the entire dimension is held in the same process;

Block, the case of regular block distribution (one block per process) and cyclic block distributions when the block size is smaller than the dimension divided by the number of processes;

GenBlock (Fig.2), the case when blocks are of arbitrary sizes on each process and it limits one block per process.

Fig. 2 GenBlock distribution

Implicit, the case of an arbitrary mapping of elements to processes;

Explicit, the case when the data distribution is completely user specified. This case cannot be combined with all the other types of representations. 

Once the template is specified, the application can build data objects. When creating the data object, the application must specify the template to which it is associated, the data type, each process position in the process topology, and the alignment of the data object within the template. Also alignment needs to be specified to indicate how the actual data object relates to the reference template.

MxN Component Interface

In the draft MxN component specification, data redistribution involves several steps, i.e., register data, create communication schedule, make connection, request transfer, and data ready to indicate the data are ready to be transferred. When the MxN functionality is implemented on SEINE framework, the key steps are register data, which passes the descriptor that specifies the data distribution information to the SEINE-based MxN component, and request transfer or data ready, which actually initiate the data transfer. Fig.3 illustrates how SEINE-based MxN component interacts with other component to and provides redistribution functionality support to them.

Implementation

Scenario I: data redistribution in single direct-connected framework (Ccaffeine framework)

Fig. 3 MxN data redistribution based on SEINE framework

Scenario II: data redistribution between multiple direct-connected frameworks (Ccaffeine framework)

Scenario III: data redistribution in single distributed framework (XCAT framework)

Preliminary Experimental Evaluation

Still under construction ...

Publication

Still under construction ....


Links

MxN Parallel Data Redistribution @ ORNL 

MxN Parallel Data Redistribution @ Extreme Lab, Indiana University

Jay Larson's Modelling Coupling Toolkit (a set of software tools for coupling message-passing parallel models to create a parallel coupled model)

InterComm @ University of Maryland (a runtime library that achieves direct data transfers between data structures managed by multiple data parallel languages and libraries in different programs. Such programs include those that directly use a low-level message-passing library, such as MPI)

SCIRun @ University of Utah (a multifunctional problem solving environment that can be best described as a computational workbench by which the user can "close the loop." All aspects of the modeling, simulation, and visualization processes are linked, controlled graphically within the context of a single application program)

The PAWS Project (Parallel Application Workspace)

The CUMULVS Project (Collaborative User Migration, User Library for Visualization and Steering)

CCA-Forum


People

Dr. Manish Parashar               Professor

Li Zhang                                    Graduate Student


Acknowledgement

We would like to thank Dr. Dennis Gannon and the XCAT project team for their help with the XCAT framework environment setup, and thank Dr. James Kohl for updating the MxN parallel data redistribution component interface definition.

 

Back