
Computational methods use large-scale, knowledge-based,
data-driven, adaptive and interactive simulations to enable accurate solutions
to realistic models of complex phenomena. Applications of this type are usually
computationally intensive and already highly complex by themselves. Recent
work on multi-physics problems, which couple several physics problems
together, makes the task even harder: we now have to deal with
multiple physics models simultaneously and also support the
interactions among them. To tame the complexity of
these high-performance computing applications, the CCA (Common Component
Architecture) group has proposed the concept of a software component: a
software object that (i) interacts with other components, (ii) encapsulates
a certain functionality or set of functionalities, (iii) has a clearly defined
interface, (iv) conforms to a prescribed behavior common to all components within
an architecture, and (v) may be composed with others to build new components. Based on this
methodology, more and more software components in scientific computing are
being composed to create new large-scale multidisciplinary simulations.
When multiple software components are composed, the new system needs a “glue” or coupling mechanism that makes it behave as a unified entity. Specifically, coupled components may need to transfer large amounts of computational data to one another between computational phases. When the components run on different numbers of processes, parallel data sets have to be transferred back and forth between them. Our objective is to enable such parallel data redistribution among different components in the CCA context.
High-performance software is often SPMD parallel, which leads to the MxN problem:
connecting components that run on differing numbers of processors. More
specifically, the problem is to schedule and support the transfer of data
between parallel scientific programs with different numbers of processes on each
side. In the CCA context, the problem is formulated as creating an MxN
component that supports parallel data redistribution among different components.
Parallel data redistribution involves several steps:
(1) Describe parallel data: Parallel data structure, Parallel data layout, Processor layout
(2) Compute communication schedules for data transfer
(3) Actually transfer data elements.
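The three steps can be sketched for a one-dimensional array under a regular block distribution. This is a minimal in-memory illustration, not SEINE or CCA code: the names `block_bounds`, `compute_schedule` and `redistribute` are hypothetical, and list slicing stands in for actual message passing.

```python
def block_bounds(n_elems, n_procs, rank):
    """Step 1: half-open index range [lo, hi) owned by `rank`
    under a regular block distribution (hypothetical helper)."""
    base, extra = divmod(n_elems, n_procs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def compute_schedule(n_elems, m, n):
    """Step 2: list of (sender, receiver, lo, hi) messages that move
    a block-distributed array from m sender to n receiver processes."""
    schedule = []
    for s in range(m):
        s_lo, s_hi = block_bounds(n_elems, m, s)
        for r in range(n):
            r_lo, r_hi = block_bounds(n_elems, n, r)
            lo, hi = max(s_lo, r_lo), min(s_hi, r_hi)
            if lo < hi:  # the two blocks share indices lo..hi-1
                schedule.append((s, r, lo, hi))
    return schedule


def redistribute(data, m, n):
    """Step 3: apply the schedule; slice copies stand in for messages."""
    n_elems = len(data)
    send_bufs = [data[slice(*block_bounds(n_elems, m, s))] for s in range(m)]
    recv_bufs = []
    for r in range(n):
        lo, hi = block_bounds(n_elems, n, r)
        recv_bufs.append([None] * (hi - lo))
    for s, r, lo, hi in compute_schedule(n_elems, m, n):
        s_lo, _ = block_bounds(n_elems, m, s)
        r_lo, _ = block_bounds(n_elems, n, r)
        # Translate global indices into local buffer offsets on each side.
        recv_bufs[r][lo - r_lo:hi - r_lo] = send_bufs[s][lo - s_lo:hi - s_lo]
    return recv_bufs
```

For a 12-element array moved from 2 to 3 processes, `compute_schedule(12, 2, 3)` yields four messages: sender 0 sends indices 0..3 to receiver 0 and 4..5 to receiver 1, and sender 1 sends 6..7 to receiver 1 and 8..11 to receiver 2.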
The first step, describing parallel data, will be provided by the Distributed Array Descriptor (DAD). To date, the CCA Data Working Group has proposed an interface for a parallel data descriptor for multidimensional arrays based on HPF (High Performance Fortran). In our project we follow the CCA approach and focus on the next sub-problem. The second step, computing the communication schedule, is the core of parallel data redistribution. We will deploy and extend the data sharing functionality provided by SEINE, a geometry-based shared space framework. SEINE was originally proposed to realize a scalable virtual shared space that adopts geometry-based object retrieval semantics to support extremely dynamic and complex coordination patterns in parallel scientific simulations. The coupling mechanism in SEINE suffices to compute the communication schedule needed here. The last step, actually transferring data elements, is straightforward once the previous sub-problems are solved. The most efficient communication mechanism available should be chosen for this step, especially considering the large volume of data usually associated with parallel scientific applications.
SEINE is a high-performance
shared information space that builds on the tuple space model.
Communicating entities interact with each other by sharing objects in a
logically shared space. However, there are conceptual differences between the SEINE model
and the general tuple space model. In the general tuple space model, the tuple space
spans the entire problem domain, is accessible to all nodes in the computing
environment, and adopts a generic tuple-matching scheme. In contrast, SEINE
defines a dynamic shared space that spans a geometric region, which maps to only
a part of the entire problem domain and is accessible to a dynamic subset of nodes
in the computing environment. The feature of SEINE most relevant to MxN data
redistribution is that the objects shared in SEINE are geometry-based.
A geometry-based object, or geometric object, differs from an ordinary object in
that it is always associated with a geometric region and a set of data attached
to that region. Examples include a line interval, a rectangular region, a grid
mesh, and a cube block. All these objects have a geometric presence
in a coordinate space. SEINE's role is to provide the application with a virtual view of
this coordinate space so that it can get geometric
objects from, and put them into, the shared space. Because most scientific simulations are based
on physically realistic models that are geometry-based, process interactions
in these applications are usually determined by the geometric relationships of
their respective computational sub-domains. Consequently, by sharing geometric
objects such as the sub-domain or the borders of the sub-domain in SEINE, the
application can avoid the task of building complicated interaction patterns itself.
SEINE provides the application with a consistent view of the geometry-based shared space, which means that internally it must track the relationships among geometric objects defined on various regions of the space. More intuitively, when an object A defined on region regA is inserted into the shared space, SEINE needs to update the other objects in the shared space whose regions overlap with regA. The updated region is the area common to the existing object and the newly inserted object. A communication schedule therefore has to be computed within SEINE to carry the data update for that commonly shared geometric region. When we replace the multidimensional coordinate space with the index space of a multidimensional array, SEINE can perform MxN data redistribution. The idea is that instead of sharing geometric regions in a coordinate space, the processes share index regions in the multidimensional index space of the data arrays. Each process on the sender and receiver sides is associated with sub-arrays that correspond to index regions of the global data set. These regions are registered with SEINE so that it can calculate the data mapping from the sender processes to the receiver processes. The data mapping essentially determines the communication schedule according to which data redistribution proceeds.
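The notion of a data mapping between registered index regions can be illustrated with a brute-force sketch. It assumes each process registers one rectangular index region, written as a tuple of half-open `(lo, hi)` ranges per dimension, and intersects every sender region with every receiver region. The function names and the dictionary-based registration are hypothetical and do not reflect the SEINE interface, which computes overlaps through a linearized index space instead.

```python
from itertools import product


def intersect(a, b):
    """Common index region of two boxes, or None if they do not overlap.
    A box is a tuple of half-open (lo, hi) ranges, one per dimension."""
    common = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b):
        lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
        if lo >= hi:
            return None
        common.append((lo, hi))
    return tuple(common)


def data_mapping(sender_regions, receiver_regions):
    """Map (sender, receiver) pairs to the index region they share."""
    mapping = {}
    for (s, s_reg), (r, r_reg) in product(sender_regions.items(),
                                          receiver_regions.items()):
        common = intersect(s_reg, r_reg)
        if common is not None:
            mapping[(s, r)] = common
    return mapping


# A 4x4 array split by rows on two senders and by columns on two receivers.
senders = {0: ((0, 2), (0, 4)), 1: ((2, 4), (0, 4))}
receivers = {0: ((0, 4), (0, 2)), 1: ((0, 4), (2, 4))}
mapping = data_mapping(senders, receivers)
```

In this example every sender overlaps every receiver on a 2x2 sub-array, so the mapping has four entries, each of which would become one message in the communication schedule.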
A communication
schedule in the parallel computing context is the sequence of messages
required to correctly move data among a set of cooperating processes. To
move the data correctly, a data mapping between the cooperating processes
has to be created. Like InterComm, SEINE linearizes the
multidimensional index space into an intermediate 1-dimensional index space to
create the mapping between the source and destination sides of the data redistribution.
However, rather than using a straightforward linearization mechanism as
InterComm does, SEINE adopts a more refined scheme, the Hilbert space-filling curve (SFC), to form the
linearization.
A space-filling curve (SFC) is a continuous mapping
from a d-dimensional space to a 1-dimensional space. The d-dimensional space is
viewed as a d-dimensional cube, which is mapped onto a line such that the line
passes once through each point in the volume of the cube, entering and exiting
the cube only once. Using this mapping, a point in the cube can be described by
its spatial or d-dimensional coordinates, or by the length along the
1-dimensional index measured from one of its ends. The construction of SFCs is
recursive. An important property of SFCs is that they preserve locality: points that are
close together in the 1-dimensional space are mapped from points that are close
together in the d-dimensional space.
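For the two-dimensional case, the locality property can be checked with a short sketch of the Hilbert curve. It uses the widely published iterative conversion from a curve position to grid coordinates, and assumes a square grid whose side `n` is a power of two. The function name `hilbert_point` is hypothetical, and this is not taken from the SEINE implementation.

```python
def hilbert_point(n, d):
    """Grid coordinates (x, y) of position d on the Hilbert curve that
    fills an n x n grid, where n is a power of two and 0 <= d < n * n."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the sub-square so consecutive pieces connect.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Walk the curve over an 8 x 8 grid.
curve = [hilbert_point(8, d) for d in range(64)]
```

Every cell of the grid is visited exactly once, and consecutive positions on the curve are always neighbouring cells, which is the locality property described above.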
Note, however, that the converse is not true: not all
adjacent sub-cubes in the d-dimensional space are adjacent or even close on the curve. A
contiguous sub-cube in the d-dimensional space is typically mapped to a collection of 1-dimensional
segments on the SFC. These segments are called clusters. The degree to which
a particular SFC preserves locality is measured by the number of clusters
to which it maps an arbitrary region of the d-dimensional space. The Hilbert SFC is widely
regarded as the SFC that achieves the best clustering. Good clustering
is especially desirable when computing communication schedules: with such a
locality-preserving mapping, data that are close together in the d-dimensional index
space are more likely to be close together in the 1-dimensional linearization space,
which implies that registration requests need to traverse fewer nodes to find all
the overlapping registered intervals. Given these properties, we chose the Hilbert SFC to map the multidimensional
index space to a 1-dimensional index space. The creation of the data mapping is described below.
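The cluster count of a region can be computed directly for small grids. The sketch below linearizes a rectangular index region in two ways, with the standard iterative Hilbert index for a power-of-two square grid and with plain row-major order, and groups the resulting indices into maximal runs. All function names are hypothetical, and the row-major order is only an illustrative baseline, not a description of what InterComm does.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) on the Hilbert curve of an n x n grid
    (n a power of two), using the standard iterative conversion."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def row_major_index(n, x, y):
    """Baseline linearization: rows laid out one after another."""
    return y * n + x


def clusters(index_fn, n, x_range, y_range):
    """Maximal runs (first, last) of consecutive 1-D indices covering the
    region x_range x y_range, each range given as half-open (lo, hi)."""
    idx = sorted(index_fn(n, x, y)
                 for x in range(*x_range) for y in range(*y_range))
    runs = []
    start = prev = idx[0]
    for i in idx[1:]:
        if i != prev + 1:
            runs.append((start, prev))
            start = i
        prev = i
    runs.append((start, prev))
    return runs
```

For the 4x4 quadrant at the origin of an 8x8 grid, the Hilbert index yields a single cluster while row-major order yields four, one per row. This is a favorable, grid-aligned case; unaligned regions generally produce several clusters under either scheme.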
Given the Hilbert SFC-based linearization, SEINE uses an interval tree to determine the relationships between geometric regions. When geometric regions A and B overlap, SEINE records this relationship and uses it later to update the data in objects associated with region B when objects associated with region A are inserted into SEINE. Similarly, when applied to MxN redistribution, the correspondence between registered array index regions can be computed from the linearization and the interval tree, which yields the communication schedule.
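The overlap lookup on linearized regions can be sketched with a small static interval tree. This is an illustrative data structure under simple assumptions (half-open 1-D intervals, all registrations known before the tree is built); the class name, the `owner` tag and the example intervals are hypothetical and do not describe the tree used inside SEINE.

```python
class IntervalNode:
    """Node of a static interval tree over half-open (lo, hi, owner) items."""

    def __init__(self, intervals):
        # `intervals` must be non-empty and sorted by lower bound.
        mid = len(intervals) // 2
        self.lo, self.hi, self.owner = intervals[mid]
        left, right = intervals[:mid], intervals[mid + 1:]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None
        # Largest upper bound in this subtree, used to prune searches.
        self.max_hi = max(
            [self.hi]
            + [child.max_hi for child in (self.left, self.right) if child]
        )

    def overlapping(self, lo, hi, found=None):
        """Collect all stored intervals that overlap [lo, hi)."""
        found = [] if found is None else found
        if self.max_hi <= lo:
            return found  # nothing in this subtree reaches the query
        if self.left:
            self.left.overlapping(lo, hi, found)
        if self.lo < hi and lo < self.hi:
            found.append((self.lo, self.hi, self.owner))
        # Intervals to the right start at or after self.lo.
        if self.right and self.lo < hi:
            self.right.overlapping(lo, hi, found)
        return found


def build_tree(intervals):
    return IntervalNode(sorted(intervals))


# Sender processes register the 1-D clusters of their index regions.
tree = build_tree([(16, 32, "s1"), (0, 16, "s0"), (32, 64, "s2")])
# A receiver whose region linearizes to [10, 20) looks up its partners.
partners = [owner for _, _, owner in tree.overlapping(10, 20)]
```

Each overlap found this way corresponds to one sender-receiver pair that must exchange data, so the set of overlaps is the raw material for the communication schedule.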
The left part of Fig. 1 gives a conceptual view of how MxN data
redistribution works, and the right part shows how MxN data redistribution
can be implemented via SEINE. In our proposed approach, the data objects
to be redistributed to a different group of processors are inserted into SEINE
(the high-performance shared information space) as shared objects. To share an
object in SEINE, the object first has to be registered with SEINE by each
process on both the “sender” and the “receiver” side. The data mapping
can then be derived from the registration information. In
CCA terms, the communication schedule is available after registration. The sender
can then make the connection and request the transfer of the parallel data object.
It is worth noting that although SEINE is a shared information space, its design makes it a good match for implementing MxN data redistribution. The basic functionality required by MxN is already provided by SEINE. The main task is to wrap SEINE into a CCA-compliant component that conforms to the MxN component interface. One major item remains on the to-do list, however. As stated in the previous sections, SEINE is a geometry-based shared space model. Its interface requires explicit coordinates from a coordinate system, which are not always available in parallel applications. We need to bridge the gap between the object registration supported by SEINE and the one required by CCA MxN. The main obstacle is that a final distributed data descriptor is not yet available from the CCA Data Working Group. Consequently, we will adapt our implementation to the currently available draft versions of the Distributed Array Descriptor (DAD) and MxN component specifications. These drafts are not final, and our implementation will be updated as the CCA working groups finalize their definitions of DAD and MxN. Below we briefly present the draft DAD proposal currently available from the CCA Data Working Group.

Fig.1. CCA MxN parallel data redistribution and its implementation via SEINE
[*] From CCA MxN group meeting. Detailed information regarding parallel data description can be found here.
The CCA Data Working Group has been working on a complete definition of the capabilities and interfaces of the Distributed Array Descriptor (DAD). Existing MxN projects and the distributed array descriptions of HPF have inspired the definition. The draft proposal supports distributed multidimensional rectangular arrays. It provides a mechanism to describe layouts and a mechanism to link the actual data to a layout. Here, a layout, referred to below as a template, specifies the number of dimensions of the distributed array as well as the lower and upper bounds of each dimension. The draft DAD proposal defines five distribution types:
Collapsed, the case when the entire dimension is held in the same process;
Block, the case of regular block distribution (one block per process) and cyclic block distributions when the block size is smaller than the dimension divided by the number of processes;
GenBlock (Fig. 2), the case when blocks have arbitrary sizes on each process, limited to one block per process;

Fig. 2 GenBlock distribution
Implicit, the case of an arbitrary mapping of elements to processes;
Explicit, the case when the data distribution is completely user specified. This case
cannot be combined with any of the other types of representation.
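The Block and GenBlock types can be made concrete by computing which process owns a given index along one dimension. The sketch below is only an illustration of the distribution rules as described above; the function names and argument conventions are hypothetical and are not part of the draft DAD interface. The Implicit and Explicit types are omitted because they depend on a user-supplied mapping.

```python
from bisect import bisect_right
from itertools import accumulate


def block_owner(i, block_size, n_procs):
    """Owner of index i under a Block distribution. With block_size equal
    to ceil(extent / n_procs) each process holds one block (regular
    block); with a smaller block_size the blocks wrap around the
    processes (block-cyclic)."""
    return (i // block_size) % n_procs


def genblock_owner(i, sizes):
    """Owner of index i under a GenBlock distribution, where process p
    holds one contiguous block of sizes[p] elements."""
    ends = list(accumulate(sizes))  # exclusive upper bound of each block
    return bisect_right(ends, i)


extent = 10
regular = [block_owner(i, 4, 3) for i in range(extent)]
cyclic = [block_owner(i, 2, 3) for i in range(extent)]
general = [genblock_owner(i, [2, 5, 3]) for i in range(extent)]
```

With 10 elements on 3 processes, a block size of 4 gives each process a single block, a block size of 2 makes the blocks cycle over the processes, and the GenBlock sizes 2, 5 and 3 give each process one block of its own length.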
Once the template is specified, the application can build data objects. When creating a data object, the application must specify the template with which it is associated, the data type, the position of each process in the process topology, and the alignment of the data object within the template, which indicates how the actual data object relates to the reference template.
MxN Component Interface
In the draft MxN component specification, data redistribution involves several steps: register data, create communication schedule, make connection, request transfer, and data ready, which indicates that the data are ready to be transferred. When the MxN functionality is implemented on the SEINE framework, the key steps are register data, which passes the descriptor that specifies the data distribution to the SEINE-based MxN component, and request transfer or data ready, which actually initiates the data transfer. Fig. 3 illustrates how the SEINE-based MxN component interacts with other components and provides redistribution support to them.

Fig. 3 MxN data redistribution based on SEINE framework
Still under construction ...
MxN Parallel Data Redistribution @ ORNL
MxN Parallel Data Redistribution @ Extreme Lab, Indiana University
Jay Larson's Model Coupling Toolkit (a set of software tools for coupling message-passing parallel models to create a parallel coupled model)
InterComm @ University of Maryland (a runtime library that achieves direct data transfers between data structures managed by multiple data parallel languages and libraries in different programs. Such programs include those that directly use a low-level message-passing library, such as MPI)
SCIRun @ University of Utah (a multifunctional problem solving environment that can be best described as a computational workbench by which the user can "close the loop." All aspects of the modeling, simulation, and visualization processes are linked, controlled graphically within the context of a single application program)
The PAWS Project (Parallel Application Workspace)
The CUMULVS Project (Collaborative User Migration, User Library for Visualization and Steering)
Dr. Manish Parashar Professor
Li Zhang Graduate Student
We would like to thank Dr. Dennis Gannon and the XCAT project team for their help with the XCAT framework environment setup, and thank Dr. James Kohl for updating the MxN parallel data redistribution component interface definition.