Multithreaded Computational Engine
Objective:
The objective of this project is to develop a multithreaded communication
engine to support the GrACE infrastructure for adaptive mesh refinement (AMR)
applications based on adaptive grid hierarchies. The overall goal is to improve
performance by overlapping computation on individual grid blocks with the
associated (inter-level and intra-level) communication.
Motivation:
Dynamically adaptive methods for the solution of partial
differential equations that employ locally optimal approximations can yield
highly advantageous cost/accuracy ratios compared to methods based
upon static uniform approximations. These techniques seek to improve the
accuracy of the solution by dynamically refining the computational grid
in regions of high local solution error.
Distributed implementations of these adaptive methods
offer the potential for the accurate solution of realistic models of important
physical systems. These implementations, however, lead to interesting challenges
in dynamic resource allocation, data distribution and load balancing, communication
and coordination, and resource management. The overall efficiency of the
algorithms is limited by the ability to partition the underlying data structures
at run time so as to expose all inherent parallelism, minimize communication and
synchronization overheads, and balance load. This motivates the need for an efficient
communication engine that minimizes communication and synchronization overheads.
Performance analysis of AMR applications showed that
in certain cases up to 50% of the total execution time can be spent
synchronizing ghost regions between grid blocks at different levels of
the grid hierarchy, which limits the overall scalability of these applications.
This led us to the conclusion that, to improve performance, synchronization
time has to be minimized by exploiting the parallelism inherent in the
computation on blocks, that is, by overlapping communication with that
computation. This can be done by incorporating multithreading into the library.
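As a rough, idealized bound (assuming communication and computation can be overlapped perfectly and ignoring any threading overhead), the 50% figure limits how much overlap alone can gain. With per-step computation time T_comp and synchronization time T_sync:

```latex
\begin{aligned}
T_{\text{seq}} &= T_{\text{comp}} + T_{\text{sync}}, \qquad
T_{\text{overlap}} \ge \max(T_{\text{comp}},\, T_{\text{sync}}),\\
S &= \frac{T_{\text{seq}}}{T_{\text{overlap}}}
   \le \frac{T_{\text{comp}} + T_{\text{sync}}}{\max(T_{\text{comp}},\, T_{\text{sync}})},
\qquad
T_{\text{sync}} = T_{\text{comp}} = \tfrac{1}{2}\,T_{\text{seq}}
\;\Rightarrow\; S \le 2.
\end{aligned}
```

In practice the achievable gain is lower, since the extra threads add scheduling and locking costs of their own.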
Approach:
The multithreaded engine enables overlap between the
computation and communication on grid blocks owned by a processor. As
the processor cycles through its grid blocks sequentially, communication
of the ghost regions of completed blocks is scheduled concurrently. Our
multithreaded engine consists of two classes of threads: synchronization
threads and computation threads. Synchronization threads are normally dormant
and are activated only when communication is required. Computation threads
are responsible for all management and computational tasks (i.e., setting
up the grid hierarchy, data and storage management, and load balancing).
The key motivation for such a simple model is that only one thread interacts
with MPI at any time, so the implementation does not depend on a thread-safe
MPI, which is not available on all platforms.
The operation of the threaded engine is as follows:
Ghost Synchronization: The multithreaded engine guarantees the semantics of a computational loop followed by a ghost synchronization, in which all ghost regions are updated on return from the synchronization call. When the computational loop is initiated (e.g., a GrACE forall loop), the send thread and the receive thread are signaled, and all communication tasks are then offloaded to these two threads. At the end of the computational loop, all block communications are guaranteed to be complete.
Redistribution: During redistribution, the grid data is communicated by the send and receive threads, using the condition-variable signaling and mutexes provided by the thread library. The computation thread signals the send and receive threads once computation has begun on the blocks. The send, receive, and computation threads synchronize using condition variables together with send and receive queues.

Status:
The multithreaded communication engine is being developed
using the POSIX threads (pthreads) library so that it can be ported easily
to different operating systems. The current implementation and experiments
are being carried out on Sun E10K machines with Sun HPC 5.0 installed.