next up previous
Next: 3.2 Static HPA Up: 3 Hierarchical Partitioning Algorithm Previous: 3 Hierarchical Partitioning Algorithm

3.1 General HPA

The overall efficiency of parallel and distributed SAMR applications is limited by the ability to partition the underlying grid hierarchies at runtime to expose all inherent parallelism, minimize communication and synchronization overheads, and balance load. In most distributed implementations of SAMR [9,10,15], load scheduling and balancing is collectively done by all processors in the system and all processors maintain a global knowledge of the total workload. These schemes have the advantage of a better load balance. However these approaches require the collection and maintenance of global load information which makes them expensive, specially on large system. Partitioning in these Non-HPA schemes 2 consists of the following steps:
  1. [$\bullet$] Global load information exchange and synchronization phase: All processors are engaged in this information exchange phase. After this phase, all the processors have a global view of the grid hierarchy.
  2. [$\bullet$] Load partitioning phase: All processors calculate the average load per processor and partition the grid hierarchy. This operation is done by each processor in the system.
The sequence of steps taking place in the Non-HPA scheme for partitioning and scheduling ghost communications is illustrated in the sequence diagram in Figure 2.

Figure 2: Sequence diagram of the Non-HPA scheme
\begin{figure}
\centering\epsfig{file=figures/oldsequence.eps, scale=0.6,
clip=, bb= 110 300 480 710}
\end{figure}

Initially, all processors have the initial computational domain. Each processor partitions the domain into subdomains and assigns a subdomain to itself. During the load balancing phase, all processors synchronize and exchange their local domain information. At the end of this phase, every processor has a consistent global view of the domain. The partitioning algorithm then partitions the domain among the processors. After partitioning is complete, the processors migrate data that no longer belongs to their local subdomains. Each processor then schedules ghost communications based on its new local subdomain. In large parallel/distributed systems, the global information exchange and synchronization phase becomes a performance bottleneck. The HPA scheme presented in this paper does not propose a new partitioner, but a hierarchical partitioning strategy. The underlying partitioning scheme adopted is the composite decomposition method using space filling curve (SFC) technique [11,13] as mentioned in Section 2. In this scheme, partitioning at different level is performed in parallel based on load information local to that level. Load is periodically propagated up the processor group hierarchy in an incremental manner. Furthermore communications are conducted in stages among the processors in a group hierarchically, rather than requiring communication and synchronization among all processors. This is achieved by dividing the processors into processor/compute groups as shown in Figure 3.

Figure 3: A general hierarchical structure of processor groups
\begin{figure}
\centering\epsfig{file=figures/fig_group.eps, scale=0.4, clip=,
bb= 35 300 560 700}
\end{figure}

Figure 3 illustrates a general hierarchical tree structure of processor groups, where, $G_0$ is the root level group (group level=0) and consists of all the processors. $G_i$ is the $i-th$ group at the group level 1. Note that the processors form the leaves of the tree. The communication between processors is conducted through their closest common ancestor group which is their coordinater or master. For example, processors $P_{10}$ and $P_{14}$ have common ancestor groups $G_0$, $G_2$ and $G_{2,2}$. However their closest common ancestor group is $G_{2,2}$. Consequently their communication is via the group $G_{2,2}$ which is their coordinator or master. Similarly, communications between processors $P_0$ and $P_{10}$ are via the group $G_0$. In HPA, the partitioning phase is divided into sub-phases as follows.
  1. [$\bullet$] Local partitioning phase: The processors belonging to a processor group partition the group load based on a local load threshold and assign a portion to each processor within the group . Parent groups perform the partitioning among their children groups in a hierarchical manner.
  2. [$\bullet$] Global partitioning phase: The root group coordinator (group level 0) decides if a global repartitioning has to be performed among its children groups at the group level 1 according to the group threshold.
The pseudo-code for the new load balancing phase is given in Table 2.

Table 2: Load balancing phase in the general HPA
/* in the highest group composed of individual processors */
if(my_load > threshold) {
    do a local partition in a group;
}
grouplevel = highest level;
while(grouplevel > 0){
    if(group_load > group_threshold) {
        do a partition among children groups at grouplevel;
        broadcast new composite list through parent group;
    }
    grouplevel --;
}
begin computation; ...

The HPA scheme attempts to exploit the fact that given a group with adequate number of processors, and a carefully defined number of groups, the number of global partitioning phases can be minimized. The working step of the general HPA is illustrated by the sequence diagram in Figure 4.

Figure 4: Sequence diagram of the HPA scheme
\begin{figure}
\centering\epsfig{file=figures/newsequence.eps, scale=0.6,
clip=, bb= 120 370 460 725}
\end{figure}

In this figure, we show a two level group hierarchy including the root group $G_0$. The hierarchical scheme first creates processor groups. After these groups are created and the initial grid hierarchy is setup, the group coordinators/masters partition the initial domain in the global partitioning phase. At the end of this phase the coordinators have a portion of the domain that is then partitioned among the processors in the group subtrees. Recursively, portions of the computational domain are partitioned further and finally assigned to individual processors at the leaves of the processor group hierarchy. This is the local partitioning phase.
next up previous
Next: 3.2 Static HPA Up: 3 Hierarchical Partitioning Algorithm Previous: 3 Hierarchical Partitioning Algorithm
Xiaolin Li 2002-06-15