Autonomic Runtime Management of Dynamic Applications


Introduction/Motivation

 

The fundamental goal of autonomic computing is to reduce the ever-increasing complexity of managing large-scale distributed computing systems. High-performance scientific computing is dynamic and adaptive by nature, and structured adaptive mesh refinement (SAMR) applications are especially so. They demand orchestrated runtime management of the availability, capability, heterogeneity and dynamics of computing, communication and storage resources, so that the system can adapt to the runtime requirements of the application. The self-configuration, self-optimization and self-healing characteristics of autonomic computing fit these requirements well, because they intelligently match the adaptivity and dynamics of the application to those of the underlying system.

 

Dynamically adaptive SAMR methods for solving partial differential equations employ locally optimal approximations and can yield much better cost/accuracy ratios than methods based on static uniform approximations. These techniques improve the accuracy of the solution by dynamically refining the computational grid in regions of high local solution error. Distributed implementations of these adaptive methods offer the potential for accurate solution of realistic models of important physical systems. They also raise challenges in dynamic resource allocation, data distribution and load balancing, communication and coordination, and resource management. The overall efficiency of the algorithms is limited by the ability to partition the underlying data structures at runtime so as to expose all inherent parallelism, minimize communication and synchronization overheads, and balance load. A critical requirement when partitioning adaptive grid hierarchies is maintaining logical locality, both across different levels of the hierarchy as the adaptive grid structure expands and contracts, and within partitions of grids at all levels when they are decomposed and mapped across processors. The former enables efficient computational access to the grids, while the latter minimizes the total communication and synchronization overheads. Furthermore, application adaptivity causes grids to be created, moved and deleted on the fly, so the hierarchy must be re-partitioned efficiently to keep meeting these goals.
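The partitioning goals described above (balanced load, with locality kept both across refinement levels and within a level) can be illustrated with a minimal sketch. It assumes one common technique, ordering patches along a Z-order (Morton) space-filling curve and cutting the ordered list into equal-work segments. This page does not state which partitioner is actually used, so the `Patch` type, the work model in `patch_work`, and every function name below are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Hypothetical rectangular grid patch at one refinement level."""
    level: int   # refinement level (0 = coarsest)
    i: int       # patch index along x, in units of patches at this level
    j: int       # patch index along y, in units of patches at this level
    cells: int   # number of grid cells in the patch


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (Z-order index)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)
        key |= ((y >> b) & 1) << (2 * b + 1)
    return key


def patch_work(p: Patch, refine_factor: int = 2) -> int:
    """Assumed work model: a level-l patch takes refine_factor**l
    sub-steps per coarse time step."""
    return p.cells * refine_factor ** p.level


def locality_key(p: Patch, max_level: int, refine_factor: int = 2):
    """Sort key that places a refined patch next to its parent.

    Indices are scaled to the finest level's index space, so patches
    covering the same region share a Morton prefix (inter-level
    locality); neighbours on one level stay close (intra-level locality).
    """
    scale = refine_factor ** (max_level - p.level)
    return (morton_key(p.i * scale, p.j * scale), p.level)


def partition(patches, nprocs: int, refine_factor: int = 2):
    """Cut the curve-ordered patch list into nprocs contiguous segments
    of roughly equal work. Returns one list of patches per processor."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    parts = [[] for _ in range(nprocs)]
    if not patches:
        return parts
    max_level = max(p.level for p in patches)
    ordered = sorted(
        patches, key=lambda p: locality_key(p, max_level, refine_factor)
    )
    total = sum(patch_work(p, refine_factor) for p in ordered)
    done = 0
    for p in ordered:
        w = patch_work(p, refine_factor)
        # Owner is chosen by where the midpoint of this patch's work
        # interval falls along the curve.
        owner = min(nprocs - 1, int((done + w / 2) * nprocs / total))
        parts[owner].append(p)
        done += w
    return parts


if __name__ == "__main__":
    # 4x4 coarse patches, with one coarse patch refined into 2x2 children.
    grid = [Patch(0, i, j, 64) for i in range(4) for j in range(4)]
    grid += [Patch(1, i, j, 64) for i in (2, 3) for j in (2, 3)]
    for rank, part in enumerate(partition(grid, 4)):
        print(rank, sum(patch_work(p) for p in part), len(part))
```

Because each processor receives a contiguous run of the curve, re-partitioning after grids are created or deleted amounts to re-sorting and re-cutting the list. Under the midpoint rule used here, each processor's load differs from the ideal share by less than the work of the largest single patch.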

 

Moving these applications to dynamic and heterogeneous distributed computing environments (e.g., the Grid) introduces a new level of complexity. These environments require application components to be selected and configured based on the resources available. However, the complexity and heterogeneity of the environment make it non-trivial to select the best match among system resources, application algorithms, problem decompositions, mappings and load distributions, communication mechanisms, and so on. System dynamics, coupled with application adaptation, make application configuration and runtime management a significant challenge. There is a clear need for smart, autonomic self-configuration and self-management at runtime for SAMR applications on the Grid platform.

 

Autonomic Runtime Management of SAMR Applications

·        Autonomic Runtime Management of SAMR Applications: Problem Description

·        Autonomic Runtime Management of SAMR Applications: Overview

·        Autonomic Runtime Management of SAMR Applications: Workflow

·        Monitoring and Characterization

·        Deduction and Optimization

·        Spatial-Temporal Scheduling

Related Projects

·        ARMaDA: Adaptive Runtime Management of Dynamic Applications

·        GrACE: Grid Adaptive Computational Engine