DIOS++ (Rule-based Distributed Interactive Object System)


Motivation and introduction

High-performance simulations of physical phenomena and mathematical problems, executing on distributed, heterogeneous and dynamic Grid environments, are playing an increasingly critical role in science and engineering. As the size, dynamics, complexity and cost of these simulations grow, it becomes increasingly important to be able to monitor, control, adapt and optimize a simulation application’s execution at runtime, based on its state and the state of the computational environment. Experts should be able to define and deploy rules that allow the running simulation to be monitored automatically, to respond to specific conditions in its execution, and to invoke appropriate operations on the expert’s behalf, making the simulation self-healing, self-managed and self-optimized.

DIOS++, which forms the back end of DISCOVER, is built on DIOS. It enables rule-based autonomic management and optimization of distributed and parallel applications. DIOS++ provides (1) abstractions for enhancing existing application objects with sensors and actuators for interrogation; (2) a control network that connects and manages the distributed sensors and actuators and enables external discovery, interrogation, monitoring and manipulation of these objects at runtime; and (3) a distributed rule engine that enables the runtime definition, deployment and execution of rules for adapting application objects.

Autonomic object

An autonomic object enhances a computational object (data structures, algorithms) with three aspects and an embedded rule agent:

The rule agent is discussed in the Control network section.
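As a rough illustration of what enhancing an existing object with sensors and actuators could look like, the following sketch wraps a plain object and exposes named read and write interfaces. The names Well, AutonomicWell, sense and actuate are hypothetical and are not taken from the DIOS++ library.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// An existing computational object; the wrapper below does not modify it.
struct Well {
    double injection_rate = 100.0;
    double oil_rate = 40.0;
};

// Illustrative wrapper (not the DIOS++ API): exposes named sensors for
// reading state and named actuators for changing it.
class AutonomicWell {
public:
    explicit AutonomicWell(Well& well) {
        sensors_["oil_rate"] = [&well] { return well.oil_rate; };
        sensors_["injection_rate"] = [&well] { return well.injection_rate; };
        actuators_["set_injection_rate"] = [&well](double v) {
            assert(v >= 0.0 && "injection rate cannot be negative");
            well.injection_rate = v;
        };
    }

    // Read a value through a named sensor; throws std::out_of_range if unknown.
    double sense(const std::string& name) const { return sensors_.at(name)(); }

    // Change state through a named actuator; throws std::out_of_range if unknown.
    void actuate(const std::string& name, double v) { actuators_.at(name)(v); }

private:
    std::map<std::string, std::function<double()>> sensors_;
    std::map<std::string, std::function<void(double)>> actuators_;
};
```

Because the interfaces are looked up by name, external tools and rules can address them without compile-time knowledge of the wrapped type.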

Rule

In the DIOS++ framework, rules are separated from application logic. This separation lets users create, delete and change rules dynamically, without modifying application source code and without stopping and restarting the application. Users use these rules to monitor and control their applications at runtime. Rules are handled by rule agents and the rule engine, which are part of the control network (described in the Control network section below) and are responsible for storing, evaluating and executing rules.

A rule has the format "IF condition expression THEN action list ELSE action list". The condition expression and the action lists consist of sensors and actuators exposed by the application and the system.
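The IF/THEN/ELSE format can be sketched as a small data structure. This is a minimal illustration under assumptions: the types Interfaces, Action and Rule, the function execute, and the sensor and actuator names used with them are hypothetical and do not reflect the actual DIOS++ rule syntax or API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the sensors and actuators that the
// application and system expose.
struct Interfaces {
    std::map<std::string, std::function<double()>> sensors;
    std::map<std::string, std::function<void(double)>> actuators;
};

// One action: invoke a named actuator with a value.
struct Action {
    std::string actuator;
    double value;
};

// "IF condition expression THEN action list ELSE action list".
struct Rule {
    std::function<bool(const Interfaces&)> condition;
    std::vector<Action> then_actions;
    std::vector<Action> else_actions;
};

// Evaluate the condition and run the matching action list.
// Returns true when the THEN branch was taken.
bool execute(const Rule& rule, Interfaces& ifc) {
    assert(rule.condition && "a rule needs a condition expression");
    const bool holds = rule.condition(ifc);
    for (const Action& a : holds ? rule.then_actions : rule.else_actions) {
        ifc.actuators.at(a.actuator)(a.value);
    }
    return holds;
}
```

A rule such as "IF water/oil ratio > 0.5 THEN lower the injection rate ELSE raise it" would then be a Rule whose condition reads one sensor and whose two action lists each name one actuator. Since the rule is plain data, it can be created or replaced while the application keeps running.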

Control network

The DIOS++ control network is a hierarchical structure consisting of a rule engine, a gateway, and autonomic objects. It is automatically configured at runtime using the underlying messaging environment (e.g. MPI) and the available processors.

The Gateway represents a management proxy for the entire application. It maintains and manages a registry of the interaction interfaces (sensors and actuators) for all the autonomic objects in the application. The Gateway interacts with external interaction servers or brokers, such as those provided by DISCOVER, and dispatches incoming requests to the corresponding autonomic objects.
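The registry-and-dispatch role described above might look roughly like the following sketch. It assumes a single process and a registry keyed by object and interface name. The Request layout and the register_interface and dispatch functions are hypothetical, and the real Gateway communicates over the messaging environment rather than through direct calls.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical request from an external interaction server:
// invoke interface `iface` on object `object` with argument `arg`.
struct Request {
    std::string object;
    std::string iface;
    double arg;
};

// Illustrative gateway (not the DIOS++ API): a registry of interaction
// interfaces keyed by (object name, interface name).
class Gateway {
public:
    using Handler = std::function<double(double)>;

    // Autonomic objects register their sensors and actuators here.
    void register_interface(const std::string& object, const std::string& iface,
                            Handler handler) {
        assert(handler && "an interface needs a callable handler");
        registry_[std::make_pair(object, iface)] = std::move(handler);
    }

    // Route a request to the matching interface.
    // Returns false and leaves `result` untouched if nothing is registered.
    bool dispatch(const Request& req, double& result) const {
        auto it = registry_.find(std::make_pair(req.object, req.iface));
        if (it == registry_.end()) return false;
        result = it->second(req.arg);
        return true;
    }

private:
    std::map<std::pair<std::string, std::string>, Handler> registry_;
};
```

Keeping the registry in one place means external clients only need to know the Gateway, not where each autonomic object lives.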

Co-located with the Gateway, the rule engine accepts and maintains the rules for the application. It decomposes these rules and distributes them to the corresponding rule agents, coordinates the execution of the rule agents, and reports rule execution results to the users. Rules are evaluated and executed by the distributed rule agents in parallel. The personality of a rule agent (e.g. rule evaluation sequence, lifecycle) is specified by a script, which is defined by the rule engine at runtime.
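The decompose, distribute and report cycle can be sketched as follows. This illustration makes simplifying assumptions: rules arrive already split into per-object fragments, agents run one after another in a single process instead of in parallel across processors, and agent scripts are omitted. The names ObjectRule, RuleAgent and RuleEngine are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A rule fragment bound to one autonomic object (hypothetical layout).
struct ObjectRule {
    std::string object;               // target autonomic object
    std::function<bool()> condition;  // built from that object's sensors
    std::function<void()> action;     // invokes that object's actuators
};

// Illustrative rule agent: holds and evaluates the rules of one object.
class RuleAgent {
public:
    void add(ObjectRule rule) { rules_.push_back(std::move(rule)); }

    // Evaluate every local rule; return how many actions fired.
    int run() {
        int fired = 0;
        for (ObjectRule& r : rules_) {
            if (r.condition()) {
                r.action();
                ++fired;
            }
        }
        return fired;
    }

private:
    std::vector<ObjectRule> rules_;
};

// Illustrative rule engine: groups rules by object, one agent per object.
class RuleEngine {
public:
    void deploy(const std::vector<ObjectRule>& rules) {
        for (const ObjectRule& r : rules) {
            assert(r.condition && r.action);
            agents_[r.object].add(r);  // the agent is created on first use
        }
    }

    // Trigger all agents and collect per-object counts for reporting.
    std::map<std::string, int> run_all() {
        std::map<std::string, int> report;
        for (auto& entry : agents_) report[entry.first] = entry.second.run();
        return report;
    }

private:
    std::map<std::string, RuleAgent> agents_;
};
```

In this sketch an agent exists only for objects that actually have rules. Each agent touches only its own rule list, so the loop in run_all is the place where a parallel implementation would dispatch agents concurrently.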

In DIOS++, although rule execution is coordinated by the rule engine, rules are evaluated and executed in parallel. This central-control and distributed-execution mechanism has the following advantages: (1) Rule execution, which can be compute-intensive, is done in parallel by rule agents, which reduces rule execution time compared with sequential rule execution. (2) Rule agents are created dynamically and delegated to autonomic objects. This requires fewer system resources than static rule agents, since agents are created only when needed, and leads to more efficient rule execution. (3) A rule agent’s behavior is based on its script, which allows it to adapt to the execution environment and to the rules it needs to execute. Rule agent scripts can be calibrated at runtime by the rule engine to make rule agents more adaptive.

While typical rule execution is straightforward (actions are issued when their required conditions are fulfilled), application dynamics and user interactions make rule behavior unpredictable, so rule conflicts must be detected at runtime. DIOS++ handles a detected conflict by simply disabling the conflicting rules with lower priorities. This is done by locking the required sensors/actuators.
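One way to picture priority-based conflict handling through locks is sketched below. It rests on an assumption the text does not spell out: a conflict is taken to mean that two rules need the same sensor or actuator. Rules are visited from highest to lowest priority, each tries to lock the interfaces it needs, and a rule that finds one already locked is disabled. RuleInfo and resolve_conflicts are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical rule descriptor: only what conflict detection needs.
struct RuleInfo {
    std::string name;
    int priority;                      // larger value wins
    std::set<std::string> interfaces;  // sensors/actuators the rule needs
};

// Lock interfaces in descending priority order and return the names of
// the rules that had to be disabled because an interface was taken.
std::vector<std::string> resolve_conflicts(const std::vector<RuleInfo>& rules) {
    std::vector<const RuleInfo*> order;
    for (const RuleInfo& r : rules) order.push_back(&r);
    std::stable_sort(order.begin(), order.end(),
                     [](const RuleInfo* a, const RuleInfo* b) {
                         return a->priority > b->priority;
                     });

    std::map<std::string, const RuleInfo*> locks;  // interface -> owning rule
    std::vector<std::string> disabled;
    for (const RuleInfo* r : order) {
        const bool blocked = std::any_of(
            r->interfaces.begin(), r->interfaces.end(),
            [&locks](const std::string& i) { return locks.count(i) != 0; });
        if (blocked) {
            disabled.push_back(r->name);  // the lower-priority rule loses
            continue;
        }
        for (const std::string& i : r->interfaces) {
            const bool acquired = locks.emplace(i, r).second;
            assert(acquired);  // cannot fail: none of these were locked
            (void)acquired;
        }
    }
    return disabled;
}
```

A disabled rule acquires no locks at all, so it cannot in turn block an unrelated rule of even lower priority.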

Experiment results

DIOS++ has been implemented as a C++ library. This section summarizes an experimental evaluation of the DIOS++ library using the IPARS reservoir simulator framework on a 32-node Beowulf cluster. IPARS is a Fortran-based framework for developing parallel/distributed reservoir simulators. Using DIOS++/DISCOVER, engineers can interactively feed in parameters such as water/gas injection rates and well bottom-hole pressure, and observe the water/oil ratio or the oil production rate. The evaluation consists of two experiments:


*For more information, please refer to the paper "DIOS++: A Framework for Rule-Based Autonomic Management of Distributed Scientific Applications".

Author: Hua Liu