DIOS++ (Rule-based Distributed Interactive Object System)
Motivation and introduction
High-performance simulations for physical phenomena and mathematical problems executing on distributed, heterogeneous and dynamic Grid environments are playing an increasingly critical role in science and engineering. As the size, dynamics, complexity and costs of these simulations grow, it becomes more and more important to be able to monitor, control, adapt and optimize a simulation application’s execution at runtime based on its state and the state of the computational environment. Experts should be able to define and deploy rules to enable the running simulation to be automatically monitored, to respond to specific conditions in its execution, and invoke appropriate operations on the expert’s behalf, so as to make those simulations self-healing, self-managed and self-optimized.
DIOS++, which forms the back end of DISCOVER, is built on DIOS. It enables rule-based autonomic management and optimization of distributed and parallel applications. DIOS++ provides three things: (1) abstractions for enhancing existing application objects with sensors and actuators for interrogation and control; (2) a control network that connects and manages the distributed sensors and actuators and enables external discovery, interrogation, monitoring and manipulation of these objects at runtime; and (3) a distributed rule engine that enables the runtime definition, deployment and execution of rules for adapting application objects.
Autonomic object
An autonomic object enhances a computational object (data structures, algorithms) with three aspects and an embedded rule agent. The control aspect specifies the sensors and actuators exported by an object. The sensors provide interfaces for viewing the current state of an object, while the actuators provide interfaces for processing commands to modify the object’s state. The rule agent is discussed in the Control network section.
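As a minimal sketch of the control aspect, the following Python fragment wraps a computational object and exports named sensors (read-only views of state) and actuators (commands that modify state). DIOS++ itself is a C++ library; the class `AutonomicObject`, its methods, and the `Well` example object are hypothetical names chosen for illustration and do not reflect the actual DIOS++ API.

```python
from typing import Any, Callable, Dict


class AutonomicObject:
    """Wraps a computational object and exports named sensors and actuators.

    Hypothetical illustration; not the DIOS++ interface.
    """

    def __init__(self, name: str, target: Any) -> None:
        self.name = name
        self.target = target
        self.sensors: Dict[str, Callable[[], Any]] = {}
        self.actuators: Dict[str, Callable[[Any], None]] = {}

    def export_sensor(self, name: str, read: Callable[[], Any]) -> None:
        # A sensor is a zero-argument callable that returns current state.
        self.sensors[name] = read

    def export_actuator(self, name: str, apply: Callable[[Any], None]) -> None:
        # An actuator is a one-argument callable that modifies state.
        self.actuators[name] = apply

    def sense(self, name: str) -> Any:
        return self.sensors[name]()

    def actuate(self, name: str, value: Any) -> None:
        self.actuators[name](value)


class Well:
    """Hypothetical computational object standing in for application state."""

    def __init__(self) -> None:
        self.injection_rate = 100.0


well = Well()
obj = AutonomicObject("well-1", well)
obj.export_sensor("injection_rate", lambda: well.injection_rate)
obj.export_actuator(
    "set_injection_rate",
    lambda v: setattr(well, "injection_rate", float(v)),
)

obj.actuate("set_injection_rate", 80)
print(obj.sense("injection_rate"))
```

Keeping sensors and actuators as plain callables means the wrapped object needs no changes; only the export calls know about its fields.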
Rule
In the DIOS++ framework, rules are separated from application logic. This separation lets users create, delete and change rules on the fly, without modifying the application source code and without stopping and restarting the application. Users rely on these rules to monitor and control their applications at run time. Rules are handled by rule agents and the rule engine, which are part of the control network (described in the Control network section below) and are responsible for storing, evaluating and executing rules.
A rule has the format "IF condition expression THEN action list ELSE action list". The condition expression and the action lists are composed of the sensors and actuators exposed by the application and the system.
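The IF/THEN/ELSE format can be sketched as a small data structure: a condition over sensor values plus two action lists that name actuators. This is an illustrative Python sketch only. The `Rule` class, the `(actuator name, argument)` action layout, and the sensor and actuator names are assumptions, not the DIOS++ rule syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Sensors = Dict[str, float]
Action = Tuple[str, float]  # (actuator name, argument); hypothetical layout


@dataclass
class Rule:
    """IF condition THEN then_actions ELSE else_actions."""

    condition: Callable[[Sensors], bool]
    then_actions: List[Action]
    else_actions: List[Action] = field(default_factory=list)

    def evaluate(self, sensors: Sensors) -> List[Action]:
        # Return the action list selected by the condition.
        if self.condition(sensors):
            return self.then_actions
        return self.else_actions


# Hypothetical rule: throttle injection when the water/oil ratio gets high.
rule = Rule(
    condition=lambda s: s["water_oil_ratio"] > 0.5,
    then_actions=[("set_injection_rate", 80.0)],
    else_actions=[("set_injection_rate", 100.0)],
)

print(rule.evaluate({"water_oil_ratio": 0.7}))
print(rule.evaluate({"water_oil_ratio": 0.2}))
```

Because the rule is data rather than application code, it can be created, replaced or dropped while the application keeps running, which is the flexibility the section describes.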
Control network
The DIOS++ control network is a hierarchical structure consisting of a rule engine, a gateway, and autonomic objects. It is automatically configured at run time using the underlying messaging environment (e.g. MPI) and the available processors.
The Gateway represents a management proxy for the entire application. It maintains and manages a registry of the interaction interfaces (sensors and actuators) for all the autonomic objects in the application. The Gateway interacts with external interaction servers or brokers, such as those provided by DISCOVER, and dispatches incoming requests to the corresponding autonomic objects.
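The Gateway's two roles, keeping a registry of interaction interfaces and dispatching incoming requests, can be sketched as follows. This is a single-process Python illustration under stated assumptions: the `Gateway` class, its `register`, `interfaces` and `dispatch` methods, and the `well-1` object are hypothetical, and the real system exchanges these requests over the messaging environment rather than through direct calls.

```python
from typing import Any, Callable, Dict, List


class Gateway:
    """Registry of sensors/actuators per object, plus request dispatch.

    Hypothetical illustration; not the DIOS++ interface.
    """

    def __init__(self) -> None:
        self._sensors: Dict[str, Dict[str, Callable[[], Any]]] = {}
        self._actuators: Dict[str, Dict[str, Callable[[Any], None]]] = {}

    def register(
        self,
        obj_name: str,
        sensors: Dict[str, Callable[[], Any]],
        actuators: Dict[str, Callable[[Any], None]],
    ) -> None:
        # Record the interaction interfaces exported by one object.
        self._sensors[obj_name] = dict(sensors)
        self._actuators[obj_name] = dict(actuators)

    def interfaces(self, obj_name: str) -> Dict[str, List[str]]:
        # What an external client could discover about one object.
        return {
            "sensors": sorted(self._sensors[obj_name]),
            "actuators": sorted(self._actuators[obj_name]),
        }

    def dispatch(self, obj_name: str, interface: str, value: Any = None) -> Any:
        # Route a request to the matching sensor or actuator.
        if interface in self._sensors.get(obj_name, {}):
            return self._sensors[obj_name][interface]()
        if interface in self._actuators.get(obj_name, {}):
            self._actuators[obj_name][interface](value)
            return None
        raise KeyError(f"{obj_name} exports no interface named {interface!r}")


state = {"pressure": 3000.0}  # hypothetical application state
gateway = Gateway()
gateway.register(
    "well-1",
    sensors={"pressure": lambda: state["pressure"]},
    actuators={"set_pressure": lambda v: state.update(pressure=float(v))},
)

gateway.dispatch("well-1", "set_pressure", 2500)
print(gateway.dispatch("well-1", "pressure"))
```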
Co-located with the Gateway, the rule engine accepts and maintains the rules for the application. It decomposes these rules and distributes them to the corresponding rule agents, coordinates the execution of the rule agents, and reports rule execution results to the users. The rule agents evaluate and execute the rules in a distributed, parallel manner. The personality of a rule agent (e.g. rule evaluation sequence, lifecycle, etc.) is specified by a script, which is defined by the rule engine at runtime.
In DIOS++, although rule execution is coordinated by the rule engine, rules are evaluated and executed in parallel. This combination of central control and distributed execution has the following advantages: (1) Rule execution, which can be compute-intensive, is done in parallel by rule agents. This reduces rule execution time compared with sequential rule execution. (2) Rule agents are created dynamically and delegated to autonomic objects. This requires fewer system resources than static rule agents because agents are created only when needed, and it also leads to more efficient rule execution. (3) A rule agent’s behavior is driven by a script, which allows the agent to adapt to the execution environment and to the rules that it needs to execute. Rule agent scripts can be calibrated at runtime by the rule engine to make rule agents more adaptive.
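The central-control, distributed-execution pattern can be sketched in a few lines. In this illustrative Python fragment the rule engine groups rules by target object, creates an agent only for objects that actually have rules, and runs the agents concurrently. Threads stand in for the distributed processes of the real system, and the names `Rule`, `RuleAgent`, `RuleEngine` and the action strings are assumptions made for this example rather than DIOS++ identifiers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, NamedTuple

SensorValues = Dict[str, float]


class Rule(NamedTuple):
    target: str  # name of the autonomic object the rule applies to
    condition: Callable[[SensorValues], bool]
    then_action: str
    else_action: str


class RuleAgent:
    """Evaluates the rules delegated to one autonomic object."""

    def __init__(self, sensors: SensorValues) -> None:
        self.sensors = sensors
        self.rules: List[Rule] = []

    def run(self) -> List[str]:
        # Pick the THEN or ELSE action for each rule, in order.
        return [
            r.then_action if r.condition(self.sensors) else r.else_action
            for r in self.rules
        ]


class RuleEngine:
    """Holds all rules centrally and coordinates the agents."""

    def __init__(self, objects: Dict[str, SensorValues]) -> None:
        self.objects = objects
        self.rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def execute(self) -> Dict[str, List[str]]:
        # Decompose: create an agent only for objects that have rules.
        agents: Dict[str, RuleAgent] = {}
        for rule in self.rules:
            if rule.target not in agents:
                agents[rule.target] = RuleAgent(self.objects[rule.target])
            agents[rule.target].rules.append(rule)
        # Execute the agents concurrently and collect their results.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(a.run) for name, a in agents.items()}
            return {name: f.result() for name, f in futures.items()}


engine = RuleEngine({
    "well-1": {"pressure": 3200.0},
    "well-2": {"pressure": 2100.0},
    "well-3": {"pressure": 2900.0},  # no rules, so no agent is created
})
too_high = lambda s: s["pressure"] > 3000.0
engine.add_rule(Rule("well-1", too_high, "reduce_injection", "keep_injection"))
engine.add_rule(Rule("well-2", too_high, "reduce_injection", "keep_injection"))

results = engine.execute()
print(results)
```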
While typical rule execution is straightforward (actions are issued when their required conditions are fulfilled), application dynamics and user interactions make things unpredictable. As a result, rule conflicts must be detected at runtime. DIOS++ handles a detected conflict by simply disabling the conflicting rules with lower priorities, which it does by locking the required sensors/actuators.
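One way to picture priority-based conflict handling through locking is the sketch below. It assumes, purely for illustration, that two rules conflict whenever they require the same sensor or actuator. Rules are visited from highest to lowest priority; each rule locks the resources it needs, and a rule that finds any of them already locked is disabled. The `Rule` fields and the `resolve_conflicts` function are hypothetical, and the actual DIOS++ conflict detection may differ.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int  # larger value means higher priority (assumption)
    resources: FrozenSet[str]  # sensors/actuators the rule requires


def resolve_conflicts(rules: Iterable[Rule]) -> Tuple[List[str], List[str]]:
    """Return (enabled, disabled) rule names after priority-based locking."""
    locks: Dict[str, str] = {}  # resource name -> rule holding the lock
    enabled: List[str] = []
    disabled: List[str] = []
    # sorted() is stable, so equal priorities keep their input order.
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if any(res in locks for res in rule.resources):
            # A higher-priority rule already holds a required resource.
            disabled.append(rule.name)
            continue
        for res in rule.resources:
            locks[res] = rule.name
        enabled.append(rule.name)
    return enabled, disabled


rules = [
    Rule("B", 1, frozenset({"set_injection_rate", "water_oil_ratio"})),
    Rule("A", 2, frozenset({"set_injection_rate"})),
    Rule("C", 1, frozenset({"set_pressure"})),
]
enabled, disabled = resolve_conflicts(rules)
print(enabled, disabled)
```

Here rule B loses to the higher-priority rule A because both need the same actuator, while rule C touches a different actuator and stays enabled.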
Experimental results
DIOS++ has been implemented as a C++ library. This section summarizes an experimental evaluation of the DIOS++ library using the IPARS reservoir simulator framework on a 32-node Beowulf cluster. IPARS is a Fortran-based framework for developing parallel/distributed reservoir simulators. Using DIOS++/DISCOVER, engineers can interactively feed in parameters such as water/gas injection rates and well bottom hole pressure, and observe the water/oil ratio or the oil production rate. The evaluation consists of two experiments.

*For more information, please refer to the paper "DIOS++: A Framework for Rule-Based Autonomic Management of Distributed Scientific Applications".
Author: Hua Liu