
Salsa is a novel, decentralized and asynchronous realization of the "replica exchange" algorithm for simulating the structure, function, folding, and dynamics of proteins. Salsa provides a scalable communication and interaction substrate that presents a virtual shared space abstraction and enables the dynamic and asynchronous interactions required by the simulations to be simply and efficiently implemented.
Replica exchange is a powerful sampling algorithm that preserves canonical distributions and allows for efficient crossing of high energy barriers that separate thermodynamically stable states. In this formulation, several copies, or replicas, of the system of interest are simulated in parallel at different temperatures using "walkers''. These walkers occasionally swap temperatures to allow them to bypass enthalpic barriers by moving to a higher temperature. The replica exchange algorithm has several advantages over formulations based on constant temperature, and has the potential for significantly impacting the fields of structural biology and drug design. Specifically, the problems of structure based drug design and the study of the molecular basis of human diseases associated with protein misfolding, which are the applications currently targeted by this research. However, efficient and scalable parallel implementations of general formulations of the replica exchange algorithm present significant challenges. These challenges are primarily due to the dynamic and complex coordination and communication patterns between the walkers. These communication/coordination patterns depend on state of the simulation at the replicas and are known only at runtime, and consequently, implementing these simulations using commonly used parallel programming frameworks is non-trivial. As a result, to the best of our knowledge, all the current parallel/distributed implementations of replica exchange simulations in use by the structural biology community are based on a simplified formulation of the algorithm that limits the potential power of the technique in two important ways: (1) the only parameter exchanged between the replicas is the temperature of each replica, and (2) the exchanges occur in a centralized and totally synchronous manner, and only between replicas with adjacent temperatures. The former limits the effectiveness of the method, while the latter limits its scalability to at most tens of homogeneous and relatively tightly coupled processors.
Replica exchange is an advanced canonical conformational sampling algorithm designed to help overcome the sampling problem encountered in biomolecular simulations. The method had been proposed independently on several occasions in various disciplines. In this method, several copies, or replicas, of the system of interest are simulated in parallel at different temperatures using walkers. These walkers occasionally swap temperatures based on a proposal probability that maintains detailed balance. These exchanges allow individual replicas to bypass enthalpic barriers by moving to high temperatures. A parallel version of this algorithm was proposed by Hukushima & Nemoto. The MD replica exchange canonical sampling method has been implemented in IMPACT, the molecular simulation package used in this work, following the approach proposed by Sugita & Okamoto. The method consists of running a series of simulations at fixed specified temperatures. Each replica corresponds to a temperature. An exchange of temperatures between replicas i and j at temperatures T_m and T_n is attempted periodically and is accepted according to the following Metropolis transition probability:
(1)
where
and E_i and E_j are the potential energies
of replicas i and j, respectively. After a successful exchange, the velocities of replicas
i and j are rescaled at the new temperature. Molecular dynamics programs are essentially loops over a large
number of integration steps, each of which advances the time forward for one step. Replica exchange is attempted periodically at a chosen
interval of steps. Existing MPI-based parallel implementations of replica exchange are
centralized and synchronous. Existing replica exchange implementations have several limitations. First, the scheme limits the exchange to only
neighboring temperatures. This limitation is not a concern when the number of replicas is small and there is a small chance of exchange
between non-nearest temperatures. However, as the number of processors (and correspondingly walkers) increases, the difference
between target temperatures becomes small enough to allow exchanges between non-nearest neighbor replicas. In such cases, more flexible
schemes which allows non-nearest neighbor temperature exchange are desirable. Second, the implementation is based on a centralized
master that gathers and scatters data system wide. Gathering data from all the nodes on a single node may be infeasible in large
systems, and a centralized master can quickly become a bottleneck. Further, gather and scatter operations are synchronous and
expensive. Also, since the master node also participates in the simulation as a walker, there is a load imbalance which can lead to
additional synchronization overheads.
Enabling larger scale implementations of the general replica exchange formulation thus requires a communication substrate that can support decentralized management of exchange schedules, and asynchronous and decoupled communications, eliminating synchronization overheads and allowing implementations to be scalable. Salsa is a communication and interaction substrate that meets these requirements, and enables a novel, decentralized and asynchronous realization of the replica exchange algorithm for simulating the structure, function, folding, and dynamics of proteins. Salsa enables arbitrary walkers to dynamically exchange target temperatures and other information in a pair-wise manner based on an exchange probability condition that ensures detailed balance. The Salsa-based replica exchange realization distinguishes itself from existing implementations in multiple aspects. As mentioned above, existing replica exchange implementations use a simplification of the replica exchange algorithm where the walkers that can exchange temperatures is limited to those with neighboring target temperatures. In these implementations, the pairs of walkers that exchange temperatures is centrally determined at a single node by periodically gathering temperatures from all the walkers at this node. While such a scheme is acceptable for simulations with a small number of walkers, as the number of walkers increases, the possibility of successful non-nearest neighbor temperature exchange also increases significantly, and the temperature mixing is severely impeded when exchanges can only occur between neighboring temperatures.
Salsa essentially provides a virtual shared space abstraction that is specifically customized for replica exchange. The pairs of
walkers that exchange information are dynamically and asynchronously determined using this virtual shared space abstraction. Walkers
periodically post temperature ranges that are of current interest for exchange to the shared space. If this range overlaps with the
range of interest posted by another walker, an exchange can occur. The actual exchange is then negotiated and completed by the
individual walkers in a peer-to-peer manner. As a result, the exchanges are decoupled, dynamically and asynchronously determined,
and not limited to neighboring temperatures. Salsa provides simple tuple-space-like abstractions for accessing the
virtual space. Walkers use the operator to post temperature ranges of interest, and use either the blocking
get or the non-blocking getp operator to retrieve a new temperature if the attempted exchange is successful or the old
one if the attempted exchange is unsuccessful. Further, since exchanges are decoupled and asynchronous, communications and
computation at the walkers can be overlapped to improve overall simulation performance.
Salsa has been implemented within the IMPACT (Integrated Modeling Program, Applied Chemical Theory) molecular mechanics
program to enable two specific applications, which require large scale distributed replica exchange implementations:
(1) simulations of the binding of ligands to the cytochrome P450 class of enzymes responsible for cellular detoxification and drug
metabolism, and (2) simulations of the misfolding of naturally occurring human and mutated forms of the protein synuclein
associated with Parkinson's disease. It is not possible to carry out these studies using standard molecular dynamics simulation
techniques.
A. Architecture
Salsa provides abstractions and underlying mechanisms to support efficient and scalable parallel implementations of the general replica exchange formulation, where walkers can exchange non-nearest neighbor temperatures in a decoupled, decentralized, and asynchronous manner. It essentially provides the abstraction of a virtual shared space that is specifically customized for replica exchange. The shared space supports tuple-space-like abstractions and is used by walkers to post temperature ranges of exchange interest and discover candidate walkers that it can potentially exchange temperature with. The underlying mechanisms support negotiation and efficient peer-to-peer data exchanges between appropriate walkers. The framework consists of two main components: (1) a distributed directory layer; and (2) a communication layer. These components are described below.
B. Interface
The operators provided by the Salsa application programming interface (API) are listed in Table I. These operations provide associative access to the virtual shared space and are similar to the operators provided by the tuple space model. The interface includes operators to initialize the Salsa runtime init, to allow processes to post (using post) temperature range of exchange interest and retrieve available exchanged temperatures (using get/getp). Note that the retrieved temperatures are removed from the space.

C. Parallel Asynchronous Replica Exchange using Salsa
An implementation of a parallel asynchronous replica exchange using Salsa is illustrated in
Fig. 1. The scheme consists of two phases. In the post phase, candidate exchange
partners are identified and notified. These walkers negotiate with each other to create
potential exchange partnerships. In the get or getp phase, these potential partners then attempt to
exchange data.

Fig. 1 An example parallel asynchronous replica exchange implementation using Salsa
The replica exchange algorithm implemented using the Salsa API is illustrated by the psuedo-code presented in Fig. 2.

Fig. 2 Pseudo-code illustrating the implementation of replica exchange using Salsa.
A prototype of Salsa has been implemented and tested. The effectiveness and scalability of the Salsa-based replica exchange algorithm is evaluated using the alanine tripeptide molecule. All the tests are configured to run for 10 ns total simulation time using the Hybrid Monte Carlo (HMC) molecular dynamics sampling algorithm. Each run is composed of 250,000 HMC cycles, each including 10 molecular dynamics integration steps with a 4 fs time-step. Replica exchange is attempted every 25 HMC cycles. Replica exchange temperatures are distributed exponentially within the 200-700 K range. The experiments compare the efficiency and performance of the Salsa-based implementation with the original MPI-based implementation in Impact.

Table II shows the number of temperature cross-walks measured for Salsa-based replica exchange simulations with 8, 16, 32, 60, and 120 walkers, compared with the corresponding number of cross-walks obtained using the MPI-based synchronous implementation. In these experiments, the temperature range posted by walkers in the Salsa-base replica exchange implementation was set to a window of 300 K around its target temperature, i.e., [temp - 80 K, temp + 80 K] for table entry [-80 K, 80 K] or as [temp - 150 K, temp + 150 K] for table entry [-150 K, 150 K]. In the configuration of [temp - 80 K, temp + 80 K], a walker can exchange temperature with any other walker whose target temperature is less than 160K away, i.e., the ranges that the two walkers post intersect. Similarly, in the configuration of [temp - 150 K, temp + 150 K], a walker can exchange temperature with any other walker whose target temperature is less than 300K away. The temperature ranges that defined a cross-walk in the experiments are listed in the Table II. As seen from the results listed in the table, the number of observed temperature cross-walks is larger for the Salsa-based simulation, and it increases as the number of walkers increases. In contrast, the number of cross-walks observed for the MPI-based implementation decreases as the number of walkers increases. With 120 walkers there are only 6 temperature cross-walks observed in the case of the MPI-based implementation. In comparison 340 cross-walks are observed for the Salsa-based implementation. These results demonstrate that using non-nearest neighbors replica exchange algorithm enabled by Salsa provides significant benefits, especially when the density of the temperature distribution is large.
Conclusion and Future Work
The paper presents a novel implementation of replica exchange algorithm using Salsa, a framework that presents a shared space abstraction at runtime to user applications. The Salsa-based replica exchange supports exchange between non-nearest neighboring temperatures, asynchronous communication to enable overlapping communication with computation, decentralized communication paradigm to avoid the central bottleneck in a client/server version of the implementation. The approach helps improving the overall efficiency and scalability of the simulation.
The overall goal of the project is to enable large-scale Grid-based parallel and distributed molecular simulations of protein structural changes and drug binding to proteins. Specific tasks include (1) implementing a prototype interaction and coordination framework, based on Salsa, for wide-area distributed replica exchange simulations, (2) developing, deploying and evaluating the Grid-based Impact implementation, and (3) using the grid-based Impact implementation to provide scientific insights.
TASSL, ECE Rutgers University BIOMAP Institute, Rutgers University
Dr.Manish Parashar Associate Professor Dr. Emilio Gallicchio Vice Director
Li Zhang Graduate Student Dr. Ronald M. Levy Director