## Section: Scientific Foundations

### Efficient algorithmics for code coupling in complex simulations

Participants : Olivier Coulaud, Aurélien Esnard, Damien Genet, Nicolas Richart, Jean Roman, Jérôme Soumagne.

Many important physical phenomena in material physics and climate modelling are inherently complex. They are often addressed with multiphysics or multiscale approaches that couple different models and codes, typically one model per scale or physics, each implemented by a parallel code. For instance, crack propagation is modelled at two scales: an atomistic model and a continuum model discretized by a finite element method. Such phenomena are simulated by coupling different parallel codes, for example a molecular dynamics code with an elasticity code.

The experience that we have acquired in the `ScAlApplix` project, through
the activities in crack propagation simulations with LibMultiScale and
in M-by-N computational steering (coupling simulations with parallel
visualization tools) with `EPSN`, shows that, while the modelling
aspects are well studied, several problems in parallel and distributed
algorithms remain open.
In the context of code coupling in `HiePACS`, we want to contribute
to the following points.

#### Efficient schemes for multiscale simulations

As mentioned previously, many important physical phenomena, such as material deformation and failure (see Section 4.2), are inherently multiscale processes that cannot always be modelled by a continuum model alone. Fully microscopic simulations of most domains of interest are not computationally feasible. Therefore, researchers must turn to multiscale methods that couple micro and macro models. Combining different scales, such as quantum-atomistic or atomistic, mesoscale and continuum, remains a challenge: obtaining efficient and accurate schemes requires exchanging information between the different scales both efficiently and effectively. We are currently involved in two national research projects (ANR) that focus on multiscale schemes. More precisely, the models that we have started to study are the quantum-to-atomic coupling (QM/MM coupling) in the NOSSI ANR and the atomic-to-dislocation coupling in the OPTIDIS ANR (proposal for the 2010 COSINUS call of the French ANR).

#### Coupling of complex simulations based on the hypergraph model

The performance of coupled codes depends on how well the data are
distributed among the processors. Generally, the data distribution of
each code is built independently of the others to obtain the best
load balancing. But once the codes are coupled, the naive use of these
decompositions can lead to significant imbalance, in particular when
there is an overlap zone between the different models. Therefore,
modelling the coupling itself is crucial to improve performance and to
ensure good scalability of the coupled codes. The goal here is to find
the best data distribution for the coupled code as a whole, and not
only for each standalone code.

The main idea is to use a hypergraph model, such as the one provided by
the `ZOLTAN` toolkit, and to take more information into account in the
coupling than classical graph partitioners do. Indeed, in the
hypergraph model, the hyperedge cut accurately measures the
communication volume, whereas in the graph model the edge cut only
approximates it. Moreover, recent work on hypergraph partitioning with
fixed vertices has demonstrated its effectiveness for dynamic load
balancing of adaptive simulations. As the load-balancing problem is
quite close to the redistribution problem, we expect to derive new
redistribution algorithms using similar strategies. For example, we
should add to the communication cost the redistribution cost between
codes (which depends on the volume of data exchanged), add to the
computation cost the interpolation cost, and so on. In addition, we
expect the greater expressiveness of the hypergraph model to help us
model each individual simulation code more accurately, and thus to
improve its scalability thanks to a better partition quality.
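The gap between the two metrics can be seen on a toy example. The following Python sketch (an illustration with a hand-picked star graph and partition, not using `ZOLTAN` itself) compares the graph edge cut with the hypergraph connectivity metric, using column-style nets where each vertex's net spans the vertex and its neighbours:

```python
# Toy mesh: a star graph — vertex 0 (in part A) coupled to 1, 2, 3 (in part B).
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
part = {0: "A", 1: "B", 2: "B", 3: "B"}

# Graph model: edge cut = number of edges crossing the partition.
edge_cut = sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

# Hypergraph model (column-net): one net per vertex, spanning the vertex
# and its neighbours; cost = sum over nets of (lambda - 1), where lambda
# is the number of parts the net touches.
hyperedge_cost = 0
for v in adj:
    touched = {part[v]} | {part[u] for u in adj[v]}
    hyperedge_cost += len(touched) - 1

# Actual communication volume: each vertex sends its value once to every
# remote part that owns at least one of its neighbours.
volume = sum(len({part[u] for u in adj[v]} - {part[v]}) for v in adj)

print(edge_cut, hyperedge_cost, volume)  # 3 4 4
```

Here the hyperedge cost (4) matches the true communication volume exactly, while the edge cut (3) only approximates it; on larger meshes the discrepancy can grow, which is precisely the motivation stated above.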
Another connected problem is that of resource allocation. This is
particularly important for the global coupling efficiency and
scalability, because each code involved in the coupling can be more or
less computationally intensive, and a good trade-off must be found
between the resources assigned to the codes in order to avoid idle
time. Typically, given a fixed number of processors and two coupled
codes, how should the processors be split between the two codes?
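Under the simplifying assumption of ideal strong scaling within each code, this trade-off can be sketched as a small minimax problem (the per-step workloads below are hypothetical, purely for illustration):

```python
# Hypothetical per-step workloads: code 1 does 300 work units, code 2 does 100.
W = (300.0, 100.0)
P = 16  # total processors to split between the two coupled codes

def step_time(p1):
    # Assumes ideal strong scaling within each code; the codes run
    # concurrently, so a coupling step lasts as long as the slower code.
    return max(W[0] / p1, W[1] / (P - p1))

best = min(range(1, P), key=step_time)
print(best, P - best)  # 12 4
```

With these numbers the optimum assigns processors proportionally to the workloads (12 and 4), balancing the two codes at 25 time units per step; real couplings are harder because scaling is not ideal and the redistribution and interpolation costs discussed above must enter the objective.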

#### Steering and interacting with complex coupled simulations

Computational steering is an effort to make the typical simulation workflow (modelling, computing, analyzing) more efficient, by providing online visualization of and interactive steering over the ongoing computational processes. Online visualization proves very useful for monitoring and detecting possible errors in long-running applications, while interactive steering allows the researcher to alter simulation parameters on the fly and to receive immediate feedback on their effects. The scientist thus gains additional insight into the cause-and-effect relationships within the simulation.

In the `ScAlApplix` project, we studied this problem in the case where
both the simulation and the visualization can be parallel, what we
call M-by-N computational steering, and we developed a software
environment called `EPSN` (see Section 5.3). More recently, we have
proposed a model for the steering of complex coupled simulations. An
important conclusion of this previous work is that the steering
problem can conveniently be modelled as a coupling problem between one
or more parallel simulation codes and one visualization code, which
can be parallel as well. In `HiePACS`, we propose to revisit the
steering problem as a coupling problem, and we expect to reuse the new
redistribution algorithms developed in the context of code coupling
for the purpose of M-by-N steering.
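The core of such an M-by-N redistribution can be sketched for the simplest case, a 1-D block-distributed array moving from M simulation processes to N visualization processes: each overlap between a sender's block and a receiver's block becomes one message. This is an illustrative sketch, not the algorithm the project will develop:

```python
def block(rank, nprocs, n):
    # Half-open index range [lo, hi) owned by `rank` in a block distribution
    # of n items over nprocs processes (remainder spread over the first ranks).
    base, rem = divmod(n, nprocs)
    lo = rank * base + min(rank, rem)
    return lo, lo + base + (1 if rank < rem else 0)

def redistribution_plan(n, m_senders, n_receivers):
    # Enumerate (sender, receiver, count) messages from the block overlaps.
    plan = []
    for s in range(m_senders):
        slo, shi = block(s, m_senders, n)
        for r in range(n_receivers):
            rlo, rhi = block(r, n_receivers, n)
            lo, hi = max(slo, rlo), min(shi, rhi)
            if lo < hi:
                plan.append((s, r, hi - lo))
    return plan

# 12 elements moving from 3 simulation processes to 4 visualization processes.
print(redistribution_plan(n=12, m_senders=3, n_receivers=4))
```

The counts in the plan sum to the array size, and minimizing such message volumes (and overlapping them with computation) is exactly where the hypergraph-based cost models above are expected to help.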

In several applications, it is often very useful either to visualize the results of the ongoing simulation before writing them to disk, or to steer the simulation by modifying some parameters and visualizing the impact of these modifications interactively. Nowadays, high performance computing simulations use many computing nodes that perform I/O using the widely used HDF5 file format. One problem is to provide real-time visualization on top of high performance computing; in that respect we need to efficiently combine very large parallel simulation systems with parallel visualization systems. The originality of our approach is the use of the HDF5 file format to write into a distributed shared memory (DSM), so that the data can be read by the upper part of the visualization pipeline. This leads us to define a relevant steering model based on a DSM; it implies finding ways to write and read data efficiently in this DSM, and to steer the simulation. This work is developed in collaboration with the Swiss National Supercomputing Centre (CSCS).

Regarding the interaction aspect, we are interested in providing new mechanisms to interact with the simulation directly through the visualization. For instance, in the ANR NOSSI, in order to speed up the computation, we are interested in rotating a molecule in a cavity or in moving it from one cavity to another within the crystal lattice. To perform such interactions safely, a model of the interaction in our steering framework is necessary to keep the data coherent in the simulation. Another point we plan to study is the monitoring of and interaction with resources, in order to perform user-directed checkpoint/restart or user-directed load balancing at runtime.
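The coherency requirement of such a DSM-based steering model can be sketched in a few lines of Python. The class and its API below are entirely hypothetical (they stand in for the actual HDF5/DSM machinery developed with CSCS): the simulation stages datasets for the current step and commits them atomically, so the visualization only ever reads a complete step, and user steering parameters travel back through the same shared object:

```python
class SteeringDSM:
    """Toy stand-in for a DSM-based steering buffer (hypothetical API)."""

    def __init__(self):
        self._buf = {}      # datasets of the last *published* (complete) step
        self._pending = {}  # datasets of the step currently being written
        self.step = -1      # id of the last complete step
        self.params = {}    # steering parameters pushed back by the user

    def write(self, name, data):
        # The simulation stages a dataset; readers cannot see it yet.
        self._pending[name] = data

    def publish(self, step):
        # Commit point: atomically swap in the finished step, preserving
        # coherency — readers never observe a half-written step.
        self._buf, self._pending = self._pending, {}
        self.step = step

    def read(self, name):
        # The visualization reads only from the last complete step.
        return self._buf[name]

    def steer(self, **params):
        # User interaction through the visualization flows back this way.
        self.params.update(params)

dsm = SteeringDSM()
dsm.write("positions", [0.0, 1.0, 2.0])
dsm.publish(step=0)
dsm.steer(timestep=0.5)
print(dsm.read("positions"), dsm.step, dsm.params)
```

The single commit point is the design choice worth noting: it is what makes interactions such as moving a molecule safe, since the simulation and the visualization never disagree on which step is current.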