Section: New Results
Computational steering environment for distributed numerical simulations
Model for the steering of parallel-distributed simulations
The model that we have proposed in the EPSN framework can only steer SPMD simulations efficiently. A natural evolution is to consider more complex simulations, such as coupled SPMD codes, called M-SPMD (Multiple SPMD, e.g. multiscale crack-propagation simulations), and client/server simulation codes. In order to steer these kinds of simulations, we have designed an extension of the Hierarchical Task Model (HTM), which makes it possible to solve the coherency problem for such complex applications. The EPSN framework has been extended to handle this new kind of simulation. In the context of the ANR MASSIM and ANR NOSSI projects, we have recently validated our work with a multiscale crack-propagation simulation (LibMultiScale). In this case study, EPSN is able to pause/resume the whole coupled simulation and to coherently retrieve and visualize the complex distributed data: a distributed unstructured mesh at the continuum scale, mixed with distributed atoms at the atomic scale. This work is carried out in the context of the PhD of Nicolas Richart, whose defense is planned for the beginning of 2010.
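To make the coherency mechanism more concrete, the minimal sketch below (C with MPI) shows how the time loop of one SPMD component of a coupled simulation could be annotated with hierarchical task markers; the steering layer would use such markers to identify coherent points at which the whole coupled simulation can be paused, resumed, or queried for data. The steer_* functions are hypothetical placeholders, not the actual EPSN API, and are stubbed out here so the example is self-contained.

    /* Sketch: instrumenting one SPMD component of an M-SPMD simulation
     * with hierarchical task markers.  The steer_* functions are
     * hypothetical placeholders (not the real EPSN API), stubbed here. */
    #include <stdio.h>
    #include <mpi.h>

    static void steer_begin_task(const char *id) { printf("begin %s\n", id); }  /* steering layer could block here on pause */
    static void steer_end_task(const char *id)   { printf("end %s\n", id); }
    static void steer_publish(const char *id, const double *buf, long n)
    { printf("publish %s (%ld values, first=%g)\n", id, n, n ? buf[0] : 0.0); }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        double atoms[3] = { 0.0, 0.0, 0.0 };        /* toy per-process data */
        for (int step = 0; step < 3; ++step) {
            steer_begin_task("time_step");          /* coherent steering point */
            atoms[0] += 1.0;                        /* ... real physics here ... */
            steer_publish("atoms", atoms, 3);       /* data may be fetched coherently here */
            steer_end_task("time_step");            /* pause/resume honoured between steps */
        }
        MPI_Finalize();
        return 0;
    }

In a real coupled simulation, each SPMD code (continuum and atomistic) would carry its own hierarchy of such markers, and the steering environment would align them to obtain a globally coherent snapshot of the distributed data.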
Distributed Shared Memory approach for the steering of parallel simulations
As an alternative to the in-situ visualization and steering framework of EPSN, we have designed and developed a lightweight push-driven architecture for in-situ visualization. This architecture, part of ICARUS, is intended to address three principal objectives:
- require little or no modification to the simulation code in order to allow live visualization;
- allow the simulation to be run on one parallel machine whilst the visualization is run on a separate (or the same) parallel machine;
- provide good enough performance that massive simulations may be handled as easily as small test cases.
The interface we developed is built around the HDF5 file I/O library, commonly used in HPC applications. The HDF5 API allows the derivation of custom virtual file drivers (VFDs), which may be instantiated at run time on a per-file basis to control how data is written to the file system. We have used this facility to create a specialized MPI-based VFD which lets the simulation write data in parallel to a file; the data is in fact redirected over the network to a visualization cluster, which stores the file in a Distributed Shared Memory (DSM) buffer, in effect a virtual file system. The ParaView application acts as a server/host for this DSM and can read the file contents directly through the HDF5 API as if reading from disk. The transfer of data between the simulation and visualization machines may be done either with an MPI-based communicator shared between the applications or with socket-based communication. The management of both ends of the network transfer is handled transparently by our DSM VFD layer, so that an application already using HDF5 can benefit from in-situ visualization without any code changes; it is only necessary to re-link the application against a modified version of the HDF5 library that contains our driver. This work has been carried out, and is continuing, at CSCS - Swiss National Supercomputing Centre, under the co-supervision of John Biddiscombe, within the NextMuSE European project (7th FWP/ICT-2007.8.0 FET Open).
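The claim that the simulation side is essentially unchanged can be illustrated with a conventional parallel HDF5 write, sketched below in C. This is what a simulation typically already does with the standard MPI-IO driver; under the DSM approach the same write path is kept and the output is redirected to the remote DSM buffer, the only difference being the virtual file driver behind the file access property list (the driver-selection call mentioned in the comments is an assumption for illustration, not part of the standard HDF5 API).

    /* Sketch of a parallel HDF5 write as a simulation might already do it.
     * With the DSM VFD approach this code is unchanged: only the file
     * driver behind H5Fcreate differs, selected via the file access
     * property list or transparently by the re-linked HDF5 library. */
    #include <mpi.h>
    #include <hdf5.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* File access property list: here the standard MPI-IO driver.
         * A DSM-based VFD would be selected the same way (e.g. via a
         * hypothetical H5Pset_fapl_dsm call provided by the driver). */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

        hid_t file = H5Fcreate("step.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* One value per MPI rank, written collectively into one dataset. */
        hsize_t dims[1] = { (hsize_t)size };
        hid_t filespace = H5Screate_simple(1, dims, NULL);
        hid_t dset = H5Dcreate(file, "pressure", H5T_NATIVE_DOUBLE, filespace,
                               H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        hsize_t offset[1] = { (hsize_t)rank }, count[1] = { 1 };
        H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
        hid_t memspace = H5Screate_simple(1, count, NULL);

        double value = 1.0 + rank;                       /* toy field value */
        hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
        H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, &value);

        H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
        H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
        MPI_Finalize();
        return 0;
    }

Compiled against parallel HDF5 and launched with mpirun, each rank contributes one value of the "pressure" dataset; with the DSM virtual file driver in place, the same dataset would appear in the visualization cluster's shared-memory buffer, where ParaView reads it as if it were an ordinary file.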