Scientific collaborations, especially in high-energy and astroparticle physics, will face a tremendous increase in their computing and storage needs over the next decade. Meeting these needs will require revisiting both computing models and cyber-infrastructures. However, relying solely on empirical knowledge acquired at a much smaller scale is not sufficient. Therefore, I aim to engage in deep partnerships with these scientific communities and to rely on faithful simulations of distributed systems and applications to provide objective indicators and guide the future evolution of computing models and infrastructures. To this end, I am involved in the development of the SimGrid and WRENCH toolkits.
SimGrid is a toolkit that provides core functionalities for the simulation of distributed applications in heterogeneous distributed environments. The simulation engine uses algorithmic and implementation techniques geared toward the fast simulation of large systems on a single machine. The models are theoretically grounded and experimentally validated. The results are reproducible, enabling better scientific practices. Its models of networks, CPUs, and disks are adapted to (Data)Grids, P2P, Clouds, Clusters, and HPC, allowing multi-domain studies. It can be used to simulate algorithms and prototypes of applications, to emulate real MPI applications through the virtualization of their communications, or to formally assess algorithms and applications that can run in the framework. The formal verification module explores all possible message interleavings in the application, searching for states that violate the provided properties. We recently added the ability to assess liveness properties over arbitrary and legacy codes, thanks to a system-level introspection tool that provides a finely detailed view of the running application to the model checker. This can, for example, be leveraged to verify both safety and liveness properties on arbitrary MPI code written in C/C++/Fortran.
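To make the idea of exploring message interleavings concrete, here is a minimal sketch in Python. It is not SimGrid code and does not use SimGrid's API: the function `find_violation`, the toy bank-account receiver, and the "balance never goes negative" property are all hypothetical names chosen for illustration. The sketch enumerates every delivery order by brute force, whereas practical model checkers work hard to avoid exploring redundant orderings.

```python
from itertools import permutations


def find_violation(messages, initial_state, apply_message, is_safe):
    """Try every delivery order of `messages` (brute force).

    Returns the shortest prefix of the first order found that reaches
    an unsafe state, or None if every order stays safe.
    All names here are illustrative assumptions, not SimGrid's API.
    """
    for order in permutations(messages):
        state = initial_state
        for i, msg in enumerate(order):
            state = apply_message(state, msg)
            if not is_safe(state):
                return order[: i + 1]
    return None


def apply_message(balance, msg):
    """Toy receiver: the state is an account balance updated by each message."""
    kind, amount = msg
    return balance + amount if kind == "deposit" else balance - amount


def is_safe(balance):
    """Safety property: the balance must never become negative."""
    return balance >= 0


# Two messages sent concurrently: their delivery order is not fixed.
racy = [("deposit", 5), ("withdraw", 3)]
violation = find_violation(racy, 0, apply_message, is_safe)

# Starting with enough funds, no delivery order breaks the property.
no_violation = find_violation(racy, 10, apply_message, is_safe)
```

In this toy scenario, the order in which the withdrawal is delivered first breaks the property, so `violation` holds the offending prefix, while `no_violation` is `None`. A liveness property would instead require reasoning about whether something eventually happens along every execution, which this sketch does not attempt.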
Capitalizing on recent advances in distributed application and platform simulation technology, WRENCH makes it possible to quickly prototype workflows, workflow management system (WMS) implementations, and decision-making algorithms, and to evaluate and compare alternative options scalably and accurately for arbitrary, and often hypothetical, experimental scenarios. This project will define a generic and foundational software architecture that is informed by current state-of-the-art WMS designs and planned future designs. The implementations of the components in this architecture, taken together, form a generic “scientific instrument” that can be used by workflow users, developers, and researchers. This scientific instrument will be instantiated for several real-world WMSs and used for a range of real-world workflow applications.
The goal of the HAC SPECIS Inria Project Lab (IPL) is to answer the methodological needs of HPC application and runtime developers and to enable the study of real HPC systems from both the correctness and the performance points of view. To this end, we gather experts from the HPC, formal verification, and performance evaluation communities.