Publications

Highlights

(For a full list see below)

Developing Accurate and Scalable Simulators of Production Workflow Management Systems with WRENCH

In this work we present WRENCH, a WMS simulation framework, whose objectives are (i) accurate and scalable simulations; and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it only provides low-level simulation abstractions and thus large software development efforts are required when implementing simulators of complex systems. WRENCH thus achieves its second objective by providing high-level and directly re-usable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH’s software architecture and APIs, we present two case studies in which we apply WRENCH to simulate the Pegasus production WMS and the WorkQueue application execution framework. We report on ease of implementation, simulation accuracy, and simulation scalability so as to determine to what extent WRENCH achieves its objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator.

Rafael Ferreira da Silva, Henri Casanova, Ryan Tanaka, Suraj Pandey, Gautam Jethwani, Spencer Albrecht, James Oeth, and Frédéric Suter

Future Generation Computer Systems, 112:162-175, November 2020.

Improving Fairness in a Large Scale HTC System Through Workload Analysis and Simulation

We analyzed the High Throughput Computing (HTC) workload trace executed at the CC-IN2P3. The Fair-Share algorithm at the core of the batch scheduler ensures that all user groups are fairly provided with an amount of computing resources commensurate to their expressed needs. However, a deeper analysis of the produced schedule, especially of the job waiting times, shows a certain degree of unfairness between user groups. We identified the configuration of the quotas and scheduling queues as the main root causes of this unfairness. We proposed a drastic reconfiguration of the system more suited to the characteristics of the workload and better balancing the waiting time among user groups. We evaluated the impact of this reconfiguration through detailed simulations.

Frédéric Azevedo, Dalibor Klusacek, and Frédéric Suter

25th International Euro-Par Conference (Euro-Par’19), Göttingen, Germany, August 2019.

SMPI Courseware: Teaching Distributed-Memory Computing with MPI in Simulation

Teaching High Performance Computing (HPC) by giving students access to HPC platforms comes with several logistical and pedagogical challenges. A way to address these challenges is to instead simulate program executions on arbitrary HPC platform configurations. Thanks to SMPI, an MPI simulator provided as part of SimGrid, students write standard MPI programs and can both debug and analyze the performance of their programs in simulation mode, and large-scale executions can be simulated in short amounts of time on a single standard laptop computer. SMPI Courseware is a set of in-simulation assignments that can be incorporated into HPC courses to provide students with hands-on experience for distributed-memory computing and MPI programming learning objectives. This paper obtained a Best Paper Award at EduHPC’18.

Henri Casanova, Arnaud Legrand, Martin Quinson and Frédéric Suter

Workshop on Education for High-Performance Computing (EduHPC-18), Dallas, Texas, November 2018.

Evaluation through Realistic Simulations of File Replication Strategies for Large Heterogeneous Distributed Systems

We investigated how platform models influence the performance of file replication strategies, such as prestaging and dynamic replication, on large heterogeneous distributed systems. The novelty of this study resides in our evaluation using a realistic simulator. We also derived recommendations for the implementation of an optimized data management strategy in a scientific gateway for medical image analysis. This paper obtained the Best Workshop Paper Award of the Euro-Par’18 conference.

Anchen Chai, Sorina Camarasu-Pop, Tristan Glatard, Hugues Benoit-Cattin and Frédéric Suter

16th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar’2018), Turin, Italy, August 2018.

Simulating MPI applications: the SMPI approach

This article summarizes our recent work and developments on SMPI, a flexible simulator of MPI applications. In this tool, we took particular care to ensure our simulator could be used to produce fast and accurate predictions in a wide variety of situations. Although we did build SMPI on SimGrid, whose speed and accuracy had already been assessed in other contexts, moving such techniques to an HPC workload required significant additional effort. Obviously, an accurate modeling of communications and network topology was one of the keys to such achievements. Another less obvious key was the choice to combine in a single tool the possibility to do both offline and online simulation.

Augustin Degomme, Arnaud Legrand, George Markomanolis, Martin Quinson, Mark Stillwell, and Frédéric Suter

IEEE Transactions on Parallel and Distributed Systems, 28(8):2387-2400, August 2017.

Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms

We describe recent accuracy and scalability advances made in the context of the SimGrid simulation framework. A design goal of SimGrid is that it should be versatile, i.e., applicable across multiple domains. We present quantitative results that show that SimGrid compares favorably to state-of-the-art domain-specific simulators in terms of scalability, accuracy, or the trade-off between the two. An important implication is that, contrary to popular wisdom, striving for versatility in a simulator is not an impediment but instead is conducive to improving both accuracy and scalability.

Henri Casanova, Arnaud Giersch, Arnaud Legrand, Martin Quinson, and Frédéric Suter

Journal of Parallel and Distributed Computing, 74(10):2899-2917, October 2014.

 

Full List

Speed-Robust Scheduling
Franziska Eberle, Ruben Hoeksma, Nicole Megow, Lukas Nölke, Kevin Schewior and Bertrand Simon
Proceedings of the 22nd Conference on Integer Programming and Combinatorial Optimization, Atlanta, Georgia, May 2021.

Learning-based Approaches to Estimate Job Wait Time in HTC Datacenters
Luc Gombert and Frédéric Suter
Proceedings of the 24th Workshop on Job Scheduling Strategies for Parallel Processing, Portland, Oregon, May 2021.

Developing Accurate and Scalable Simulators of Production Workflow Management Systems with WRENCH
Rafael Ferreira da Silva, Henri Casanova, Ryan Tanaka, Suraj Pandey, Gautam Jethwani, Spencer Albrecht, James Oeth, and Frédéric Suter
Future Generation Computer Systems, 112:162-175, November 2020.

Characterizing, Modeling, and Accurately Simulating Power and Energy Consumption of I/O-intensive Scientific Workflows
Rafael Ferreira da Silva, Henri Casanova, Anne-Cécile Orgerie, Ryan Tanaka, Ewa Deelman, and Frédéric Suter
Journal of Computational Science, 44:101157, July 2020.

Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Rafael Ferreira da Silva, Henri Casanova, Ryan Tanaka, and Frédéric Suter
Bridging from Concepts to Data and Computation for eScience (BC2DC’19), in conjunction with the 15th eScience International Conference, San Diego, California, September 2019.

Alea - Complex Job Scheduling Simulator
Dalibor Klusacek, Mehmet Soysal, and Frédéric Suter
13th International Conference on Parallel Processing and Applied Mathematics (PPAM), Bialystok, Poland, September 2019.

Improving Fairness in a Large Scale HTC System Through Workload Analysis and Simulation
Frédéric Azevedo, Dalibor Klusacek, and Frédéric Suter
25th International Euro-Par Conference (Euro-Par’19), Göttingen, Germany, August 2019.

Accurately Simulating Energy Consumption of I/O-intensive Scientific Workflows
Rafael Ferreira da Silva, Anne-Cécile Orgerie, Henri Casanova, Ryan Tanaka, Ewa Deelman, and Frédéric Suter
International Conference on Computational Science (ICCS’19), Faro, Portugal, June 2019.

SMPI Courseware: Teaching Distributed-Memory Computing with MPI in Simulation
Henri Casanova, Arnaud Legrand, Martin Quinson and Frédéric Suter
Workshop on Education for High-Performance Computing (EduHPC-18), Dallas, Texas, November 2018.

WRENCH: Workflow Management System Simulation Workbench
Henri Casanova, Suraj Pandey, James Oeth, Ryan Tanaka, Frédéric Suter and Rafael Ferreira da Silva
13th Workshop on Workflows in Support of Large-Scale Science (WORKS’18), Dallas, Texas, November 2018.

Evaluation through Realistic Simulations of File Replication Strategies for Large Heterogeneous Distributed Systems
Anchen Chai, Sorina Camarasu-Pop, Tristan Glatard, Hugues Benoit-Cattin and Frédéric Suter
16th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar’2018), Turin, Italy, August 2018.

Reducing the Human-in-the-Loop Component of the Scheduling of Large HTC Workloads
Frédéric Azevedo, Luc Gombert, and Frédéric Suter
22nd Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2018), Vancouver, Canada, May 2018.

Simulating MPI applications: the SMPI approach
Augustin Degomme, Arnaud Legrand, George Markomanolis, Martin Quinson, Mark Stillwell, and Frédéric Suter
IEEE Transactions on Parallel and Distributed Systems, 28(8):2387-2400, August 2017.

Don’t Hurry be Happy: a Deadline-based Backfilling Approach
Tchimou N’takpé and Frédéric Suter
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando, FL, June 2017.

Modeling Distributed Platforms from Application Traces for Realistic File Transfer Simulation
Anchen Chai, Mohammad-Mahdi Bazm, Sorina Camarasu-Pop, Tristan Glatard, Hugues Benoit-Cattin and Frédéric Suter
17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Madrid, Spain, May 2017.

Toward More Scalable Off-Line Simulations of MPI Applications
Henri Casanova, Anshul Gupta, and Frédéric Suter
Parallel Processing Letters, 25(3):1541002, September 2015.

Adding Storage Simulation Capacities to the SimGrid Toolkit: Concepts, Models, and API
Adrien Lèbre, Arnaud Legrand, Frédéric Suter and Pierre Veyre
15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Shenzhen, China, May 2015.

Simulation of MPI Applications with Time-Independent Traces
Henri Casanova, Frédéric Desprez, George Markomanolis, and Frédéric Suter
Concurrency and Computation: Practice and Experience, 27(5):1145-1168, April 2015.

Versatile, Scalable, and Accurate Simulation of Distributed Applications and Platforms
Henri Casanova, Arnaud Giersch, Arnaud Legrand, Martin Quinson, and Frédéric Suter
Journal of Parallel and Distributed Computing, 74(10):2899-2917, October 2014.