Millian Poquet - Simulation approach for resource management

Organized by: 
Millian Poquet


- Reviewers:
    - Henri Casanova — University of Hawai‘i at Mānoa, USA
    - Georges Da Costa — IRIT (Toulouse), France
- Examiners:
    - Yves Denneulin — Grenoble INP, France
    - Frédéric Desprez — Inria (Grenoble), France
    - Sascha Hunold — Technische Universität Wien, Austria
    - Anne-Cécile Orgerie — IRISA (Rennes), France
- Guest:
    - Olivier Richard — Université Grenoble Alpes, France
- Supervisors:
    - Pierre-François Dutot — Université Grenoble Alpes, France
    - Denis Trystram — Grenoble INP, France


Computing platforms keep growing in power and complexity. Many challenges remain in building the next generations of platforms, but exploiting them is a challenge in itself. Constraints such as energy consumption, data movement and resilience may force a break in the way platforms are managed, especially as the different types of distributed platforms converge.

Resource and Job Management Systems (RJMSs) are critical middleware that allows users to exploit the resources of such platforms. They must evolve to make the best use of computing platforms while complying with these new constraints. Each evolution ideally requires many iterations, but conducting them in vivo is not reasonable because of the huge overhead involved. Simulation is an efficient way to tackle the resulting problems, but particular caution is needed when drawing conclusions from simulation, as ill-suited models may lead to invalid results.

The first contribution of this dissertation is a modular simulation methodology for studying RJMSs and their evolution realistically, together with the associated simulator, Batsim. The main idea is to strongly separate the simulation from the decision-making algorithms. This separation of concerns lets any algorithm benefit from a validated simulation offering multiple levels of realism (features, accuracy of the models). The methodology also eases the move of new policies into production, since both academic prototypes and production RJMSs can be studied in the same context.
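
To make the separation between simulation and decision-making more concrete, here is a minimal Python sketch. It is not Batsim's actual interface or protocol: the names `Job`, `Scheduler`, `FcfsScheduler`, `FirstFitScheduler` and `simulate` are hypothetical, and the platform model (identical nodes, fixed runtimes, no network or energy model) is a deliberate simplification. The point is only that the simulation loop owns time and resources, while the policy sees a restricted view of the queue and returns decisions.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Job:
    job_id: int
    submit_time: float
    nodes: int      # number of nodes requested
    runtime: float  # actual runtime, known to the simulator only


class Scheduler(Protocol):
    """Hypothetical decision-making interface (an assumption of this sketch)."""

    def decide(self, now: float, queue: list[tuple[int, int]],
               free_nodes: int) -> list[int]:
        """Given (job_id, nodes) pairs in submission order, return ids to start."""
        ...


class FcfsScheduler:
    """Strict first-come-first-served: never overtakes the head of the queue."""

    def decide(self, now, queue, free_nodes):
        started = []
        for job_id, nodes in queue:
            if nodes > free_nodes:
                break
            started.append(job_id)
            free_nodes -= nodes
        return started


class FirstFitScheduler:
    """Greedy variant: starts any queued job that fits, in submission order."""

    def decide(self, now, queue, free_nodes):
        started = []
        for job_id, nodes in queue:
            if nodes <= free_nodes:
                started.append(job_id)
                free_nodes -= nodes
        return started


def simulate(jobs, total_nodes, scheduler):
    """Event-driven loop; returns {job_id: (start_time, finish_time)}."""
    pending = sorted(jobs, key=lambda j: j.submit_time)
    by_id = {j.job_id: j for j in jobs}
    queue = []     # ids of submitted jobs that have not started yet
    running = {}   # job_id -> finish time
    free = total_nodes
    result = {}
    while pending or queue or running:
        # The next event is the earliest submission or completion.
        next_times = [j.submit_time for j in pending[:1]] + list(running.values())
        if not next_times:
            raise RuntimeError("scheduler left jobs waiting on an idle platform")
        now = min(next_times)
        # Release the nodes of jobs that have finished.
        for job_id in [i for i, end in running.items() if end <= now]:
            free += by_id[job_id].nodes
            del running[job_id]
        # Enqueue newly submitted jobs.
        while pending and pending[0].submit_time <= now:
            queue.append(pending.pop(0).job_id)
        # The policy only sees ids and node counts, never the runtimes.
        view = [(i, by_id[i].nodes) for i in queue]
        for job_id in scheduler.decide(now, view, free):
            job = by_id[job_id]
            if job_id not in queue or job.nodes > free:
                raise ValueError(f"invalid decision for job {job_id}")
            queue.remove(job_id)
            free -= job.nodes
            running[job_id] = now + job.runtime
            result[job_id] = (now, now + job.runtime)
    return result
```

In this sketch, comparing two policies amounts to calling `simulate` twice on the same job list with a different `scheduler` argument; the simulation code itself is untouched, which is the property the methodology relies on.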

Batsim is used in the second part of this dissertation, which focuses on online and non-clairvoyant resource management policies to save energy. Several algorithms are first proposed and analyzed to maximize performance under an energy budget for a given time period. The dissertation then explores, more generally, the energy/performance trade-offs that can be obtained with node shutdown techniques.