Michael Mercier - Contribution to infrastructure convergence between high-performance computing and large-scale data processing

Organized by:
Michael Mercier
Speaker:
Michael Mercier


Jury:

- Reviewers:
  - Kate Keahey       — Computation Institute University of Chicago and Argonne National Laboratory, U.S.A.
  - Gabriel Antoniu   — Inria, France

- Examiners:
  - Frédéric Deprez   — LIG, Inria, France
  - Christian Perez   — LIP, Inria, France
  - Frédéric Suter    — IN2P3, France

- Thesis supervisors:
  - Bruno Raffin      — LIG, Inria, France
  - Olivier Richard   — LIG, Univ. Grenoble Alpes, France
  - Benoit Pelletier  — Atos, France

The amount of data produced, in both the scientific community and the commercial world, is constantly growing. The field of Big Data has emerged to handle large amounts of data on distributed computing infrastructures. High-Performance Computing (HPC) infrastructures are traditionally used to execute compute-intensive workloads. However, the HPC community also faces a growing need to process large amounts of data coming from high-definition sensors and large physics apparatus. The convergence of the two fields, HPC and Big Data, is currently taking place. In fact, the HPC community already uses Big Data tools, but these are not always correctly integrated, especially at the level of the file system and the Resource and Job Management System (RJMS).

To understand how HPC clusters can be leveraged for Big Data usage, and what challenges this raises for HPC infrastructures, we have studied multiple aspects of the convergence. We first provide a survey of software provisioning methods, with a focus on data-intensive applications. We contribute a new RJMS collaboration technique called BeBiDa, which is based on 50 lines of code, whereas similar solutions use at least 1000 times more. We evaluate this mechanism both under real conditions and in a simulated environment using our simulator, Batsim. Furthermore, we extend Batsim to support I/O, and present the development of a generic file system model along with a Big Data application model. This allows us to complement the real-condition experiments of BeBiDa with simulations, and to study file system dimensioning and trade-offs.

All experiments and analyses in this work were conducted with reproducibility in mind. Based on this experience, we propose integrating the development workflow and data analysis into the reproducibility mindset, and we share feedback from our experience as a list of best practices.