Vinicius Garcia Pinto - Performance Analysis Strategies for Task-based Applications on Hybrid Platforms

Organized by: 
Vinicius Garcia Pinto

Defense location:

Prog. de Pós-Graduação em Computação, Instituto de Informática, Universidade Federal do Rio Grande do Sul, Caixa postal 15064, Av. Bento Gonçalves, 9500, 91501-970, Porto Alegre, RS, Brazil, room AUD 0


Jury:

  • Arnaud Legrand, research scientist (chargé de recherche), CNRS Délégation Alpes, thesis advisor
  • Alfredo Goldman Vel Lejbman, associate professor, Universidade de São Paulo, Brazil, reviewer
  • Gaël Thomas, professor, Télécom SudParis, reviewer
  • Mathieu Faverge, associate professor (maître de conférences), Institut Polytechnique de Bordeaux, examiner
  • Nicolas Maillard, associate professor, UFRGS, Brazil, thesis advisor
  • Gerson Geraldo Homrich Cavalheiro, associate professor, Universidade Federal de Pelotas, Brazil, examiner
  • Philippe Olivier A. Navaux, associate professor, UFRGS, Brazil, examiner
  • Bernd Mohr, associate professor, Jülich Research Centre, Germany, examiner

Programming paradigms in High-Performance Computing have been shifting toward task-based models that can readily adapt to heterogeneous and scalable supercomputers. The performance of task-based applications depends heavily on the runtime system's scheduling heuristics and on its ability to exploit computing and communication resources.
Unfortunately, traditional performance analysis strategies are ill-suited to fully understanding task-based runtime systems and applications: they expect regular behavior with distinct communication and computation phases, whereas task-based applications exhibit no clear phases. Moreover, the finer granularity of task-based applications typically induces stochastic behavior, which leads to irregular structures that are difficult to analyze.
In this thesis, we propose performance analysis strategies that combine application structure, scheduler, and hardware information. We show how our strategies help to understand performance issues of task-based applications running on hybrid platforms. Our strategies are built on top of modern data analysis tools, enabling the creation of custom visualization panels that help to understand and pinpoint performance problems caused by poor scheduling decisions and by incorrect runtime system and platform configurations. By combining simulation and debugging, we can also build a visual representation of the scheduler's internal state and of the estimates it computes when scheduling a new task.
We validate our proposal by analyzing traces from a Cholesky decomposition implemented with the StarPU task-based runtime system and running on hybrid (CPU/GPU) platforms. Our case studies show how to improve task partitioning across multiple GPUs and cores to get closer to theoretical lower bounds, how to improve MPI pipelining on multiple nodes (each with multiple cores and GPUs) to reduce the slow start on distributed nodes, and how to upgrade the runtime system to increase MPI bandwidth. By employing simulation and debugging strategies, we also provide a workflow to investigate in depth the assumptions behind the scheduler's decisions. This allows us to suggest changes that improve the runtime system's scheduling and prefetch mechanisms.
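To make the case-study workload more concrete, the sketch below enumerates the task graph of a right-looking tiled Cholesky decomposition on an n × n grid of tiles, using the standard kernel names POTRF, TRSM, SYRK and GEMM. It is only an illustration under stated assumptions: it is not StarPU's API, and the helper names `tiled_cholesky_dag` and `critical_path_length` are hypothetical. It merely mimics the idea of submitting tasks in sequential order and inferring dependencies from each task's read and write accesses to tiles. The critical path is computed with unit task costs, whereas a real analysis would use measured per-kernel durations on CPUs and GPUs.

```python
from collections import defaultdict


def tiled_cholesky_dag(n):
    """Return (tasks, edges) for a right-looking tiled Cholesky on n x n tiles.

    Hypothetical illustration: tasks are submitted in sequential program
    order and dependencies are inferred from tile accesses (RAW, WAW, WAR).
    A task's "writes" tile is accessed read-write; "reads" are read-only.
    """
    tasks = []                   # (kernel name, tile written)
    edges = set()                # (predecessor id, successor id)
    last_writer = {}             # tile -> id of the last task that wrote it
    readers = defaultdict(list)  # tile -> ids of readers since last write

    def submit(name, reads, writes):
        tid = len(tasks)
        tasks.append((name, writes[0]))
        # Read-after-write and write-after-write hazards.
        for tile in reads + writes:
            if tile in last_writer:
                edges.add((last_writer[tile], tid))
        # Write-after-read hazards, then record this task as the new writer.
        for tile in writes:
            for r in readers[tile]:
                edges.add((r, tid))
            readers[tile] = []
            last_writer[tile] = tid
        for tile in reads:
            readers[tile].append(tid)

    for k in range(n):
        submit("POTRF", [], [(k, k)])                 # factor diagonal tile
        for i in range(k + 1, n):
            submit("TRSM", [(k, k)], [(i, k)])        # solve panel tiles
        for i in range(k + 1, n):
            submit("SYRK", [(i, k)], [(i, i)])        # update diagonal tiles
            for j in range(k + 1, i):
                submit("GEMM", [(i, k), (j, k)], [(i, j)])  # update the rest
    return tasks, edges


def critical_path_length(num_tasks, edges):
    """Longest path in number of tasks, assuming every task has unit cost."""
    preds = defaultdict(list)
    for a, b in edges:
        preds[b].append(a)
    depth = [1] * num_tasks
    # Submission order is a valid topological order: edges go from lower ids.
    for t in range(num_tasks):
        for p in preds[t]:
            depth[t] = max(depth[t], depth[p] + 1)
    return max(depth)


if __name__ == "__main__":
    tasks, edges = tiled_cholesky_dag(4)
    counts = defaultdict(int)
    for name, _ in tasks:
        counts[name] += 1
    print("tasks:", len(tasks), dict(counts))
    print("critical path (unit costs):", critical_path_length(len(tasks), edges))
```

The gap between the total amount of work and the critical path hints at why a scheduler has freedom in how it distributes tasks among CPUs and GPUs, and why lower bounds are useful references when judging its decisions.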