Fernando Machado Mendonca - Multi-purpose Efficient Resource Allocation for Parallel Systems

12:30

Tuesday

May

2017

Thesis defence

Place:

IMAG Building Amphitheatre

Organized by:

Fernando Machado Mendonca

Speaker:

Fernando Machado Mendonca

Teams:

DATAMOVE

Members of the jury:

Mr. Denis TRYSTRAM, Professor at Université Grenoble Alpes
Mr. Philippe NAVAUX, Professor at Universidade Federal do Rio Grande do Sul
Mr. Pascal BOUVRY, Professor at Université du Luxembourg
Mr. Frédéric GUINAND, Professor at Université Le Havre Normandie
Mr. Guillaume MERCIER, Associate Professor at INP de Bordeaux
Mr. Frédéric SUTER, Research Scientist at ENS de Lyon
Mr. Giorgio LUCARELLI, Contract Researcher at INRIA Grenoble

The defence will be streamed on YouTube at the following link:

https://www.youtube.com/watch?v=NoAOEVqbz8A

The field of parallel supercomputing has been changing rapidly in recent years. Of particular interest to us is the falling cost of the parts needed to build machines with multicore CPUs and accelerators such as GPUs. This trend has allowed large parallel systems to expand, with machines far apart from each other, sometimes even located on different continents. The crucial problem is therefore how to use these resources efficiently.

In this work, we first consider the efficient allocation of tasks suitable for CPUs and GPUs in heterogeneous platforms. To that end, we implement a tool called SWDUAL, which executes the Smith-Waterman algorithm simultaneously on CPUs and GPUs, choosing which tasks are better suited to one or the other. Experiments show that SWDUAL gives better results than similar approaches available in the literature.

Second, we study a new online method for scheduling independent tasks of different sizes on processors. We propose a new technique that optimizes the stretch metric by detecting when a reasonable number of small jobs are waiting while a big job executes. The big job is then redirected to a separate set of machines, dedicated to running big jobs that have been redirected. We present experimental results showing that our method outperforms the standard policy and in many cases approaches the performance of the preemptive policy, which can be considered a lower bound.
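To make the stretch metric concrete, here is a minimal sketch in Python. It uses the usual definition of stretch (flow time divided by processing time) and shows how small jobs queued behind one big job get a very large stretch under a non-preemptive first-come-first-served policy. The names `Job`, `fcfs_schedule`, `stretch` and `should_redirect`, as well as the threshold value, are assumptions made for this illustration; the actual detection rule used in the thesis is not given in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float     # time at which the job arrives
    processing: float  # time the job needs on one machine


def fcfs_schedule(jobs):
    """Run jobs in release order on a single machine, without preemption.

    Returns a list of (job, completion_time) pairs.
    """
    clock = 0.0
    schedule = []
    for job in sorted(jobs, key=lambda j: j.release):
        start = max(clock, job.release)
        clock = start + job.processing
        schedule.append((job, clock))
    return schedule


def stretch(job, completion):
    """Stretch = flow time (completion - release) / processing time."""
    return (completion - job.release) / job.processing


def should_redirect(waiting_small_jobs, threshold=3):
    """Hypothetical trigger (an assumption, not the thesis rule):
    redirect the running big job once enough small jobs are waiting."""
    return waiting_small_jobs >= threshold


# One big job followed shortly by two small ones.
jobs = [Job(release=0, processing=100), Job(release=1, processing=1),
        Job(release=2, processing=1)]
for job, completion in fcfs_schedule(jobs):
    print(job, "stretch =", stretch(job, completion))
```

In this example the big job has a stretch of 1, while each small job waits about 100 time units for 1 unit of work. This is the situation that redirecting the big job is meant to avoid.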

Next, we present our study on constraints applied to the Backfilling algorithm in combination with the FCFS policy: Contiguity, a constraint that tries to keep jobs close together and reduce fragmentation during the schedule, and Basic Locality, which aims to keep jobs as much as possible inside groups of processors called clusters. Experimental results show that the benefits of using these constraints outweigh the possible decrease in the number of backfilled jobs due to reduced fragmentation.
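The effect of a contiguity constraint can be sketched as follows, assuming processors are numbered along a line and a job may only start if a run of adjacent free processors is available. The function name `find_contiguous_block` and the boolean-list representation of the platform are assumptions made for this illustration, not the data structures used in the thesis.

```python
def find_contiguous_block(free, size):
    """Return the index of the first run of `size` adjacent free
    processors, or None if no such run exists.

    `free` is a list of booleans, one per processor (True = idle).
    """
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if is_free:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == size:
                return run_start
        else:
            run_len = 0
    return None


# Four processors are idle, but they are fragmented.
free = [True, False, True, True, False, True]
print(find_contiguous_block(free, 2))  # a 2-processor job fits at index 2
print(find_contiguous_block(free, 3))  # None: no 3 adjacent idle processors
print(sum(free) >= 3)                  # True: it would fit without the constraint
```

The last two lines illustrate the trade-off mentioned above: a job that could be backfilled on scattered processors may have to wait when contiguity is enforced.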

Finally, we present an additional constraint to the Backfilling algorithm called Full Locality, where the scheduler models the topology of the platform as a fat tree and uses this model to assign jobs to regions of the platform where communication costs between processors are reduced. An experiment campaign shows that Full Locality is superior to all the previously proposed constraints, and especially to Basic Backfilling.
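As a rough illustration of why a tree model of the topology helps, the sketch below measures the distance between two processors as the number of links on the path through their lowest common switch. It assumes a tree with the same arity at every level and processors numbered left to right at the leaves; `tree_distance`, `allocation_cost` and the arity of 4 are hypothetical choices for this example and do not describe the cost model of the thesis.

```python
from itertools import combinations


def tree_distance(a, b, arity=4):
    """Number of links between processors a and b in a uniform tree.

    Climb one level at a time until both processors fall under the
    same switch; the path goes up and down that many levels.
    """
    levels = 0
    while a != b:
        a //= arity
        b //= arity
        levels += 1
    return 2 * levels


def allocation_cost(processors, arity=4):
    """Worst-case pairwise distance inside one job's allocation."""
    return max(
        (tree_distance(a, b, arity) for a, b in combinations(processors, 2)),
        default=0,
    )


print(allocation_cost([0, 1, 2, 3]))  # all under one switch
print(allocation_cost([0, 1, 4, 5]))  # spread over two switches
```

Under these assumptions, an allocation confined to one switch has a lower worst-case distance than one spread across two, which is the kind of difference a locality-aware scheduler can exploit.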