Julien Pilourdault - Scalable Algorithms For Monitoring Activity Traces

Organized by: 
Julien Pilourdault
Julien Pilourdault



  • Nadia Brauner, Professor at Université Grenoble Alpes, Examiner
  • Patrick Valduriez, Senior Researcher at INRIA, Reviewer
  • Donald Kossmann, Professor at ETH Zurich, Reviewer
  • Yanlei Diao, Professor at Ecole Polytechnique, Examiner
  • Sihem Amer-Yahia, Senior Researcher at CNRS, Director
  • Vincent Leroy, Professor at Université Grenoble Alpes, Co-Advisor


In this thesis, we study scalable algorithms for monitoring activity traces. In several domains, monitoring is a key ability to extract value from data and improve a system. This thesis aims to design algorithms for monitoring two kinds of activity traces. First, we investigate temporal data monitoring. We introduce a new kind of interval join, that features scoring functions reflecting the degree of satisfaction of temporal predicates. We study these joins in the context of batch processing: we formalize Ranked Temporal Join (RTJ), that combine collections of intervals and return the k best results. We show how to exploit the nature of temporal predicates and the properties of their associated scored semantics to design TKIJ , an efficient query evaluation approach on a distributed Map-Reduce architecture. Our extensive experiments on synthetic and real datasets show that TKIJ outperforms state-of-the-art competitors and provides very good performance for n-ary RTJ queries on temporal data. We also propose a preliminary study to extend our work on TKIJ to stream processing. Second, we investigate monitoring in crowdsourcing. We advocate the need to incorporate motivation in task assignment. We propose to study an adaptive approach, that captures workers’ motivation during task completion and use it to revise task assignment accordingly across iterations. We study two variants of motivation-aware task assignment: Individual Task Assignment (Ita) and Holistic Task Assignment (Hta). First, we investigate Ita, where we assign tasks to workers individually, one worker at a time. We model Ita and show it is NP-Hard. We design three task assignment strategies that exploit various objectives. Our live experiments study the impact of each strategy on overall performance. We find that different strategies prevail for different performance dimensions. In particular, the strategy that assigns random and relevant tasks offers the best task throughput and the strategy that assigns tasks that best match a worker’s compromise between task diversity and task payment has the best outcome quality. Our experiments confirm the need for adaptive motivation-aware task assignment. Then, we study Hta, where we assign tasks to all available workers, holistically. We model Hta and show it is both NP-Hard and MaxSNP-Hard. We develop efficient approximation algorithms with provable guarantees. We conduct offline experiments to verify the efficiency of our algorithms. We also conduct online experiments with real workers and compare our approach with various non-adaptive assignment strategies. We find that our approach offers the best compromise between performance dimensions thereby assessing the need for adaptability.