Vikas Jaiman - Improving Performance Predictability in Cloud Data Stores

14:00

Tuesday

Apr

2019

Thesis defence

Place:

IMAG Building Amphitheatre

Organized by:

Vikas Jaiman

Speaker:

Vikas Jaiman

Teams:

ERODS

The jury members are:

Mr. Laurent Réveillère - Professor, Université de Bordeaux, reviewer
Mr. Gaël Thomas - Professor, Telecom SudParis, reviewer
Mr. Noël De Palma - Professor, Université Grenoble Alpes, examiner
Mr. Etienne Rivière - Professor, UCLouvain, examiner
Mr. Vivien Quéma - Professor, Grenoble INP, thesis supervisor
Ms. Sonia Ben Mokhtar - Senior Researcher, CNRS LIRIS, thesis co-supervisor

Today, users of interactive services such as e-commerce and web search have increasingly high expectations for the performance and responsiveness of these services. Indeed, studies have shown that a slow service, even for short periods of time, directly impacts revenue. Enforcing predictable performance has thus been a priority of major service providers over the last decade. However, avoiding latency variability in distributed storage systems is challenging, since end-user requests go through hundreds of servers and a performance hiccup at any of these servers may inflate the observed latency. Even in well-provisioned systems, factors such as contention on shared resources or unbalanced load between servers affect request latencies, in particular the tail (95th and 99th percentiles) of their distribution.

The goal of this thesis is to develop mechanisms for reducing latencies and achieving performance predictability in cloud data stores. One effective countermeasure for reducing tail latency in cloud data stores is to provide efficient replica selection algorithms. However, under heterogeneous workloads, these algorithms lead to increased latencies for requests with short execution times that get scheduled behind requests with long execution times. We propose Héron, a replica selection algorithm that supports workloads with heterogeneous request execution times.

In the second contribution of the thesis, we focus on multiget workloads to reduce latency in cloud data stores. The challenge is to estimate the bottleneck operations and schedule them on uncoordinated backend servers with minimal overhead. To reach this objective, we present TailX, a task-aware multiget scheduling algorithm that reduces tail latencies under heterogeneous workloads.