Three CR / Chair of Excellence candidates: three HADAS seminars

Organized by:

Christine Collet (HADAS)

Speakers:

Jorge Quiané, Vincent Leroy, Pierre Bourhis

Teams:

Three CR / Chair of Excellence candidates, interested in the LIG as their host laboratory and in the HADAS team, will each give a seminar on their research work and projects on Thursday, March 15.

Room H105, building H, Grenoble INP - Ensimag

14h - 15h
Title: Managing Very Large Datasets in a Cloudy World
Speaker: Jorge Quiané, Research Associate, Information Systems Group - Saarland University

15h - 16h 
Title: Using distributed architectures to scale Web applications
Speaker: Vincent Leroy, Yahoo! Research, Barcelona, Spain

16h - 17h 
Title: Declarative languages for evolutive data management in collaborative and distributed systems
Speaker: Pierre Bourhis, Oxford University

14h - 15h
Managing Very Large Datasets in a Cloudy World
Jorge Quiané, Research Associate, Information Systems Group - Saarland University
Abstract:

Nowadays, many enterprises and organizations are faced with large volumes of data that have to be analyzed on a daily basis. In particular, scientific datasets are growing at unprecedented rates and are likely to continue growing to the order of exabytes. These data management needs require applications to run over a large number of computing nodes. However, database management systems (DBMS) have proven inefficient at dealing with very large datasets and at scaling out to a large number of computing nodes. In this context, MapReduce and Cloud computing are two alternative technologies that respond to this challenge. While MapReduce allows enterprises, organizations, and researchers to easily process very large volumes of data, the Cloud provides the computing infrastructure required to scale applications out to a large number of computing nodes. The appeal of these approaches lies in their ease of use and their almost admin-free operation. However, this simplicity comes at a price: the performance of MapReduce applications in the Cloud often does not match that of a well-configured parallel DBMS. In this talk, we present some of the main features that allow DBMS to achieve orders of magnitude better performance than MapReduce applications. Then, we analyze how our Hadoop++ project allows MapReduce applications to match DBMS performance in the Cloud. We also discuss the design choices we made in the Hadoop++ project in order to preserve the ease of use and the almost admin-free operation of MapReduce applications in the Cloud. Finally, we conclude this talk by discussing some of the challenges that the Cloud poses for efficient data management.

15h - 16h 
Using distributed architectures to scale Web applications 
Vincent Leroy, Yahoo! Research, Barcelona, Spain
Abstract:

The growth of the Web and the popularity of social networks have led to the creation of innovative applications that provide users with a personalized experience of the Web. However, the increasingly large amounts of data to process and the need to compute personalized content for each user constitute scalability challenges that limit the range of available applications and increase their operating cost. In this seminar, I will present work from two different projects that rely on distributed architectures to tackle scalability issues. The Gossple project uses P2P algorithms to cluster users into communities and perform personalized search and link prediction. The COAST project aims at creating a distributed Web search engine consisting of a federation of data centers that collaboratively process queries while each indexing only a fraction of the Web. Finally, I will propose new directions towards the creation of a platform optimized for Web and social network data processing.

16h - 17h 
Declarative languages for evolutive data management in collaborative and distributed systems
Pierre Bourhis, Oxford University
Abstract:

One of the major issues faced by Web applications is the management of evolving data. The need to support ubiquitous processes centred on databases has increased over the past 10 years. Prominent examples include e-commerce systems, enterprise business processes, health-care and scientific workflows. To respond to this need, it is necessary to efficiently develop applications whose core is the management of evolving data. Most of these applications fall within the context of workflows, notably data-centric ones. In this context, there has recently been a proliferation of workflow specification languages, notably data-centric ones. The goal of this research topic is to develop practical and theoretical tools to build and interact with data-centric workflow applications. The following axes will be developed: efficient implementation of data-centric workflows, automatic distribution of applications, security and collaboration, and navigation assistance within applications.