Pierre Genevès - Static Analysis for Data-Centric Web Programming

Organisé par : 
Pierre Genevès
Intervenant : 
Pierre Genevès
Équipes : 

Lieu : Inria Grenoble Rhône-Alpes, grand amphi. 

- Denis Trystram, Professeur - Grenoble INP (président) 
- Boualem Benatallah, Professeur - University of New South Wales (rapporteur) 
- Giuseppe Castagna, Directeur de recherche CNRS - Laboratoire PPS (rapporteur) 
- Amedeo Napoli, Directeur de recherche CNRS - Loria (rapporteur) 
- Nabil Layaïda, Directeur de recherche Inria - Inria Grenoble Rhône-Alpes (examinateur) 
- Denis Lugiez, Professeur - Université de Provence (examinateur) 

Today, our private life, our work, and more generally data correctness are put at risk by web applications. This is already witnessed by a number of failures with dramatic consequences. For instance, in June 2011, a breach in Citibank's web portal exposed confidential data concerning 200 000 customer accounts. In March 2011, a failure in the GlobalPayment system exposed 1.5 million of credit card numbers. The frequency of such failures is increasing rapidly. In February 2013, 250 000 Tweeter accounts were compromised. In August 2013, global web traffic brutally fell 40% when Google was down for 11 minutes. In January 2014, the credit card details of 20 million South Koreans were stolen. 

By manipulating unprecedented volumes of data from unprecedentedly large numbers of users, web applications are reaching a criticality level that was never attained before. They are on the verge of becoming the critical applications of the twenty-first century. 
The aforementioned failures illustrate that considering web data as merely strings is insufficient as it exposes to vulnerabilities (e.g. code injection) and, more generally, it prevents accurate tracking and verification of data manipulations. Most existing approaches for program verification fail to extend to web applications, for a simple but fundamental reason: web applications process rich data structures in various forms: ordered trees (such as most webpages and XML documents), unordered nested records (such as JSON records), and graphs (such as social networks and sets of RDF triples). These structures emerged as flexible – and de facto – ways of representing data on the web. However, their richness, heterogeneity and coexistence inside a single application constitute important scientific obstacles for the development of analyses. 
My research seeks to address this challenge by building novel theoretical foundations and practical tools for the static analysis of the source codes of web applications. Their purpose is to automatically detect defects, or otherwise prove guarantees such as the absence of errors and undesired behaviours. The overall objective is to open up new horizons for the development of science towards a web of trust, and to obtain concrete tools capable of verifying and optimising real-world applications that process web data. 
Talk Outline 
A first part of this talk will be dedicated to preliminary insights on the tree logic that I have been designing and implementing for the purpose of reasoning with web data structures. The developed logical solver is now used as the essential component in many practical applications and independent research works conducted worldwide. In a second part of this talk, I will review some of these applications, before detailing four of them that I have been investigating. These contributions illustrate different facets of web applications: impact of rapid evolution of schemas, subtyping for rich type algebras, verification of layouts for any generated web page, and containment for expressive queries. I will finally draw perspectives for further research in the area.