Belén Baez - Generating stories from ambient data

Belén Baez


Jury:

  • Francois Portet, associate professor, Grenoble INP, Laboratoire d'Informatique de Grenoble, thesis supervisor
  • Patrick Girard, full professor, Université de Poitiers, Laboratoire d'Informatique et d'Automatique pour les Systèmes, reviewer
  • Catherine Garbay, research director, CNRS, Laboratoire d'Informatique de Grenoble, thesis co-supervisor
  • Sybille Caffiau, associate professor, Laboratoire d'Informatique de Grenoble, examiner
  • Eddie Soulier, associate professor, Université de Technologie de Troyes - Institut Charles Delaunay, reviewer
  • Marc Cavazza, professor, University of Greenwich, Department of Computing & Information Systems, examiner


Stories are a communication tool that allows people to make sense of the world around them. They provide a platform for understanding and sharing culture, knowledge and identity. Stories carry a series of real or imaginary events that evoke a feeling, provoke a reaction or even trigger an action. For this reason, they have become a subject of interest for fields beyond Literature (Education, Marketing, Psychology, etc.) that seek to achieve a particular goal through them (to persuade, to encourage reflection, to teach, etc.).

However, stories remain an underdeveloped topic in Computer Science. Existing work focuses on their analysis and automatic production, but the algorithms and implementations remain constrained to imitating the creative process behind literary texts, starting from textual sources. Thus, no approach automatically produces stories 1) whose source consists of raw material from real-life events and 2) whose content projects a perspective that seeks to convey a particular message. Working with raw data is increasingly relevant today, as its volume increases exponentially each day through the use of connected devices.

Given the context of Big Data, we present an approach to automatically generate stories from ambient data. The objective of this work is to bring out a person's lived experience from the data produced during a human activity. Any area that uses such raw data, for example Education or Health, could benefit from this work. It is an interdisciplinary effort that draws on Natural Language Processing, Narratology, Cognitive Science and Human-Computer Interaction.

This approach is based on corpora and models, and includes the formalization of what we call the activity narrative as well as an adapted generation approach. It consists of four stages: the formalization of the activity narrative, the constitution of a corpus, the construction of models of the activity and of the narrative, and the generation of text. Each stage has been designed to overcome constraints tied to the scientific questions raised by the objective: handling uncertain and incomplete data, producing abstractions that are valid with respect to the activity, and building models that make it possible to transpose the reality captured through the data into a subjective perspective rendered in natural language. We used the activity narrative as a case study, as practitioners use connected devices and need to share their experience. The results obtained are encouraging and open up many prospects for research.